Posted to mapreduce-dev@hadoop.apache.org by Vinayakumar B <vi...@apache.org> on 2019/09/27 12:38:30 UTC

[DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Hi All,

   I would like to discuss a separate repo for the thirdparty dependencies
which we need to shade and include in Hadoop components' jars.

   Apologies for the long text ahead, but this needs a clear explanation!

   Right now the most pressing such dependency is protobuf. Protobuf has not
been upgraded beyond 2.5.0 for fear that downstream builds, which depend on
the transitive protobuf dependency coming from hadoop's jars, may fail with
the upgrade. Apparently protobuf does not guarantee source compatibility
between versions, though it guarantees wire compatibility. Because of this
behavior, a version upgrade may cause breakage in known and unknown
(private?) downstreams.

   So to tackle this, we came up with the following proposal in HADOOP-13363.

   Luckily, as far as I know, no APIs, either public to users or between
Hadoop processes, directly use protobuf classes in their signatures. (If
any exist, please let us know.)

   Proposal:
   ------------

   1. Create artifact(s) which contain the shaded dependencies. All such
shading/relocation will use the known prefix
**org.apache.hadoop.thirdparty.**.
   2. To start with, there will be a protobuf jar (ex:
o.a.h.thirdparty:hadoop-shaded-protobuf) in which all
**com.google.protobuf** classes are relocated to
**org.apache.hadoop.thirdparty.com.google.protobuf**.
   3. Hadoop modules which need protobuf as a dependency will add this
shaded artifact as a dependency (ex: o.a.h.thirdparty:hadoop-shaded-protobuf).
   4. All previous usages of "com.google.protobuf" will be changed to
"org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
committed. Please note, this replacement is done One-Time, directly in the
source code, NOT during compile and package.
   5. Once all usages of "com.google.protobuf" are relocated, hadoop no
longer cares which version of the original "protobuf-java" is in the
dependency tree.
   6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
break the downstreams. Hadoop itself will actually be using the latest
protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".

   7. Coming back to the separate repo, the following are the main reasons
for keeping the shaded dependency artifact in a separate repo instead of a
submodule.

      7a. These artifacts need not be built all the time. They need to be
built only when there is a change in the dependency version or the build
process.
      7b. If added as a "submodule in Hadoop repo", maven-shade-plugin:shade
will execute only in the package phase. That means "mvn compile" or "mvn
test-compile" will fail, because at that point this artifact will not have
the relocated classes; it will have the original classes instead, resulting
in compilation failure. The workaround is to build the thirdparty submodule
first and exclude the "thirdparty" submodule in other executions. This would
be a complex process compared to keeping it in a separate repo.

      7c. The separate repo will be a subproject of Hadoop, using the same
HADOOP jira project, with different versioning prefixed with "thirdparty-"
(ex: thirdparty-1.0.0).
      7d. The separate repo will have the same release process as Hadoop.
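
   For illustration only, the relocation described in items 1 and 2 could be
expressed with maven-shade-plugin roughly as below. This is a minimal sketch
under stated assumptions, not the content of the PR: the groupId is the
expansion of "o.a.h.thirdparty", while the artifact version (1.0.0), the
protobuf-java version (3.7.1) and the plugin version (3.2.1) are placeholders
chosen for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical pom.xml for the shaded protobuf artifact (sketch only). -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.apache.hadoop.thirdparty</groupId>
  <artifactId>hadoop-shaded-protobuf</artifactId>
  <version>1.0.0</version> <!-- assumed version, for illustration -->

  <dependencies>
    <!-- The original protobuf whose classes get bundled and relocated. -->
    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
      <version>3.7.1</version> <!-- assumed 3.x version -->
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.1</version>
        <executions>
          <execution>
            <!-- shade runs in the package phase (see 7b). -->
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <!-- Bundle only protobuf-java into the shaded jar. -->
              <artifactSet>
                <includes>
                  <include>com.google.protobuf:protobuf-java</include>
                </includes>
              </artifactSet>
              <!-- Move com.google.protobuf.* under the known prefix. -->
              <relocations>
                <relocation>
                  <pattern>com.google.protobuf</pattern>
                  <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

   A Hadoop module would then, per items 3 and 6, declare something like the
following (again with assumed versions), using the relocated classes itself
while still exposing protobuf-java 2.5.0 to downstreams:

```xml
<dependencies>
  <!-- Used by Hadoop code via org.apache.hadoop.thirdparty.com.google.protobuf -->
  <dependency>
    <groupId>org.apache.hadoop.thirdparty</groupId>
    <artifactId>hadoop-shaded-protobuf</artifactId>
    <version>1.0.0</version>
  </dependency>
  <!-- Kept only so that downstream classpaths do not change -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>
</dependencies>
```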


    HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is the
umbrella jira tracking the changes for the protobuf upgrade.

    A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been raised
for the separate repo creation under HADOOP-16595
(https://issues.apache.org/jira/browse/HADOOP-16595).

    Please provide your inputs for the proposal and review the PR to
proceed with the proposal.


   -Thanks,
    Vinay

On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <vi...@apache.org>
wrote:

> Moving the thread to the dev lists.
>
> Thanks
> +Vinod
>
> > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vi...@apache.org>
> wrote:
> >
> > Thanks Marton,
> >
> > The newly created 'hadoop-thirdparty' repo is empty right now.
> > Whether to use that repo for the shaded artifact or not will be tracked in
> > the HADOOP-13363 umbrella jira. Please feel free to join the discussion.
> >
> > No existing codebase is being moved out of the hadoop repo, so I think
> > right now we are good to go.
> >
> > -Vinay
> >
> > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org> wrote:
> >
> >>
> >> I am not sure whether it is defined when a vote is required.
> >>
> >> https://www.apache.org/foundation/voting.html
> >>
> >> Personally I think it's a big enough change to send a notification to the
> >> dev lists with a 'lazy consensus' closure.
> >>
> >> Marton
> >>
> >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org> wrote:
> >>> Hi,
> >>>
> >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more in
> >>> future) will be kept as a shaded artifact in a separate repo, which will
> >>> be referred to as a dependency in hadoop modules. This approach avoids
> >>> shading every submodule during the build.
> >>>
> >>> So the question is: is any VOTE required before asking to create a git
> >>> repo?
> >>>
> >>> On the selfserve platform https://gitbox.apache.org/setup/newrepo.html
> >>> I can see that the requester should be a PMC member.
> >>>
> >>> Wanted to confirm here first.
> >>>
> >>> -Vinay
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> >> For additional commands, e-mail: private-help@hadoop.apache.org
> >>
> >>
>
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
PR has been merged.

Thanks everyone for discussions.

-Vinay


On Thu, Jan 9, 2020 at 4:47 PM Ayush Saxena <ay...@gmail.com> wrote:

> Hi All,
> FYI:
> Since no one has objected, we will go ahead with the present approach and
> merge by tomorrow EOD.
> Thanks, everyone!
>
> -Ayush
>
> > On 07-Jan-2020, at 9:22 PM, Brahma Reddy Battula <br...@apache.org>
> wrote:
> >
> > Hi Sree Vaddi, Owen, stack, Duo Zhang,
> >
> > We can move forward based on your comments; we are just waiting for your
> > reply. Hope all of your comments have been answered. (Unification can be
> > taken up in a parallel thread, as Vinay mentioned.)
> >
> >
> >
> > On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi Sree,
> >>
> >>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>> Project ? Or as a TLP ?
> >>> Or as a new project definition ?
> >> As already mentioned by Ayush, this will be a subproject of Hadoop.
> >> Releases will be voted on by the Hadoop PMC as per ASF process.
> >>
> >>
> >>> The effort to streamline and put in an accepted standard for the
> >>> dependencies that require shading,
> >>> seems beyond the siloed efforts of hadoop, hbase, etc....
> >>
> >>> I propose, we bring all the decision makers from all these artifacts in
> >>> one room and decide best course of action.
> >>> I am looking at, no projects should ever had to shade any artifacts
> >>> except as an absolute necessary alternative.
> >>
> >> This is the ideal for any project. But unfortunately some projects
> >> take their own course based on need.
> >>
> >> In the current case of protobuf in Hadoop:
> >>    The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up,
> >> to avoid downstream failures. Since Hadoop is a platform, its dependencies
> >> get added to downstream projects' classpaths, so any change in Hadoop's
> >> dependencies directly affects downstreams. Hadoop follows backward
> >> compatibility as strictly as possible.
> >>    Though protobuf provides wire compatibility between versions, it doesn't
> >> provide compatibility for generated sources.
> >>    Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
> >> technique, Hadoop can internally upgrade to shaded protobuf 3.x and
> >> still keep protobuf 2.5.0 (deprecated) for downstreams.
> >>
> >> This shading is necessary to have both versions of protobuf supported
> >> (2.5.0 (non-shaded) for the downstreams' classpath and 3.x (shaded) for
> >> hadoop's internal usage).
> >> And this entire work is to be done before the 3.3.0 release.
> >>
> >> So, though it's ideal to have a common approach for all projects, I suggest
> >> that for Hadoop we go ahead as per the current approach.
> >> We can also start a parallel effort to address these problems in a
> >> separate discussion/proposal. Once a solution is available we can revisit
> >> and adopt the new solution accordingly in all such projects (ex: HBase,
> >> Hadoop, Ratis).
> >>
> >> -Vinay
> >>
> >>> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com>
> wrote:
> >>>
> >>> Hey Sree
> >>>
> >>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>>> Project ? Or as a TLP ?
> >>>> Or as a new project definition ?
> >>>>
> >>> A sub project of Apache Hadoop, having its own independent release
> >>> cycles. Maybe you can put this in the same category as ozone or
> >>> submarine (a couple of months ago).
> >>>
> >>> Unifying for all seems interesting, but each project is independent and
> >>> has its own limitations and way of thinking. I don't think it would be an
> >>> easy task to bring everyone to the same table and get them to agree on a
> >>> common approach.
> >>>
> >>> I guess this has been under discussion for quite a long time, and no
> >>> other alternative has been suggested. Still, we can hold off for a week:
> >>> if someone comes up with a better solution, fine; else we can continue in
> >>> the present direction.
> >>>
> >>> -Ayush
> >>>
> >>>
> >>>
> >>> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com
> >> .invalid>
> >>> wrote:
> >>>
> >>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>>> Project ? Or as a TLP ?
> >>>> Or as a new project definition ?
> >>>>
> >>>> The effort to streamline and put in an accepted standard for the
> >>>> dependencies that require shading,seems beyond the siloed efforts of
> >>>> hadoop, hbase, etc....
> >>>>
> >>>> I propose, we bring all the decision makers from all these artifacts
> in
> >>>> one room and decide best course of action.I am looking at, no projects
> >>>> should ever had to shade any artifacts except as an absolute necessary
> >>>> alternative.
> >>>>
> >>>>
> >>>> Thank you./Sree
> >>>>
> >>>>
> >>>>
> >>>>    On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> >>>> vinayakumarb@apache.org> wrote:
> >>>>
> >>>> Hi,
> >>>> Sorry for the late reply.
> >>>>>>> To be exact, how can we better use the thirdparty repo? Looking at
> >>>>>>> HBase as an example, it looks like everything that are known to break
> >>>>>>> a lot after an update get shaded into the hbase-thirdparty artifact:
> >>>>>>> guava, netty, ... etc.
> >>>>>>> Is it the purpose to isolate these naughty dependencies?
> >>>> Yes, shading is meant to isolate these naughty dependencies from the
> >>>> downstream classpath and give us independent control over these
> >>>> upgrades without breaking downstreams.
> >>>>
> >>>> The first PR https://github.com/apache/hadoop-thirdparty/pull/1 to create
> >>>> the protobuf shaded jar is ready to merge.
> >>>>
> >>>> Please take a look if anyone is interested; it will be merged in maybe
> >>>> two days if there are no objections.
> >>>>
> >>>> -Vinay
> >>>>
> >>>>
> >>>> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hi I am late to this but I am keen to understand more.
> >>>>>
> >>>>> To be exact, how can we better use the thirdparty repo? Looking at
> >>> HBase
> >>>>> as an example, it looks like everything that are known to break a lot
> >>>> after
> >>>>> an update get shaded into the hbase-thirdparty artifact: guava,
> >> netty,
> >>>> ...
> >>>>> etc.
> >>>>> Is it the purpose to isolate these naughty dependencies?
> >>>>>
> >>>>> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <
> >> vinayakumarb@apache.org
> >>>>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com
> >>>
> >>>>>> 's suggestions.
> >>>>>>
> >>>>>>   i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>>>>>   ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> >>>>>>
> >>>>>> Please review!!
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Vinay
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <
> >> palomino219@gmail.com
> >>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> For HBase we have a separate repo for hbase-thirdparty
> >>>>>>>
> >>>>>>> https://github.com/apache/hbase-thirdparty
> >>>>>>>
> >>>>>>> We publish the artifacts to nexus so we do not need to include
> >>>>>>> binaries in our git repo, just add a dependency in the pom.
> >>>>>>>
> >>>>>>> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >>>>>>>
> >>>>>>> And it has its own release cycles, only when there are special
> >>>>>>> requirements or we want to upgrade some of the dependencies. This is
> >>>>>>> the vote thread for the newest release, where we want to provide a
> >>>>>>> shaded gson for jdk7.
> >>>>>>>
> >>>>>>> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>> Vinayakumar B <vi...@apache.org> wrote on Sat, Sep 28, 2019 at 1:28 AM:
> >>>>>>>
> >>>>>>>> Please find replies inline.
> >>>>>>>>
> >>>>>>>> -Vinay
> >>>>>>>>
> >>>>>>>> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> >>>>>> owen.omalley@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I'm very unhappy with this direction. In particular, I don't
> >>> think
> >>>>>> git
> >>>>>>> is
> >>>>>>>>> a good place for distribution of binary artifacts.
> >> Furthermore,
> >>>> the
> >>>>>> PMC
> >>>>>>>>> shouldn't be releasing anything without a release vote.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Proposed solution doesnt release any binaries in git. Its
> >>> actually a
> >>>>>>>> complete sub-project which follows entire release process,
> >>> including
> >>>>>> VOTE
> >>>>>>>> in public. I have mentioned already that release process is
> >>> similar
> >>>> to
> >>>>>>>> hadoop.
> >>>>>>>> To be specific, using the (almost) same script used in hadoop to
> >>>>>> generate
> >>>>>>>> artifacts, sign and deploy to staging repository. Please let me
> >>> know
> >>>>>> If I
> >>>>>>>> am conveying anything wrong.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> I'd propose that we make a third party module that contains
> >> the
> >>>>>>> *source*
> >>>>>>>>> of the pom files to build the relocated jars. This should
> >>>>>> absolutely be
> >>>>>>>>> treated as a last resort for the mostly Google projects that
> >>>>>> regularly
> >>>>>>>>> break binary compatibility (eg. Protobuf & Guava).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Same has been implemented in the PR
> >>>>>>>> https://github.com/apache/hadoop-thirdparty/pull/1. Please
> >> check
> >>>> and
> >>>>>> let
> >>>>>>>> me
> >>>>>>>> know If I misunderstood. Yes, this is the last option we have
> >>> AFAIK.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> In terms of naming, I'd propose something like:
> >>>>>>>>>
> >>>>>>>>> org.apache.hadoop.thirdparty.protobuf2_5
> >>>>>>>>> org.apache.hadoop.thirdparty.guava28
> >>>>>>>>>
> >>>>>>>>> In particular, I think we absolutely need to include the
> >> version
> >>>> of
> >>>>>> the
> >>>>>>>>> underlying project. On the other hand, since we should not be
> >>>>>> shading
> >>>>>>>>> *everything* we can drop the leading com.google.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> IMO, This naming convention is easy for identifying the
> >> underlying
> >>>>>>> project,
> >>>>>>>> but  it will be difficult to maintain going forward if
> >> underlying
> >>>>>> project
> >>>>>>>> versions changes. Since thirdparty module have its own releases,
> >>>> each
> >>>>>> of
> >>>>>>>> those release can be mapped to specific version of underlying
> >>>> project.
> >>>>>>> Even
> >>>>>>>> the binary artifact can include a MANIFEST with underlying
> >> project
> >>>>>>> details
> >>>>>>>> as per Steve's suggestion on HADOOP-13363.
> >>>>>>>> That said, if you still prefer to have project number in
> >> artifact
> >>>> id,
> >>>>>> it
> >>>>>>>> can be done.
> >>>>>>>>
> >>>>>>>> The Hadoop project can make releases of  the thirdparty module:
> >>>>>>>>>
> >>>>>>>>> <dependency>
> >>>>>>>>> <groupId>org.apache.hadoop</groupId>
> >>>>>>>>> <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >>>>>>>>> <version>1.0</version>
> >>>>>>>>> </dependency>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Note that the version has to be the hadoop thirdparty release
> >>>> number,
> >>>>>>> which
> >>>>>>>>> is part of why you need to have the underlying version in the
> >>>>>> artifact
> >>>>>>>>> name. These we can push to maven central as new releases from
> >>>>>> Hadoop.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Exactly, same has been implemented in the PR. hadoop-thirdparty
> >>>> module
> >>>>>>> have
> >>>>>>>> its own releases. But in HADOOP Jira, thirdparty versions can be
> >>>>>>>> differentiated using prefix "thirdparty-".
> >>>>>>>>
> >>>>>>>> Same solution is being followed in HBase. May be people involved
> >>> in
> >>>>>> HBase
> >>>>>>>> can add some points here.
> >>>>>>>>
> >>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>> .. Owen
> >>>>>>>>>
> >>>>>>>>> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> >>>>>> vinayakumarb@apache.org
> >>>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi All,
> >>>>>>>>>>
> >>>>>>>>>>   I wanted to discuss about the separate repo for thirdparty
> >>>>>>>> dependencies
> >>>>>>>>>> which we need to shaded and include in Hadoop component's
> >> jars.
> >>>>>>>>>>
> >>>>>>>>>>   Apologies for the big text ahead, but this needs clear
> >>>>>>> explanation!!
> >>>>>>>>>>
> >>>>>>>>>>   Right now most needed such dependency is protobuf.
> >> Protobuf
> >>>>>>>> dependency
> >>>>>>>>>> was not upgraded from 2.5.0 onwards with the fear that
> >>> downstream
> >>>>>>>> builds,
> >>>>>>>>>> which depends on transitive dependency protobuf coming from
> >>>>>> hadoop's
> >>>>>>>> jars,
> >>>>>>>>>> may fail with the upgrade. Apparently protobuf does not
> >>> guarantee
> >>>>>>> source
> >>>>>>>>>> compatibility, though it guarantees wire compatibility
> >> between
> >>>>>>> versions.
> >>>>>>>>>> Because of this behavior, version upgrade may cause breakage
> >> in
> >>>>>> known
> >>>>>>>> and
> >>>>>>>>>> unknown (private?) downstreams.
> >>>>>>>>>>
> >>>>>>>>>>   So to tackle this, we came up the following proposal in
> >>>>>>> HADOOP-13363.
> >>>>>>>>>>
> >>>>>>>>>>   Luckily, As far as I know, no APIs, either public to user
> >> or
> >>>>>>> between
> >>>>>>>>>> Hadoop processes, is not directly using protobuf classes in
> >>>>>>> signatures.
> >>>>>>>>>> (If
> >>>>>>>>>> any exist, please let us know).
> >>>>>>>>>>
> >>>>>>>>>>   Proposal:
> >>>>>>>>>>   ------------
> >>>>>>>>>>
> >>>>>>>>>>   1. Create a artifact(s) which contains shaded
> >> dependencies.
> >>>> All
> >>>>>>> such
> >>>>>>>>>> shading/relocation will be with known prefix
> >>>>>>>>>> **org.apache.hadoop.thirdparty.**.
> >>>>>>>>>>   2. Right now protobuf jar (ex:
> >>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf)
> >>>>>>>>>> to start with, all **com.google.protobuf** classes will be
> >>>>>> relocated
> >>>>>>> as
> >>>>>>>>>> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >>>>>>>>>>   3. Hadoop modules, which needs protobuf as dependency,
> >> will
> >>>> add
> >>>>>>> this
> >>>>>>>>>> shaded artifact as dependency (ex:
> >>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >>>>>>>>>>   4. All previous usages of "com.google.protobuf" will be
> >>>>>> relocated
> >>>>>>> to
> >>>>>>>>>> "org.apache.hadoop.thirdparty.com.google.protobuf" in the
> >> code
> >>>> and
> >>>>>>> will
> >>>>>>>> be
> >>>>>>>>>> committed. Please note, this replacement is One-Time directly
> >>> in
> >>>>>>> source
> >>>>>>>>>> code, NOT during compile and package.
> >>>>>>>>>>   5. Once all usages of "com.google.protobuf" is relocated,
> >>> then
> >>>>>>> hadoop
> >>>>>>>>>> dont care about which version of original  "protobuf-java" is
> >>> in
> >>>>>>>>>> dependency.
> >>>>>>>>>>   6. Just keep "protobuf-java:2.5.0" in dependency tree not
> >> to
> >>>>>> break
> >>>>>>>> the
> >>>>>>>>>> downstreams. But hadoop will be originally using the latest
> >>>>>> protobuf
> >>>>>>>>>> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >>>>>>>>>>
> >>>>>>>>>>   7. Coming back to separate repo, Following are most
> >>>> appropriate
> >>>>>>>> reasons
> >>>>>>>>>> of keeping shaded dependency artifact in separate repo
> >> instead
> >>> of
> >>>>>>>>>> submodule.
> >>>>>>>>>>
> >>>>>>>>>>     7a. These artifacts need not be built all the time. It
> >>> needs
> >>>>>> to
> >>>>>>> be
> >>>>>>>>>> built only when there is a change in the dependency version
> >> or
> >>>> the
> >>>>>>> build
> >>>>>>>>>> process.
> >>>>>>>>>>     7b. If added as "submodule in Hadoop repo",
> >>>>>>>> maven-shade-plugin:shade
> >>>>>>>>>> will execute only in package phase. That means, "mvn compile"
> >>> or
> >>>>>> "mvn
> >>>>>>>>>> test-compile" will not be failed as this artifact will not
> >> have
> >>>>>>>> relocated
> >>>>>>>>>> classes, instead it will have original classes, resulting in
> >>>>>>> compilation
> >>>>>>>>>> failure. Workaround, build thirdparty submodule first and
> >>> exclude
> >>>>>>>>>> "thirdparty" submodule in other executions. This will be a
> >>>> complex
> >>>>>>>> process
> >>>>>>>>>> compared to keeping in a separate repo.
> >>>>>>>>>>
> >>>>>>>>>>     7c. Separate repo, will be a subproject of Hadoop, using
> >>> the
> >>>>>>> same
> >>>>>>>>>> HADOOP jira project, with different versioning prefixed with
> >>>>>>>> "thirdparty-"
> >>>>>>>>>> (ex: thirdparty-1.0.0).
> >>>>>>>>>>     7d. Separate will have same release process as Hadoop.
> >>>>>>>>>>
> >>>>>>>>>>   HADOOP-13363 (
> >>>>>> https://issues.apache.org/jira/browse/HADOOP-13363)
> >>>>>>>> is
> >>>>>>>>>> an
> >>>>>>>>>> umbrella jira tracking the changes to protobuf upgrade.
> >>>>>>>>>>
> >>>>>>>>>>   PR (https://github.com/apache/hadoop-thirdparty/pull/1)
> >> has
> >>>>>> been
> >>>>>>>>>> raised
> >>>>>>>>>> for separate repo creation in (HADOOP-16595 (
> >>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-16595)
> >>>>>>>>>>
> >>>>>>>>>>   Please provide your inputs for the proposal and review the
> >>> PR
> >>>>>> to
> >>>>>>>>>> proceed with the proposal.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>   -Thanks,
> >>>>>>>>>>   Vinay
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >>>>>>>>>> vinodkv@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Moving the thread to the dev lists.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> +Vinod
> >>>>>>>>>>>
> >>>>>>>>>>>> On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> >>>>>>>> vinayakumarb@apache.org>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks Marton,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Current created 'hadoop-thirdparty' repo is empty right
> >>> now.
> >>>>>>>>>>>> Whether to use that repo  for shaded artifact or not will
> >>> be
> >>>>>>>>>> monitored in
> >>>>>>>>>>>> HADOOP-13363 umbrella jira. Please feel free to join the
> >>>>>>> discussion.
> >>>>>>>>>>>>
> >>>>>>>>>>>> There is no existing codebase is being moved out of
> >> hadoop
> >>>>>> repo.
> >>>>>>> So
> >>>>>>>> I
> >>>>>>>>>>> think
> >>>>>>>>>>>> right now we are good to go.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Vinay
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> >>>> elek@apache.org>
> >>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am not sure if it's defined when is a vote required.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> https://www.apache.org/foundation/voting.html
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Personally I think it's a big enough change to send a
> >>>>>>> notification
> >>>>>>>> to
> >>>>>>>>>>> the
> >>>>>>>>>>>>> dev lists with a 'lazy consensus'  closure
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Marton
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2019/09/23 17:46:37, Vinayakumar B <
> >>>>>> vinayakumarb@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As discussed in HADOOP-13363, protobuf 3.x jar (and may
> >>> be
> >>>>>> more
> >>>>>>> in
> >>>>>>>>>>>>> future)
> >>>>>>>>>>>>>> will be kept as a shaded artifact in a separate repo,
> >>> which
> >>>>>> will
> >>>>>>>> be
> >>>>>>>>>>>>>> referred as dependency in hadoop modules.  This
> >> approach
> >>>>>> avoids
> >>>>>>>>>> shading
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>> every submodule during build.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So question is does any VOTE required before asking to
> >>>>>> create a
> >>>>>>>> git
> >>>>>>>>>>> repo?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On selfserve platform
> >>>>>>>> https://gitbox.apache.org/setup/newrepo.html
> >>>>>>>>>>>>>> I can access see that, requester should be PMC.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Wanted to confirm here first.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Vinay
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>
> >>>> ---------------------------------------------------------------------
> >>>>>>>>>>>>> To unsubscribe, e-mail:
> >>>> private-unsubscribe@hadoop.apache.org
> >>>>>>>>>>>>> For additional commands, e-mail:
> >>>>>> private-help@hadoop.apache.org
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> > --
> >
> >
> >
> > --Brahma Reddy Battula
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org
>
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
PR has been merged.

Thanks everyone for discussions.

-Vinay


On Thu, Jan 9, 2020 at 4:47 PM Ayush Saxena <ay...@gmail.com> wrote:

> Hi All,
> FYI :
> We will be going ahead with the present approach, will merge by tomorrow
> EOD. Considering no one has objections.
> Thanx Everyone!!!
>
> -Ayush
>
> > On 07-Jan-2020, at 9:22 PM, Brahma Reddy Battula <br...@apache.org>
> wrote:
> >
> > Hi Sree vaddi,Owen,stack,Duo Zhang,
> >
> > We can move forward based on your comments, just waiting for your
> > reply.Hope all of your comments answered..(unification we can think
> > parallel thread as Vinay mentioned).
> >
> >
> >
> > On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi Sree,
> >>
> >>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >> Project ? Or as a TLP ?
> >>> Or as a new project definition ?
> >> As already mentioned by Ayush, this will be a subproject of Hadoop.
> >> Releases will be voted by Hadoop PMC as per ASF process.
> >>
> >>
> >>> The effort to streamline and put in an accepted standard for the
> >> dependencies that require shading,
> >>> seems beyond the siloed efforts of hadoop, hbase, etc....
> >>
> >>> I propose, we bring all the decision makers from all these artifacts in
> >> one room and decide best course of action.
> >>> I am looking at, no projects should ever had to shade any artifacts
> >> except as an absolute necessary alternative.
> >>
> >> This is the ideal proposal for any project. But unfortunately some
> projects
> >> takes their own course based on need.
> >>
> >> In the current case of protobuf in Hadoop,
> >>    Protobuf upgrade from 2.5.0 (which is already EOL) was not taken up
> to
> >> avoid downstream failures. Since Hadoop is a platform, its dependencies
> >> will get added to downstream projects' classpath. So any change in
> Hadoop's
> >> dependencies will directly affect downstreams. Hadoop strictly follows
> >> backward compatibility as far as possible.
> >>    Though protobuf provides wire compatibility b/w versions, it doesnt
> >> provide compatibility for generated sources.
> >>    Now, to support ARM protobuf upgrade is mandatory. Using shading
> >> technique, In Hadoop internally can upgrade to shaded protobuf 3.x and
> >> still have 2.5.0 protobuf (deprecated) for downstreams.
> >>
> >> This shading is necessary to have both versions of protobuf supported.
> >> (2.5.0 (non-shaded) for downstream's classpath and 3.x (shaded) for
> >> hadoop's internal usage).
> >> And this entire work to be done before 3.3.0 release.
> >>
> >> So, though its ideal to make a common approach for all projects, I
> suggest
> >> for Hadoop we can go ahead as per current approach.
> >> We can also start the parallel effort to address these problems in a
> >> separate discussion/proposal. Once the solution is available we can
> revisit
> >> and adopt new solution accordingly in all such projects (ex: HBase,
> Hadoop,
> >> Ratis).
> >>
> >> -Vinay
> >>
> >>> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com>
> wrote:
> >>>
> >>> Hey Sree
> >>>
> >>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>>> Project ? Or as a TLP ?
> >>>> Or as a new project definition ?
> >>>>
> >>> A sub project of Apache Hadoop, having its own independent release
> >> cycles.
> >>> May be you can put this into the same column as ozone or as
> >>> submarine(couple of months ago).
> >>>
> >>> Unifying for all, seems interesting but each project is independent and
> >> has
> >>> its own limitations and way of thinking, I don't think it would be an
> >> easy
> >>> task to bring all on the same table and get them agree to a common
> stuff.
> >>>
> >>> I guess this has been into discussion since quite long, and there
> hasn't
> >>> been any other alternative suggested. Still we can hold up for a week,
> if
> >>> someone comes up with a better solution, else we can continue in the
> >>> present direction.
> >>>
> >>> -Ayush
> >>>
> >>>
> >>>
> >>> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com
> >> .invalid>
> >>> wrote:
> >>>
> >>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>>> Project ? Or as a TLP ?
> >>>> Or as a new project definition ?
> >>>>
> >>>> The effort to streamline and put in an accepted standard for the
> >>>> dependencies that require shading,seems beyond the siloed efforts of
> >>>> hadoop, hbase, etc....
> >>>>
> >>>> I propose, we bring all the decision makers from all these artifacts
> in
> >>>> one room and decide best course of action.I am looking at, no projects
> >>>> should ever had to shade any artifacts except as an absolute necessary
> >>>> alternative.
> >>>>
> >>>>
> >>>> Thank you./Sree
> >>>>
> >>>>
> >>>>
> >>>>    On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> >>>> vinayakumarb@apache.org> wrote:
> >>>>
> >>>> Hi,
> >>>> Sorry for the late reply,.
> >>>>>>> To be exact, how can we better use the thirdparty repo? Looking at
> >>>> HBase as an example, it looks like everything that are known to break
> a
> >>> lot
> >>>> after an update get shaded into the hbase-thirdparty artifact: guava,
> >>>> netty, ... etc.
> >>>> Is it the purpose to isolate these naughty dependencies?
> >>>> Yes, shading is to isolate these naughty dependencies from downstream
> >>>> classpath and have independent control on these upgrades without
> >> breaking
> >>>> downstreams.
> >>>>
> >>>> First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create
> >>> the
> >>>> protobuf shaded jar is ready to merge.
> >>>>
> >>>> Please take a look if anyone interested, will be merged may be after
> >> two
> >>>> days if no objections.
> >>>>
> >>>> -Vinay
> >>>>
> >>>>
> >>>> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hi, I am late to this, but I am keen to understand more.
> >>>>>
> >>>>> To be exact, how can we better use the thirdparty repo? Looking at
> >>>>> HBase as an example, it looks like everything that is known to break
> >>>>> a lot after an update gets shaded into the hbase-thirdparty artifact:
> >>>>> guava, netty, etc.
> >>>>> Is the purpose to isolate these naughty dependencies?
> >>>>>
> >>>>> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B
> >>>>> <vinayakumarb@apache.org> wrote:
> >>>>>
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I have updated the PR as per @Owen O'Malley
> >>>>>> <owen.omalley@gmail.com>'s suggestions.
> >>>>>>
> >>>>>>   i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>>>>>   ii. Kept the shaded package as 'o.a.h.thirdparty.protobuf37'
> >>>>>>
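
For illustration, the module and package names in (i) and (ii) could map onto a maven-shade-plugin relocation roughly as below. This is a minimal sketch and not the contents of the PR: the groupId, both version numbers, and the expansion of 'o.a.h' to 'org.apache.hadoop' are assumptions.

```xml
<!-- Sketch of a pom.xml for a module named hadoop-shaded-protobuf37 (assumed layout). -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop.thirdparty</groupId> <!-- assumed groupId -->
  <artifactId>hadoop-shaded-protobuf37</artifactId>
  <version>1.0.0-SNAPSHOT</version> <!-- hypothetical version -->

  <dependencies>
    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
      <version>3.7.1</version> <!-- assumed 3.7.x release -->
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.1</version>
        <executions>
          <execution>
            <!-- shade runs in the package phase, so relocated classes exist only in the packaged jar -->
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <artifactSet>
                <includes>
                  <include>com.google.protobuf:protobuf-java</include>
                </includes>
              </artifactSet>
              <relocations>
                <relocation>
                  <pattern>com.google.protobuf</pattern>
                  <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

With a configuration like this, a class such as com.google.protobuf.Message would appear in the shaded jar as org.apache.hadoop.thirdparty.protobuf37.Message.
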
> >>>>>> Please review!!
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Vinay
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang)
> >>>>>> <palomino219@gmail.com> wrote:
> >>>>>>
> >>>>>>> For HBase we have a separate repo for hbase-thirdparty:
> >>>>>>>
> >>>>>>> https://github.com/apache/hbase-thirdparty
> >>>>>>>
> >>>>>>> We publish the artifacts to Nexus, so we do not need to include
> >>>>>>> binaries in our git repo; we just add a dependency in the pom.
> >>>>>>>
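
As a hedged sketch of "just add a dependency in the pom": the coordinates below are taken from the mvnrepository link in this message, while the version property is a hypothetical placeholder.

```xml
<!-- Fragment for the <dependencies> section of a consuming pom.xml. -->
<dependency>
  <groupId>org.apache.hbase.thirdparty</groupId>
  <artifactId>hbase-shaded-protobuf</artifactId>
  <!-- hypothetical property; set it to whichever released version is needed -->
  <version>${hbase-thirdparty.version}</version>
</dependency>
```
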
> >>>>>>>
> >>>>>>> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >>>>>>>
> >>>>>>> It has its own release cycle: we release only when there are special
> >>>>>>> requirements or when we want to upgrade some of the dependencies.
> >>>>>>> This is the vote thread for the newest release, where we want to
> >>>>>>> provide a shaded gson for jdk7:
> >>>>>>>
> >>>>>>> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>> On Sat, Sep 28, 2019 at 1:28 AM, Vinayakumar B <vi...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Please find replies inline.
> >>>>>>>>
> >>>>>>>> -Vinay
> >>>>>>>>
> >>>>>>>> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley
> >>>>>>>> <owen.omalley@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> I'm very unhappy with this direction. In particular, I don't think
> >>>>>>>>> git is a good place for distribution of binary artifacts.
> >>>>>>>>> Furthermore, the PMC shouldn't be releasing anything without a
> >>>>>>>>> release vote.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> The proposed solution doesn't release any binaries in git. It is
> >>>>>>>> actually a complete sub-project that follows the entire release
> >>>>>>>> process, including a public VOTE. As I mentioned already, the
> >>>>>>>> release process is similar to Hadoop's. To be specific, it uses
> >>>>>>>> (almost) the same script used in Hadoop to generate the artifacts,
> >>>>>>>> sign them, and deploy them to the staging repository. Please let me
> >>>>>>>> know if I am conveying anything wrong.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> I'd propose that we make a third party module that contains the
> >>>>>>>>> *source* of the pom files to build the relocated jars. This should
> >>>>>>>>> absolutely be treated as a last resort for the mostly Google
> >>>>>>>>> projects that regularly break binary compatibility (e.g. Protobuf
> >>>>>>>>> and Guava).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> The same has been implemented in the PR
> >>>>>>>> https://github.com/apache/hadoop-thirdparty/pull/1. Please check
> >>>>>>>> and let me know if I misunderstood. Yes, this is the last option we
> >>>>>>>> have, AFAIK.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> In terms of naming, I'd propose something like:
> >>>>>>>>>
> >>>>>>>>> org.apache.hadoop.thirdparty.protobuf2_5
> >>>>>>>>> org.apache.hadoop.thirdparty.guava28
> >>>>>>>>>
> >>>>>>>>> In particular, I think we absolutely need to include the version of
> >>>>>>>>> the underlying project. On the other hand, since we should not be
> >>>>>>>>> shading *everything*, we can drop the leading com.google.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> IMO, this naming convention makes it easy to identify the underlying
> >>>>>>>> project, but it will be difficult to maintain going forward when the
> >>>>>>>> underlying project's version changes. Since the thirdparty module
> >>>>>>>> has its own releases, each release can be mapped to a specific
> >>>>>>>> version of the underlying project. The binary artifact can even
> >>>>>>>> include a MANIFEST with the underlying project's details, as per
> >>>>>>>> Steve's suggestion on HADOOP-13363.
> >>>>>>>> That said, if you still prefer to have the version number in the
> >>>>>>>> artifact id, it can be done.
> >>>>>>>>
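
One possible way to record the underlying project's details in the shaded jar's MANIFEST is maven-shade-plugin's ManifestResourceTransformer. The fragment below is only a sketch: the entry names and the ${protobuf.version} property are hypothetical, and it is not necessarily what was suggested on HADOOP-13363.

```xml
<!-- Fragment for the maven-shade-plugin <configuration> element. -->
<transformers>
  <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    <manifestEntries>
      <!-- hypothetical entry names describing the shaded library -->
      <Shaded-Artifact>com.google.protobuf:protobuf-java</Shaded-Artifact>
      <Shaded-Artifact-Version>${protobuf.version}</Shaded-Artifact-Version>
    </manifestEntries>
  </transformer>
</transformers>
```
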
> >>>>>>>>> The Hadoop project can make releases of the thirdparty module:
> >>>>>>>>>
> >>>>>>>>> <dependency>
> >>>>>>>>>   <groupId>org.apache.hadoop</groupId>
> >>>>>>>>>   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >>>>>>>>>   <version>1.0</version>
> >>>>>>>>> </dependency>
> >>>>>>>>>
> >>>>>>>>> Note that the version has to be the hadoop thirdparty release
> >>>>>>>>> number, which is part of why you need to have the underlying
> >>>>>>>>> version in the artifact name. These we can push to Maven Central as
> >>>>>>>>> new releases from Hadoop.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Exactly, the same has been implemented in the PR. The
> >>>>>>>> hadoop-thirdparty module has its own releases. In the HADOOP Jira,
> >>>>>>>> thirdparty versions can be differentiated using the prefix
> >>>>>>>> "thirdparty-".
> >>>>>>>>
> >>>>>>>> The same solution is being followed in HBase. Maybe people involved
> >>>>>>>> in HBase can add some points here.
> >>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>> .. Owen
> >>>>>>>>>
> >>>>>>>>> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B
> >>>>>>>>> <vinayakumarb@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi All,
> >>>>>>>>>>
> >>>>>>>>>>   I would like to discuss a separate repo for the thirdparty
> >>>>>>>>>> dependencies that we need to shade and include in Hadoop
> >>>>>>>>>> components' jars.
> >>>>>>>>>>
> >>>>>>>>>>   Apologies for the big text ahead, but this needs a clear
> >>>>>>>>>> explanation!!
> >>>>>>>>>>
> >>>>>>>>>>   Right now the dependency that most needs this is protobuf. The
> >>>>>>>>>> protobuf dependency has not been upgraded beyond 2.5.0 for fear
> >>>>>>>>>> that downstream builds, which depend on the transitive protobuf
> >>>>>>>>>> dependency coming from Hadoop's jars, may fail after the upgrade.
> >>>>>>>>>> Apparently protobuf does not guarantee source compatibility,
> >>>>>>>>>> though it guarantees wire compatibility between versions. Because
> >>>>>>>>>> of this, a version upgrade may cause breakage in known and unknown
> >>>>>>>>>> (private?) downstreams.
> >>>>>>>>>>
> >>>>>>>>>>   To tackle this, we came up with the following proposal in
> >>>>>>>>>> HADOOP-13363.
> >>>>>>>>>>
> >>>>>>>>>>   Luckily, as far as I know, no APIs, either public to users or
> >>>>>>>>>> between Hadoop processes, directly use protobuf classes in their
> >>>>>>>>>> signatures. (If any exist, please let us know.)
> >>>>>>>>>>
> >>>>>>>>>>   Proposal:
> >>>>>>>>>>   ------------
> >>>>>>>>>>
> >>>>>>>>>>   1. Create one or more artifacts that contain the shaded
> >>>>>>>>>> dependencies. All such shading/relocation will use the known prefix
> >>>>>>>>>> **org.apache.hadoop.thirdparty.**.
> >>>>>>>>>>   2. Start with the protobuf jar (ex:
> >>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf): all
> >>>>>>>>>> **com.google.protobuf** classes will be relocated to
> >>>>>>>>>> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >>>>>>>>>>   3. Hadoop modules that need protobuf as a dependency will add
> >>>>>>>>>> this shaded artifact as a dependency (ex:
> >>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >>>>>>>>>>   4. All previous usages of "com.google.protobuf" will be changed
> >>>>>>>>>> to "org.apache.hadoop.thirdparty.com.google.protobuf" in the code
> >>>>>>>>>> and committed. Please note, this replacement is a one-time change
> >>>>>>>>>> made directly in the source code, NOT during compile and package.
> >>>>>>>>>>   5. Once all usages of "com.google.protobuf" are relocated,
> >>>>>>>>>> Hadoop no longer cares which version of the original
> >>>>>>>>>> "protobuf-java" is in the dependency tree.
> >>>>>>>>>>   6. Just keep "protobuf-java:2.5.0" in the dependency tree so as
> >>>>>>>>>> not to break the downstreams. Hadoop itself will actually be using
> >>>>>>>>>> the latest protobuf present in
> >>>>>>>>>> "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >>>>>>>>>>
> >>>>>>>>>>   7. Coming back to the separate repo, the following are the main
> >>>>>>>>>> reasons for keeping the shaded dependency artifact in a separate
> >>>>>>>>>> repo instead of a submodule.
> >>>>>>>>>>
> >>>>>>>>>>     7a. These artifacts need not be built all the time. They need
> >>>>>>>>>> to be built only when there is a change in the dependency version
> >>>>>>>>>> or the build process.
> >>>>>>>>>>     7b. If added as a "submodule in the Hadoop repo",
> >>>>>>>>>> maven-shade-plugin:shade will execute only in the package phase.
> >>>>>>>>>> That means "mvn compile" or "mvn test-compile" will fail, because
> >>>>>>>>>> at that point the artifact will not have the relocated classes;
> >>>>>>>>>> it will have the original classes instead, resulting in a
> >>>>>>>>>> compilation failure. The workaround is to build the thirdparty
> >>>>>>>>>> submodule first and exclude the "thirdparty" submodule in other
> >>>>>>>>>> executions. This would be a complex process compared to keeping it
> >>>>>>>>>> in a separate repo.
> >>>>>>>>>>
> >>>>>>>>>>     7c. The separate repo will be a subproject of Hadoop, using the
> >>>>>>>>>> same HADOOP jira project, with separate versioning prefixed with
> >>>>>>>>>> "thirdparty-" (ex: thirdparty-1.0.0).
> >>>>>>>>>>     7d. The separate repo will have the same release process as
> >>>>>>>>>> Hadoop.
> >>>>>>>>>>
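
Items 3 and 6 of the proposal could look roughly like this in the pom.xml of a Hadoop module that uses protobuf. This is a sketch under stated assumptions: the coordinates expand the "o.a.h.thirdparty:hadoop-shaded-protobuf" shorthand from the proposal, and the version property is a hypothetical placeholder.

```xml
<dependencies>
  <!-- Relocated protobuf used by Hadoop's own code (item 3). -->
  <dependency>
    <groupId>org.apache.hadoop.thirdparty</groupId>
    <artifactId>hadoop-shaded-protobuf</artifactId>
    <!-- hypothetical property naming the thirdparty release -->
    <version>${hadoop-thirdparty.version}</version>
  </dependency>
  <!-- Original protobuf kept only so that downstream classpaths do not change (item 6). -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>
</dependencies>
```

Under item 4, Hadoop sources would then import, for example, org.apache.hadoop.thirdparty.com.google.protobuf.Message instead of com.google.protobuf.Message, while downstream code compiled against protobuf-java 2.5.0 keeps resolving the original package.
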
> >>>>>>>>>>   HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
> >>>>>>>>>> is an umbrella jira tracking the changes for the protobuf upgrade.
> >>>>>>>>>>
> >>>>>>>>>>   A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> >>>>>>>>>> been raised for the separate repo creation in HADOOP-16595
> >>>>>>>>>> (https://issues.apache.org/jira/browse/HADOOP-16595).
> >>>>>>>>>>
> >>>>>>>>>>   Please provide your inputs on the proposal and review the PR so
> >>>>>>>>>> that we can proceed.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>   -Thanks,
> >>>>>>>>>>   Vinay
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
> >>>>>>>>>> <vinodkv@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Moving the thread to the dev lists.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> +Vinod
> >>>>>>>>>>>
> >>>>>>>>>>>> On Sep 23, 2019, at 11:43 PM, Vinayakumar B
> >>>>>>>>>>>> <vinayakumarb@apache.org> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks Marton,
> >>>>>>>>>>>>
> >>>>>>>>>>>> The newly created 'hadoop-thirdparty' repo is empty right now.
> >>>>>>>>>>>> Whether to use that repo for the shaded artifact or not will be
> >>>>>>>>>>>> tracked in the HADOOP-13363 umbrella jira. Please feel free to
> >>>>>>>>>>>> join the discussion.
> >>>>>>>>>>>>
> >>>>>>>>>>>> No existing codebase is being moved out of the hadoop repo, so
> >>>>>>>>>>>> I think right now we are good to go.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Vinay
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <elek@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am not sure whether it is defined when a vote is required.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> https://www.apache.org/foundation/voting.html
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Personally I think it's a big enough change to send a
> >>>>>>>>>>>>> notification to the dev lists with a 'lazy consensus' closure.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Marton
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe
> >>>>>>>>>>>>>> more in future) will be kept as a shaded artifact in a separate
> >>>>>>>>>>>>>> repo, which will be referenced as a dependency in hadoop
> >>>>>>>>>>>>>> modules. This approach avoids shading every submodule during
> >>>>>>>>>>>>>> the build.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So the question is: is any VOTE required before asking to
> >>>>>>>>>>>>>> create a git repo?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On the self-serve platform
> >>>>>>>>>>>>>> https://gitbox.apache.org/setup/newrepo.html I can see that the
> >>>>>>>>>>>>>> requester should be a PMC member.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Wanted to confirm here first.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Vinay
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>>>> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> >>>>>>>>>>>>> For additional commands, e-mail: private-help@hadoop.apache.org
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> > --
> >
> >
> >
> > --Brahma Reddy Battula
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org
>
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
PR has been merged.

Thanks, everyone, for the discussion.

-Vinay


On Thu, Jan 9, 2020 at 4:47 PM Ayush Saxena <ay...@gmail.com> wrote:

> Hi All,
> FYI:
> We will be going ahead with the present approach and will merge by
> tomorrow EOD, considering no one has objections.
> Thanks, everyone!
>
> -Ayush
>
> > On 07-Jan-2020, at 9:22 PM, Brahma Reddy Battula <br...@apache.org>
> > wrote:
> >
> > Hi Sree Vaddi, Owen, Stack, Duo Zhang,
> >
> > We can move forward based on your comments; we are just waiting for your
> > reply. I hope all of your comments have been answered. (Unification can be
> > taken up in a parallel thread, as Vinay mentioned.)
> >
> >
> >
> > On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi Sree,
> >>
> >>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>> Project ? Or as a TLP ?
> >>> Or as a new project definition ?
> >>
> >> As already mentioned by Ayush, this will be a subproject of Hadoop.
> >> Releases will be voted on by the Hadoop PMC as per the ASF process.
> >>
> >>
> >>> The effort to streamline and put in an accepted standard for the
> >>> dependencies that require shading,
> >>> seems beyond the siloed efforts of hadoop, hbase, etc....
> >>>
> >>> I propose, we bring all the decision makers from all these artifacts in
> >>> one room and decide best course of action.
> >>> I am looking at, no projects should ever had to shade any artifacts
> >>> except as an absolute necessary alternative.
> >>
> >> This is the ideal for any project. But unfortunately some projects take
> >> their own course based on need.
> >>
> >> In the current case of protobuf in Hadoop:
> >>    The protobuf upgrade from 2.5.0 (which is already EOL) was not taken
> >> up, to avoid downstream failures. Since Hadoop is a platform, its
> >> dependencies get added to downstream projects' classpaths, so any change
> >> in Hadoop's dependencies directly affects downstreams. Hadoop strictly
> >> follows backward compatibility as far as possible.
> >>    Though protobuf provides wire compatibility between versions, it
> >> doesn't provide compatibility for generated sources.
> >>    Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
> >> technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
> >> still keep protobuf 2.5.0 (deprecated) for downstreams.
> >>
> >> This shading is necessary to have both versions of protobuf supported
> >> (2.5.0 (non-shaded) for downstreams' classpath and 3.x (shaded) for
> >> Hadoop's internal usage).
> >> And this entire work is to be done before the 3.3.0 release.
> >>
> >> So, though it's ideal to have a common approach for all projects, I
> >> suggest that for Hadoop we go ahead with the current approach.
> >> We can also start a parallel effort to address these problems in a
> >> separate discussion/proposal. Once a solution is available, we can
> >> revisit and adopt it in all such projects (ex: HBase, Hadoop, Ratis).
> >>
> >> -Vinay
> >>
> >>> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:
> >>>
> >>> Hey Sree
> >>>
> >>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>>> Project ? Or as a TLP ?
> >>>> Or as a new project definition ?
> >>>>
> >>> A sub project of Apache Hadoop, having its own independent release
> >>> cycles. Maybe you can put this in the same column as ozone or as
> >>> submarine (a couple of months ago).
> >>>
> >>> Unifying for all seems interesting, but each project is independent and
> >>> has its own limitations and way of thinking. I don't think it would be
> >>> an easy task to bring everyone to the same table and get them to agree
> >>> on a common approach.
> >>>
> >>> I guess this has been under discussion for quite a long time, and no
> >>> other alternative has been suggested. Still, we can hold off for a week
> >>> in case someone comes up with a better solution; otherwise we can
> >>> continue in the present direction.
> >>>
> >>> -Ayush
> >>>
> >>>
> >>>
> >>> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com
> >> .invalid>
> >>> wrote:
> >>>
> >>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>>> Project ? Or as a TLP ?
> >>>> Or as a new project definition ?
> >>>>
> >>>> The effort to streamline and put in an accepted standard for the
> >>>> dependencies that require shading,seems beyond the siloed efforts of
> >>>> hadoop, hbase, etc....
> >>>>
> >>>> I propose, we bring all the decision makers from all these artifacts
> in
> >>>> one room and decide best course of action.I am looking at, no projects
> >>>> should ever had to shade any artifacts except as an absolute necessary
> >>>> alternative.
> >>>>
> >>>>
> >>>> Thank you./Sree
> >>>>
> >>>>
> >>>>
> >>>>    On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> >>>> vinayakumarb@apache.org> wrote:
> >>>>
> >>>> Hi,
> >>>> Sorry for the late reply,.
> >>>>>>> To be exact, how can we better use the thirdparty repo? Looking at
> >>>> HBase as an example, it looks like everything that are known to break
> a
> >>> lot
> >>>> after an update get shaded into the hbase-thirdparty artifact: guava,
> >>>> netty, ... etc.
> >>>> Is it the purpose to isolate these naughty dependencies?
> >>>> Yes, shading is to isolate these naughty dependencies from downstream
> >>>> classpath and have independent control on these upgrades without
> >> breaking
> >>>> downstreams.
> >>>>
> >>>> First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create
> >>> the
> >>>> protobuf shaded jar is ready to merge.
> >>>>
> >>>> Please take a look if anyone interested, will be merged may be after
> >> two
> >>>> days if no objections.
> >>>>
> >>>> -Vinay
> >>>>
> >>>>
> >>>> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hi I am late to this but I am keen to understand more.
> >>>>>
> >>>>> To be exact, how can we better use the thirdparty repo? Looking at
> >>> HBase
> >>>>> as an example, it looks like everything that are known to break a lot
> >>>> after
> >>>>> an update get shaded into the hbase-thirdparty artifact: guava,
> >> netty,
> >>>> ...
> >>>>> etc.
> >>>>> Is it the purpose to isolate these naughty dependencies?
> >>>>>
> >>>>> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <
> >> vinayakumarb@apache.org
> >>>>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com
> >>>
> >>>>>> 's suggestions.
> >>>>>>
> >>>>>>   i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>>>>>   ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> >>>>>>
> >>>>>> Please review!!
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Vinay
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <
> >> palomino219@gmail.com
> >>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> For HBase we have a separated repo for hbase-thirdparty
> >>>>>>>
> >>>>>>> https://github.com/apache/hbase-thirdparty
> >>>>>>>
> >>>>>>> We will publish the artifacts to nexus so we do not need to
> >> include
> >>>>>>> binaries in our git repo, just add a dependency in the pom.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >>>>>>>
> >>>>>>>
> >>>>>>> And it has its own release cycles, only when there are special
> >>>>>> requirements
> >>>>>>> or we want to upgrade some of the dependencies. This is the vote
> >>>> thread
> >>>>>> for
> >>>>>>> the newest release, where we want to provide a shaded gson for
> >> jdk7.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>> Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> >>>>>>>
> >>>>>>>> Please find replies inline.
> >>>>>>>>
> >>>>>>>> -Vinay
> >>>>>>>>
> >>>>>>>> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> >>>>>> owen.omalley@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I'm very unhappy with this direction. In particular, I don't
> >>> think
> >>>>>> git
> >>>>>>> is
> >>>>>>>>> a good place for distribution of binary artifacts.
> >> Furthermore,
> >>>> the
> >>>>>> PMC
> >>>>>>>>> shouldn't be releasing anything without a release vote.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Proposed solution doesnt release any binaries in git. Its
> >>> actually a
> >>>>>>>> complete sub-project which follows entire release process,
> >>> including
> >>>>>> VOTE
> >>>>>>>> in public. I have mentioned already that release process is
> >>> similar
> >>>> to
> >>>>>>>> hadoop.
> >>>>>>>> To be specific, using the (almost) same script used in hadoop to
> >>>>>> generate
> >>>>>>>> artifacts, sign and deploy to staging repository. Please let me
> >>> know
> >>>>>> If I
> >>>>>>>> am conveying anything wrong.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> I'd propose that we make a third party module that contains
> >> the
> >>>>>>> *source*
> >>>>>>>>> of the pom files to build the relocated jars. This should
> >>>>>> absolutely be
> >>>>>>>>> treated as a last resort for the mostly Google projects that
> >>>>>> regularly
> >>>>>>>>> break binary compatibility (eg. Protobuf & Guava).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Same has been implemented in the PR
> >>>>>>>> https://github.com/apache/hadoop-thirdparty/pull/1. Please
> >> check
> >>>> and
> >>>>>> let
> >>>>>>>> me
> >>>>>>>> know If I misunderstood. Yes, this is the last option we have
> >>> AFAIK.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> In terms of naming, I'd propose something like:
> >>>>>>>>>
> >>>>>>>>> org.apache.hadoop.thirdparty.protobuf2_5
> >>>>>>>>> org.apache.hadoop.thirdparty.guava28
> >>>>>>>>>
> >>>>>>>>> In particular, I think we absolutely need to include the
> >> version
> >>>> of
> >>>>>> the
> >>>>>>>>> underlying project. On the other hand, since we should not be
> >>>>>> shading
> >>>>>>>>> *everything* we can drop the leading com.google.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> IMO, This naming convention is easy for identifying the
> >> underlying
> >>>>>>> project,
> >>>>>>>> but  it will be difficult to maintain going forward if
> >> underlying
> >>>>>> project
> >>>>>>>> versions changes. Since thirdparty module have its own releases,
> >>>> each
> >>>>>> of
> >>>>>>>> those release can be mapped to specific version of underlying
> >>>> project.
> >>>>>>> Even
> >>>>>>>> the binary artifact can include a MANIFEST with underlying
> >> project
> >>>>>>> details
> >>>>>>>> as per Steve's suggestion on HADOOP-13363.
> >>>>>>>> That said, if you still prefer to have project number in
> >> artifact
> >>>> id,
> >>>>>> it
> >>>>>>>> can be done.
> >>>>>>>>
> >>>>>>>> The Hadoop project can make releases of  the thirdparty module:
> >>>>>>>>>
> >>>>>>>>> <dependency>
> >>>>>>>>> <groupId>org.apache.hadoop</groupId>
> >>>>>>>>> <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >>>>>>>>> <version>1.0</version>
> >>>>>>>>> </dependency>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Note that the version has to be the hadoop thirdparty release
> >>>> number,
> >>>>>>> which
> >>>>>>>>> is part of why you need to have the underlying version in the
> >>>>>> artifact
> >>>>>>>>> name. These we can push to maven central as new releases from
> >>>>>> Hadoop.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Exactly, same has been implemented in the PR. hadoop-thirdparty
> >>>> module
> >>>>>>> have
> >>>>>>>> its own releases. But in HADOOP Jira, thirdparty versions can be
> >>>>>>>> differentiated using prefix "thirdparty-".
> >>>>>>>>
> >>>>>>>> Same solution is being followed in HBase. May be people involved
> >>> in
> >>>>>> HBase
> >>>>>>>> can add some points here.
> >>>>>>>>
> >>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>> .. Owen
> >>>>>>>>>
> >>>>>>>>> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> >>>>>> vinayakumarb@apache.org
> >>>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi All,
> >>>>>>>>>>
> >>>>>>>>>>   I wanted to discuss about the separate repo for thirdparty
> >>>>>>>> dependencies
> >>>>>>>>>> which we need to shaded and include in Hadoop component's
> >> jars.
> >>>>>>>>>>
> >>>>>>>>>>   Apologies for the big text ahead, but this needs clear
> >>>>>>> explanation!!
> >>>>>>>>>>
> >>>>>>>>>>   Right now most needed such dependency is protobuf.
> >> Protobuf
> >>>>>>>> dependency
> >>>>>>>>>> was not upgraded from 2.5.0 onwards with the fear that
> >>> downstream
> >>>>>>>> builds,
> >>>>>>>>>> which depends on transitive dependency protobuf coming from
> >>>>>> hadoop's
> >>>>>>>> jars,
> >>>>>>>>>> may fail with the upgrade. Apparently protobuf does not
> >>> guarantee
> >>>>>>> source
> >>>>>>>>>> compatibility, though it guarantees wire compatibility
> >> between
> >>>>>>> versions.
> >>>>>>>>>> Because of this behavior, version upgrade may cause breakage
> >> in
> >>>>>> known
> >>>>>>>> and
> >>>>>>>>>> unknown (private?) downstreams.
> >>>>>>>>>>
> >>>>>>>>>>   So to tackle this, we came up the following proposal in
> >>>>>>> HADOOP-13363.
> >>>>>>>>>>
> >>>>>>>>>>   Luckily, As far as I know, no APIs, either public to user
> >> or
> >>>>>>> between
> >>>>>>>>>> Hadoop processes, is not directly using protobuf classes in
> >>>>>>> signatures.
> >>>>>>>>>> (If
> >>>>>>>>>> any exist, please let us know).
> >>>>>>>>>>
> >>>>>>>>>>   Proposal:
> >>>>>>>>>>   ------------
> >>>>>>>>>>
> >>>>>>>>>>   1. Create a artifact(s) which contains shaded
> >> dependencies.
> >>>> All
> >>>>>>> such
> >>>>>>>>>> shading/relocation will be with known prefix
> >>>>>>>>>> **org.apache.hadoop.thirdparty.**.
> >>>>>>>>>>   2. Right now protobuf jar (ex:
> >>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf)
> >>>>>>>>>> to start with, all **com.google.protobuf** classes will be
> >>>>>> relocated
> >>>>>>> as
> >>>>>>>>>> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >>>>>>>>>>   3. Hadoop modules, which needs protobuf as dependency,
> >> will
> >>>> add
> >>>>>>> this
> >>>>>>>>>> shaded artifact as dependency (ex:
> >>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >>>>>>>>>>   4. All previous usages of "com.google.protobuf" will be
> >>>>>> relocated
> >>>>>>> to
> >>>>>>>>>> "org.apache.hadoop.thirdparty.com.google.protobuf" in the
> >> code
> >>>> and
> >>>>>>> will
> >>>>>>>> be
> >>>>>>>>>> committed. Please note, this replacement is One-Time directly
> >>> in
> >>>>>>> source
> >>>>>>>>>> code, NOT during compile and package.
> >>>>>>>>>>   5. Once all usages of "com.google.protobuf" is relocated,
> >>> then
> >>>>>>> hadoop
> >>>>>>>>>> dont care about which version of original  "protobuf-java" is
> >>> in
> >>>>>>>>>> dependency.
> >>>>>>>>>>   6. Just keep "protobuf-java:2.5.0" in dependency tree not
> >> to
> >>>>>> break
> >>>>>>>> the
> >>>>>>>>>> downstreams. But hadoop will be originally using the latest
> >>>>>> protobuf
> >>>>>>>>>> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >>>>>>>>>>
> >>>>>>>>>>   7. Coming back to separate repo, Following are most
> >>>> appropriate
> >>>>>>>> reasons
> >>>>>>>>>> of keeping shaded dependency artifact in separate repo
> >> instead
> >>> of
> >>>>>>>>>> submodule.
> >>>>>>>>>>
> >>>>>>>>>>     7a. These artifacts need not be built all the time. It
> >>> needs
> >>>>>> to
> >>>>>>> be
> >>>>>>>>>> built only when there is a change in the dependency version
> >> or
> >>>> the
> >>>>>>> build
> >>>>>>>>>> process.
> >>>>>>>>>>     7b. If added as "submodule in Hadoop repo",
> >>>>>>>> maven-shade-plugin:shade
> >>>>>>>>>> will execute only in package phase. That means, "mvn compile"
> >>> or
> >>>>>> "mvn
> >>>>>>>>>> test-compile" will not be failed as this artifact will not
> >> have
> >>>>>>>> relocated
> >>>>>>>>>> classes, instead it will have original classes, resulting in
> >>>>>>> compilation
> >>>>>>>>>> failure. Workaround, build thirdparty submodule first and
> >>> exclude
> >>>>>>>>>> "thirdparty" submodule in other executions. This will be a
> >>>> complex
> >>>>>>>> process
> >>>>>>>>>> compared to keeping in a separate repo.
> >>>>>>>>>>
> >>>>>>>>>>     7c. Separate repo, will be a subproject of Hadoop, using
> >>> the
> >>>>>>> same
> >>>>>>>>>> HADOOP jira project, with different versioning prefixed with
> >>>>>>>> "thirdparty-"
> >>>>>>>>>> (ex: thirdparty-1.0.0).
> >>>>>>>>>>     7d. Separate will have same release process as Hadoop.
> >>>>>>>>>>
> >>>>>>>>>>   HADOOP-13363 (
> >>>>>> https://issues.apache.org/jira/browse/HADOOP-13363)
> >>>>>>>> is
> >>>>>>>>>> an
> >>>>>>>>>> umbrella jira tracking the changes to protobuf upgrade.
> >>>>>>>>>>
> >>>>>>>>>>   PR (https://github.com/apache/hadoop-thirdparty/pull/1)
> >> has
> >>>>>> been
> >>>>>>>>>> raised
> >>>>>>>>>> for separate repo creation in (HADOOP-16595 (
> >>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-16595)
> >>>>>>>>>>
> >>>>>>>>>>   Please provide your inputs for the proposal and review the
> >>> PR
> >>>>>> to
> >>>>>>>>>> proceed with the proposal.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>   -Thanks,
> >>>>>>>>>>   Vinay
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <vinodkv@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Moving the thread to the dev lists.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> +Vinod
> >>>>>>>>>>>
> >>>>>>>>>>>> On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vinayakumarb@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks Marton,
> >>>>>>>>>>>>
> >>>>>>>>>>>> The newly created 'hadoop-thirdparty' repo is empty right now.
> >>>>>>>>>>>> Whether to use that repo for the shaded artifact or not will be tracked in
> >>>>>>>>>>>> the HADOOP-13363 umbrella jira. Please feel free to join the discussion.
> >>>>>>>>>>>>
> >>>>>>>>>>>> No existing codebase is being moved out of the hadoop repo, so I think
> >>>>>>>>>>>> right now we are good to go.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Vinay
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <elek@apache.org> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am not sure whether it is defined when a vote is required.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> https://www.apache.org/foundation/voting.html
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Personally, I think it's a big enough change to send a notification to the
> >>>>>>>>>>>>> dev lists with a 'lazy consensus' closure.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Marton
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org> wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more in the future)
> >>>>>>>>>>>>>> will be kept as a shaded artifact in a separate repo, which will be
> >>>>>>>>>>>>>> referenced as a dependency in hadoop modules. This approach avoids shading
> >>>>>>>>>>>>>> every submodule during the build.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So the question is: is any VOTE required before asking to create a git repo?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On the self-serve platform https://gitbox.apache.org/setup/newrepo.html
> >>>>>>>>>>>>>> I can see that the requester should be a PMC member.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Wanted to confirm here first.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Vinay
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>
> >>>> ---------------------------------------------------------------------
> >>>>>>>>>>>>> To unsubscribe, e-mail:
> >>>> private-unsubscribe@hadoop.apache.org
> >>>>>>>>>>>>> For additional commands, e-mail:
> >>>>>> private-help@hadoop.apache.org
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> > --
> >
> >
> >
> > --Brahma Reddy Battula
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org
>
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
PR has been merged.

Thanks everyone for discussions.

-Vinay
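
For readers following the thread: the merged PR builds a relocated protobuf jar. A minimal sketch of the kind of maven-shade-plugin configuration this involves is given here. It is an illustration under stated assumptions, not the contents of the actual PR: the relocated package name is taken from the thread (o.a.h.thirdparty.protobuf37), while the plugin layout and the choice to bundle only protobuf-java are assumptions.

```xml
<!-- Illustrative pom.xml fragment for a shaded-protobuf module (assumed layout). -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <!-- shade runs in the package phase, which is the constraint raised in point 7b -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <artifactSet>
              <includes>
                <!-- assumption: bundle only protobuf-java into the shaded jar -->
                <include>com.google.protobuf:protobuf-java</include>
              </includes>
            </artifactSet>
            <relocations>
              <relocation>
                <pattern>com.google.protobuf</pattern>
                <!-- package name as agreed in the thread -->
                <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Because the relocation only happens at package time, consumers need the already-built jar, which is one reason the thread prefers a separately released artifact over an in-tree submodule.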


On Thu, Jan 9, 2020 at 4:47 PM Ayush Saxena <ay...@gmail.com> wrote:

> Hi All,
> FYI :
> We will be going ahead with the present approach, will merge by tomorrow
> EOD. Considering no one has objections.
> Thanx Everyone!!!
>
> -Ayush

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Ayush Saxena <ay...@gmail.com>.
Hi All,
FYI:
Since no one has objected, we will go ahead with the present approach and merge by tomorrow EOD.
Thanks, everyone!

-Ayush
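
As a rough illustration of the "present approach" from the consumer side, a Hadoop module would depend on the shaded artifact while plain protobuf-java 2.5.0 stays in the dependency tree for downstreams. In this sketch the groupId and artifactId follow the names used in the thread (o.a.h.thirdparty, hadoop-shaded-protobuf37); the version 1.0.0 is a placeholder assumption based on the "thirdparty-1.0.0" example and may not match the released coordinates.

```xml
<!-- Illustrative pom.xml fragment for a Hadoop module (coordinates partly assumed). -->
<dependencies>
  <!-- Shaded protobuf 3.x, used only by Hadoop's own code under the relocated package -->
  <dependency>
    <groupId>org.apache.hadoop.thirdparty</groupId>
    <artifactId>hadoop-shaded-protobuf37</artifactId>
    <!-- placeholder version, an assumption for illustration -->
    <version>1.0.0</version>
  </dependency>
  <!-- Unshaded protobuf 2.5.0, kept so that downstream classpaths do not change -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>
</dependencies>
```

Hadoop source files would then import from org.apache.hadoop.thirdparty.protobuf37 rather than com.google.protobuf, so the two protobuf versions do not collide on the classpath.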

> On 07-Jan-2020, at 9:22 PM, Brahma Reddy Battula <br...@apache.org> wrote:
> 
> Hi Sree Vaddi, Owen, Stack, Duo Zhang,
> 
> We can move forward based on your comments; we are just waiting for your
> reply. I hope all of your comments have been answered. (Unification can be
> discussed in a parallel thread, as Vinay mentioned.)
> 
> 
> 
> On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
> wrote:
> 
>> Hi Sree,
>> 
>>> apache/hadoop-thirdparty, how would it fit into ASF? As an Incubating
>>> Project? Or as a TLP?
>>> Or as a new project definition?
>> As already mentioned by Ayush, this will be a subproject of Hadoop.
>> Releases will be voted on by the Hadoop PMC as per the ASF process.
>> 
>> 
>>> The effort to streamline and put in an accepted standard for the
>>> dependencies that require shading
>>> seems beyond the siloed efforts of hadoop, hbase, etc....
>> 
>>> I propose we bring all the decision makers from all these artifacts into
>>> one room and decide the best course of action.
>>> As I see it, no project should ever have to shade any artifacts
>>> except as an absolutely necessary alternative.
>> 
>> This is the ideal proposal for any project. But unfortunately some projects
>> take their own course based on need.
>> 
>> In the current case of protobuf in Hadoop:
>>    The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up, to
>> avoid downstream failures. Since Hadoop is a platform, its dependencies
>> get added to downstream projects' classpaths. So any change in Hadoop's
>> dependencies will directly affect downstreams. Hadoop strictly follows
>> backward compatibility as far as possible.
>>    Though protobuf provides wire compatibility between versions, it doesn't
>> provide compatibility for generated sources.
>>    Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
>> technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
>> still keep protobuf 2.5.0 (deprecated) for downstreams.
>> 
>> This shading is necessary to have both versions of protobuf supported
>> (2.5.0 (non-shaded) for the downstreams' classpath and 3.x (shaded) for
>> Hadoop's internal usage).
>> And this entire work is to be done before the 3.3.0 release.
>> 
>> So, though it's ideal to have a common approach for all projects, I suggest
>> that for Hadoop we go ahead as per the current approach.
>> We can also start a parallel effort to address these problems in a
>> separate discussion/proposal. Once a solution is available, we can revisit
>> and adopt the new solution accordingly in all such projects (ex: HBase, Hadoop,
>> Ratis).
>> 
>> -Vinay
>> 
>>> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:
>>> 
>>> Hey Sree
>>> 
>>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>>>> Project ? Or as a TLP ?
>>>> Or as a new project definition ?
>>>> 
>>> A subproject of Apache Hadoop, having its own independent release cycles.
>>> Maybe you can put this in the same category as Ozone or as
>>> Submarine (a couple of months ago).
>>> 
>>> Unifying for all seems interesting, but each project is independent and has
>>> its own limitations and way of thinking. I don't think it would be an easy
>>> task to bring everyone to the same table and get them to agree on a common approach.
>>> 
>>> I guess this has been under discussion for quite a long time, and no
>>> other alternative has been suggested. Still, we can hold off for a week in case
>>> someone comes up with a better solution; otherwise we can continue in the
>>> present direction.
>>> 
>>> -Ayush
>>> 
>>> 
>>> 
>>> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com.invalid>
>>> wrote:
>>> 
>>>> apache/hadoop-thirdparty, how would it fit into ASF? As an Incubating
>>>> Project? Or as a TLP?
>>>> Or as a new project definition?
>>>> 
>>>> The effort to streamline and put in an accepted standard for the
>>>> dependencies that require shading seems beyond the siloed efforts of
>>>> hadoop, hbase, etc....
>>>> 
>>>> I propose we bring all the decision makers from all these artifacts into
>>>> one room and decide the best course of action. As I see it, no project
>>>> should ever have to shade any artifacts except as an absolutely necessary
>>>> alternative.
>>>> 
>>>> 
>>>> Thank you./Sree
>>>> 
>>>> 
>>>> 
>>>> On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <vinayakumarb@apache.org>
>>>> wrote:
>>>> 
>>>> Hi,
>>>> Sorry for the late reply.
>>>> 
>>>>> To be exact, how can we better use the thirdparty repo? Looking at
>>>>> HBase as an example, it looks like everything that is known to break a lot
>>>>> after an update gets shaded into the hbase-thirdparty artifact: guava,
>>>>> netty, ... etc.
>>>>> Is the purpose to isolate these naughty dependencies?
>>>> 
>>>> Yes, shading is to isolate these naughty dependencies from the downstream
>>>> classpath and to have independent control over these upgrades without breaking
>>>> downstreams.
>>>> 
>>>> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, to create the
>>>> protobuf shaded jar is ready to merge.
>>>> 
>>>> Please take a look if anyone is interested; it will be merged, maybe after two
>>>> days, if there are no objections.
>>>> 
>>>> -Vinay
>>>> 
>>>> 
>>>> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
>>>> wrote:
>>>> 
>>>>> Hi, I am late to this, but I am keen to understand more.
>>>>> 
>>>>> To be exact, how can we better use the thirdparty repo? Looking at HBase
>>>>> as an example, it looks like everything that is known to break a lot after
>>>>> an update gets shaded into the hbase-thirdparty artifact: guava, netty, ...
>>>>> etc.
>>>>> Is the purpose to isolate these naughty dependencies?
>>>>> 
>>>>> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakumarb@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com>'s
>>>>>> suggestions.
>>>>>> 
>>>>>>   i. Renamed the module to 'hadoop-shaded-protobuf37'
>>>>>>   ii. Set the shaded package to 'o.a.h.thirdparty.protobuf37'
>>>>>> 
>>>>>> Please review!!
>>>>>> 
>>>>>> Thanks,
>>>>>> -Vinay
>>>>>> 
>>>>>> 
>>>>>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino219@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> For HBase we have a separate repo for hbase-thirdparty:
>>>>>>> 
>>>>>>> https://github.com/apache/hbase-thirdparty
>>>>>>> 
>>>>>>> We publish the artifacts to nexus, so we do not need to include
>>>>>>> binaries in our git repo; we just add a dependency in the pom.
>>>>>>> 
>>>>>>> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>>>>>>> 
>>>>>>> It has its own release cycle; we release only when there are special requirements
>>>>>>> or we want to upgrade some of the dependencies. This is the vote thread for
>>>>>>> the newest release, where we want to provide a shaded gson for jdk7.
>>>>>>> 
>>>>>>> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> On Sat, Sep 28, 2019 at 1:28 AM, Vinayakumar B <vi...@apache.org> wrote:
>>>>>>> 
>>>>>>>> Please find replies inline.
>>>>>>>> 
>>>>>>>> -Vinay
>>>>>>>> 
>>>>>>>> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I'm very unhappy with this direction. In particular, I don't think git is
>>>>>>>>> a good place for distribution of binary artifacts. Furthermore, the PMC
>>>>>>>>> shouldn't be releasing anything without a release vote.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> The proposed solution doesn't release any binaries in git. It's actually a
>>>>>>>> complete sub-project which follows the entire release process, including a VOTE
>>>>>>>> in public. I have mentioned already that the release process is similar to
>>>>>>>> Hadoop's.
>>>>>>>> To be specific, it uses (almost) the same script used in Hadoop to generate
>>>>>>>> artifacts, sign them, and deploy them to the staging repository. Please let me know
>>>>>>>> if I am conveying anything wrong.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> I'd propose that we make a third party module that contains
>> the
>>>>>>> *source*
>>>>>>>>> of the pom files to build the relocated jars. This should
>>>>>> absolutely be
>>>>>>>>> treated as a last resort for the mostly Google projects that
>>>>>> regularly
>>>>>>>>> break binary compatibility (eg. Protobuf & Guava).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Same has been implemented in the PR
>>>>>>>> https://github.com/apache/hadoop-thirdparty/pull/1. Please
>> check
>>>> and
>>>>>> let
>>>>>>>> me
>>>>>>>> know If I misunderstood. Yes, this is the last option we have
>>> AFAIK.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> In terms of naming, I'd propose something like:
>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.thirdparty.protobuf2_5
>>>>>>>>> org.apache.hadoop.thirdparty.guava28
>>>>>>>>> 
>>>>>>>>> In particular, I think we absolutely need to include the
>> version
>>>> of
>>>>>> the
>>>>>>>>> underlying project. On the other hand, since we should not be
>>>>>> shading
>>>>>>>>> *everything* we can drop the leading com.google.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> IMO, This naming convention is easy for identifying the
>> underlying
>>>>>>> project,
>>>>>>>> but  it will be difficult to maintain going forward if
>> underlying
>>>>>> project
>>>>>>>> versions changes. Since thirdparty module have its own releases,
>>>> each
>>>>>> of
>>>>>>>> those release can be mapped to specific version of underlying
>>>> project.
>>>>>>> Even
>>>>>>>> the binary artifact can include a MANIFEST with underlying
>> project
>>>>>>> details
>>>>>>>> as per Steve's suggestion on HADOOP-13363.
>>>>>>>> That said, if you still prefer to have project number in
>> artifact
>>>> id,
>>>>>> it
>>>>>>>> can be done.
>>>>>>>> 
>>>>>>>> The Hadoop project can make releases of  the thirdparty module:
>>>>>>>>> 
>>>>>>>>> <dependency>
>>>>>>>>> <groupId>org.apache.hadoop</groupId>
>>>>>>>>> <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>>>>>>>>> <version>1.0</version>
>>>>>>>>> </dependency>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Note that the version has to be the hadoop thirdparty release
>>>> number,
>>>>>>> which
>>>>>>>>> is part of why you need to have the underlying version in the
>>>>>> artifact
>>>>>>>>> name. These we can push to maven central as new releases from
>>>>>> Hadoop.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Exactly, same has been implemented in the PR. hadoop-thirdparty
>>>> module
>>>>>>> have
>>>>>>>> its own releases. But in HADOOP Jira, thirdparty versions can be
>>>>>>>> differentiated using prefix "thirdparty-".
>>>>>>>> 
>>>>>>>> Same solution is being followed in HBase. May be people involved
>>> in
>>>>>> HBase
>>>>>>>> can add some points here.
>>>>>>>> 
>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> .. Owen
>>>>>>>>> 
>>>>>>>>> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
>>>>>> vinayakumarb@apache.org
>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi All,
>>>>>>>>>> 
>>>>>>>>>>   I wanted to discuss about the separate repo for thirdparty
>>>>>>>> dependencies
>>>>>>>>>> which we need to shaded and include in Hadoop component's
>> jars.
>>>>>>>>>> 
>>>>>>>>>>   Apologies for the big text ahead, but this needs clear
>>>>>>> explanation!!
>>>>>>>>>> 
>>>>>>>>>>   Right now most needed such dependency is protobuf.
>> Protobuf
>>>>>>>> dependency
>>>>>>>>>> was not upgraded from 2.5.0 onwards with the fear that
>>> downstream
>>>>>>>> builds,
>>>>>>>>>> which depends on transitive dependency protobuf coming from
>>>>>> hadoop's
>>>>>>>> jars,
>>>>>>>>>> may fail with the upgrade. Apparently protobuf does not
>>> guarantee
>>>>>>> source
>>>>>>>>>> compatibility, though it guarantees wire compatibility
>> between
>>>>>>> versions.
>>>>>>>>>> Because of this behavior, version upgrade may cause breakage
>> in
>>>>>> known
>>>>>>>> and
>>>>>>>>>> unknown (private?) downstreams.
>>>>>>>>>> 
>>>>>>>>>>   So to tackle this, we came up the following proposal in
>>>>>>> HADOOP-13363.
>>>>>>>>>> 
>>>>>>>>>>   Luckily, as far as I know, no APIs, either public to users or
>>>>>>>>>> between Hadoop processes, directly use protobuf classes in their
>>>>>>>>>> signatures. (If any exist, please let us know.)
>>>>>>>>>> 
>>>>>>>>>>   Proposal:
>>>>>>>>>>   ------------
>>>>>>>>>> 
>>>>>>>>>>   1. Create a artifact(s) which contains shaded
>> dependencies.
>>>> All
>>>>>>> such
>>>>>>>>>> shading/relocation will be with known prefix
>>>>>>>>>> **org.apache.hadoop.thirdparty.**.
>>>>>>>>>>   2. Right now protobuf jar (ex:
>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf)
>>>>>>>>>> to start with, all **com.google.protobuf** classes will be
>>>>>> relocated
>>>>>>> as
>>>>>>>>>> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>>>>>>>>>>   3. Hadoop modules, which needs protobuf as dependency,
>> will
>>>> add
>>>>>>> this
>>>>>>>>>> shaded artifact as dependency (ex:
>>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf).
>>>>>>>>>>   4. All previous usages of "com.google.protobuf" will be
>>>>>> relocated
>>>>>>> to
>>>>>>>>>> "org.apache.hadoop.thirdparty.com.google.protobuf" in the
>> code
>>>> and
>>>>>>> will
>>>>>>>> be
>>>>>>>>>> committed. Please note, this replacement is One-Time directly
>>> in
>>>>>>> source
>>>>>>>>>> code, NOT during compile and package.
>>>>>>>>>>   5. Once all usages of "com.google.protobuf" is relocated,
>>> then
>>>>>>> hadoop
>>>>>>>>>> dont care about which version of original  "protobuf-java" is
>>> in
>>>>>>>>>> dependency.
>>>>>>>>>>   6. Just keep "protobuf-java:2.5.0" in dependency tree not
>> to
>>>>>> break
>>>>>>>> the
>>>>>>>>>> downstreams. But hadoop will be originally using the latest
>>>>>> protobuf
>>>>>>>>>> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>>>>>>>>>> 
>>>>>>>>>>   7. Coming back to separate repo, Following are most
>>>> appropriate
>>>>>>>> reasons
>>>>>>>>>> of keeping shaded dependency artifact in separate repo
>> instead
>>> of
>>>>>>>>>> submodule.
>>>>>>>>>> 
>>>>>>>>>>     7a. These artifacts need not be built all the time. It
>>> needs
>>>>>> to
>>>>>>> be
>>>>>>>>>> built only when there is a change in the dependency version
>> or
>>>> the
>>>>>>> build
>>>>>>>>>> process.
>>>>>>>>>>     7b. If added as a submodule in the Hadoop repo,
>>>>>>>>>> maven-shade-plugin:shade will execute only in the package phase. That
>>>>>>>>>> means "mvn compile" or "mvn test-compile" will fail: at that point
>>>>>>>>>> this artifact will not have the relocated classes, only the original
>>>>>>>>>> ones, resulting in compilation failure. The workaround is to build the
>>>>>>>>>> thirdparty submodule first and exclude the "thirdparty" submodule in
>>>>>>>>>> other executions. This would be a complex process compared to keeping
>>>>>>>>>> it in a separate repo.
>>>>>>>>>> 
>>>>>>>>>>     7c. Separate repo, will be a subproject of Hadoop, using
>>> the
>>>>>>> same
>>>>>>>>>> HADOOP jira project, with different versioning prefixed with
>>>>>>>> "thirdparty-"
>>>>>>>>>> (ex: thirdparty-1.0.0).
>>>>>>>>>>     7d. Separate will have same release process as Hadoop.
>>>>>>>>>> 
>>>>>>>>>>   HADOOP-13363 (
>>>>>> https://issues.apache.org/jira/browse/HADOOP-13363)
>>>>>>>> is
>>>>>>>>>> an
>>>>>>>>>> umbrella jira tracking the changes to protobuf upgrade.
>>>>>>>>>> 
>>>>>>>>>>   PR (https://github.com/apache/hadoop-thirdparty/pull/1)
>> has
>>>>>> been
>>>>>>>>>> raised
>>>>>>>>>> for separate repo creation in (HADOOP-16595 (
>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-16595)
>>>>>>>>>> 
>>>>>>>>>>   Please provide your inputs for the proposal and review the
>>> PR
>>>>>> to
>>>>>>>>>> proceed with the proposal.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>   -Thanks,
>>>>>>>>>>   Vinay
>>>>>>>>>> 
>>>>>>>>>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
>>>>>>>>>> vinodkv@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Moving the thread to the dev lists.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> +Vinod
>>>>>>>>>>> 
>>>>>>>>>>>> On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
>>>>>>>> vinayakumarb@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Marton,
>>>>>>>>>>>> 
>>>>>>>>>>>> Current created 'hadoop-thirdparty' repo is empty right
>>> now.
>>>>>>>>>>>> Whether to use that repo  for shaded artifact or not will
>>> be
>>>>>>>>>> monitored in
>>>>>>>>>>>> HADOOP-13363 umbrella jira. Please feel free to join the
>>>>>>> discussion.
>>>>>>>>>>>> 
>>>>>>>>>>>> There is no existing codebase is being moved out of
>> hadoop
>>>>>> repo.
>>>>>>> So
>>>>>>>> I
>>>>>>>>>>> think
>>>>>>>>>>>> right now we are good to go.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Vinay
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
>>>> elek@apache.org>
>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I am not sure if it's defined when is a vote required.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://www.apache.org/foundation/voting.html
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Personally I think it's a big enough change to send a
>>>>>>> notification
>>>>>>>> to
>>>>>>>>>>> the
>>>>>>>>>>>>> dev lists with a 'lazy consensus'  closure
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Marton
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2019/09/23 17:46:37, Vinayakumar B <
>>>>>> vinayakumarb@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As discussed in HADOOP-13363, protobuf 3.x jar (and may
>>> be
>>>>>> more
>>>>>>> in
>>>>>>>>>>>>> future)
>>>>>>>>>>>>>> will be kept as a shaded artifact in a separate repo,
>>> which
>>>>>> will
>>>>>>>> be
>>>>>>>>>>>>>> referred as dependency in hadoop modules.  This
>> approach
>>>>>> avoids
>>>>>>>>>> shading
>>>>>>>>>>>>> of
>>>>>>>>>>>>>> every submodule during build.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So question is does any VOTE required before asking to
>>>>>> create a
>>>>>>>> git
>>>>>>>>>>> repo?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On selfserve platform
>>>>>>>> https://gitbox.apache.org/setup/newrepo.html
>>>>>>>>>>>>>> I can access see that, requester should be PMC.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Wanted to confirm here first.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Vinay
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>> 
>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail:
>>>> private-unsubscribe@hadoop.apache.org
>>>>>>>>>>>>> For additional commands, e-mail:
>>>>>> private-help@hadoop.apache.org
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
> -- 
> 
> 
> 
> --Brahma Reddy Battula

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org


Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Ayush Saxena <ay...@gmail.com>.
Hi All,
FYI :
We will go ahead with the present approach and merge by tomorrow EOD, since no one has raised objections.
Thanks, everyone!

-Ayush
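
For anyone picking up the thread here: the "present approach" is the one from
the original proposal, i.e. a separate hadoop-thirdparty artifact that
relocates com.google.protobuf under the org.apache.hadoop.thirdparty prefix.
A minimal sketch of what such a relocation looks like in a maven-shade-plugin
configuration is below. It is only an illustration, not the contents of the
PR: the target package follows the 'o.a.h.thirdparty.protobuf37' naming
mentioned earlier in the thread, and everything else in the surrounding pom
(plugin version, artifact ids) is assumed.

```xml
<!-- Illustrative sketch only; not copied from the hadoop-thirdparty PR. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase, as noted in point 7b of the proposal -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- bundle only protobuf-java into the shaded jar -->
            <include>com.google.protobuf:protobuf-java</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- original package of the bundled classes -->
            <pattern>com.google.protobuf</pattern>
            <!-- assumed target package, per the naming discussed in the thread -->
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, Hadoop code would import the classes from the
relocated package, while an unshaded protobuf-java 2.5.0 can stay on the
classpath for downstreams.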

> On 07-Jan-2020, at 9:22 PM, Brahma Reddy Battula <br...@apache.org> wrote:
> 
> Hi Sree Vaddi, Owen, stack, Duo Zhang,
> 
> We can move forward based on your comments; we are just waiting for your
> reply. I hope all of your comments have been answered. (Unification can be
> discussed in a parallel thread, as Vinay mentioned.)
> 
> 
> 
> On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
> wrote:
> 
>> Hi Sree,
>> 
>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>> Project ? Or as a TLP ?
>>> Or as a new project definition ?
>> As already mentioned by Ayush, this will be a subproject of Hadoop.
>> Releases will be voted by Hadoop PMC as per ASF process.
>> 
>> 
>>> The effort to streamline and put in an accepted standard for the
>> dependencies that require shading,
>>> seems beyond the siloed efforts of hadoop, hbase, etc....
>> 
>>> I propose, we bring all the decision makers from all these artifacts in
>> one room and decide best course of action.
>>> I am looking at, no projects should ever had to shade any artifacts
>> except as an absolute necessary alternative.
>> 
>> This is the ideal proposal for any project. But unfortunately some projects
>> take their own course based on need.
>> 
>> In the current case of protobuf in Hadoop:
>>    The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up,
>> to avoid downstream failures. Since Hadoop is a platform, its dependencies
>> get added to downstream projects' classpaths, so any change in Hadoop's
>> dependencies directly affects downstreams. Hadoop maintains backward
>> compatibility as far as possible.
>>    Though protobuf provides wire compatibility between versions, it doesn't
>> provide compatibility for generated sources.
>>    Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
>> technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
>> still keep protobuf 2.5.0 (deprecated) for downstreams.
>> 
>> This shading is necessary to support both versions of protobuf
>> (2.5.0, non-shaded, on the downstream classpath and 3.x, shaded, for
>> Hadoop's internal usage).
>> And this entire work is to be done before the 3.3.0 release.
>> 
>> So, though it's ideal to have a common approach for all projects, I suggest
>> that for Hadoop we go ahead with the current approach.
>> We can also start a parallel effort to address these problems in a
>> separate discussion/proposal. Once a solution is available, we can revisit
>> and adopt it in all such projects (e.g. HBase, Hadoop,
>> Ratis).
>> 
>> -Vinay
>> 
>>> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:
>>> 
>>> Hey Sree
>>> 
>>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>>>> Project ? Or as a TLP ?
>>>> Or as a new project definition ?
>>>> 
>>> A subproject of Apache Hadoop, with its own independent release cycles.
>>> Maybe you can put this in the same category as Ozone or
>>> Submarine (as of a couple of months ago).
>>> 
>>> Unifying this across all projects seems interesting, but each project is
>>> independent and has its own limitations and way of thinking. I don't think
>>> it would be an easy task to bring everyone to the same table and get them
>>> to agree on a common approach.
>>> 
>>> I guess this has been under discussion for quite a long time, and no other
>>> alternative has been suggested. Still, we can hold off for a week; if
>>> someone comes up with a better solution we can consider it, else we can
>>> continue in the present direction.
>>> 
>>> -Ayush
>>> 
>>> 
>>> 
>>> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com
>> .invalid>
>>> wrote:
>>> 
>>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>>>> Project ? Or as a TLP ?
>>>> Or as a new project definition ?
>>>> 
>>>> The effort to streamline and put in an accepted standard for the
>>>> dependencies that require shading,seems beyond the siloed efforts of
>>>> hadoop, hbase, etc....
>>>> 
>>>> I propose, we bring all the decision makers from all these artifacts in
>>>> one room and decide best course of action.I am looking at, no projects
>>>> should ever had to shade any artifacts except as an absolute necessary
>>>> alternative.
>>>> 
>>>> 
>>>> Thank you./Sree
>>>> 
>>>> 
>>>> 
>>>>    On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
>>>> vinayakumarb@apache.org> wrote:
>>>> 
>>>> Hi,
>>>> Sorry for the late reply.
>>>>> To be exact, how can we better use the thirdparty repo? Looking at
>>>>> HBase as an example, it looks like everything that are known to break a
>>>>> lot after an update get shaded into the hbase-thirdparty artifact: guava,
>>>>> netty, ... etc.
>>>>> Is it the purpose to isolate these naughty dependencies?
>>>> Yes, shading is meant to isolate these naughty dependencies from the
>>>> downstream classpath and give us independent control over these upgrades
>>>> without breaking downstreams.
>>>> 
>>>> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, which
>>>> creates the shaded protobuf jar, is ready to merge.
>>>> 
>>>> Please take a look if you are interested; it will be merged in about two
>>>> days if there are no objections.
>>>> 
>>>> -Vinay
>>>> 
>>>> 
>>>> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
>>>> wrote:
>>>> 
>>>>> Hi I am late to this but I am keen to understand more.
>>>>> 
>>>>> To be exact, how can we better use the thirdparty repo? Looking at
>>> HBase
>>>>> as an example, it looks like everything that are known to break a lot
>>>> after
>>>>> an update get shaded into the hbase-thirdparty artifact: guava,
>> netty,
>>>> ...
>>>>> etc.
>>>>> Is it the purpose to isolate these naughty dependencies?
>>>>> 
>>>>> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <
>> vinayakumarb@apache.org
>>>> 
>>>>> wrote:
>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com
>>> 
>>>>>> 's suggestions.
>>>>>> 
>>>>>>   i. Renamed the module to 'hadoop-shaded-protobuf37'
>>>>>>   ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
>>>>>> 
>>>>>> Please review!!
>>>>>> 
>>>>>> Thanks,
>>>>>> -Vinay
>>>>>> 
>>>>>> 
>>>>>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <
>> palomino219@gmail.com
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> For HBase we have a separated repo for hbase-thirdparty
>>>>>>> 
>>>>>>> https://github.com/apache/hbase-thirdparty
>>>>>>> 
>>>>>>> We will publish the artifacts to nexus so we do not need to
>> include
>>>>>>> binaries in our git repo, just add a dependency in the pom.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>>>>>>> 
>>>>>>> 
>>>>>>> And it has its own release cycles, only when there are special
>>>>>> requirements
>>>>>>> or we want to upgrade some of the dependencies. This is the vote
>>>> thread
>>>>>> for
>>>>>>> the newest release, where we want to provide a shaded gson for
>> jdk7.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> On Sat, Sep 28, 2019 at 1:28 AM Vinayakumar B <vi...@apache.org> wrote:
>>>>>>> 
>>>>>>>> Please find replies inline.
>>>>>>>> 
>>>>>>>> -Vinay
>>>>>>>> 
>>>>>>>> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
>>>>>> owen.omalley@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I'm very unhappy with this direction. In particular, I don't
>>> think
>>>>>> git
>>>>>>> is
>>>>>>>>> a good place for distribution of binary artifacts.
>> Furthermore,
>>>> the
>>>>>> PMC
>>>>>>>>> shouldn't be releasing anything without a release vote.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> The proposed solution doesn't release any binaries in git. It's actually
>>>>>>>> a complete sub-project which follows the entire release process,
>>>>>>>> including a public VOTE. I have already mentioned that the release
>>>>>>>> process is similar to Hadoop's.
>>>>>>>> To be specific, it uses (almost) the same script used in Hadoop to
>>>>>>>> generate artifacts, sign them and deploy them to the staging repository.
>>>>>>>> Please let me know if I am conveying anything wrong.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> I'd propose that we make a third party module that contains
>> the
>>>>>>> *source*
>>>>>>>>> of the pom files to build the relocated jars. This should
>>>>>> absolutely be
>>>>>>>>> treated as a last resort for the mostly Google projects that
>>>>>> regularly
>>>>>>>>> break binary compatibility (eg. Protobuf & Guava).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Same has been implemented in the PR
>>>>>>>> https://github.com/apache/hadoop-thirdparty/pull/1. Please
>> check
>>>> and
>>>>>> let
>>>>>>>> me
>>>>>>>> know If I misunderstood. Yes, this is the last option we have
>>> AFAIK.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> In terms of naming, I'd propose something like:
>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.thirdparty.protobuf2_5
>>>>>>>>> org.apache.hadoop.thirdparty.guava28
>>>>>>>>> 
>>>>>>>>> In particular, I think we absolutely need to include the
>> version
>>>> of
>>>>>> the
>>>>>>>>> underlying project. On the other hand, since we should not be
>>>>>> shading
>>>>>>>>> *everything* we can drop the leading com.google.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> IMO, This naming convention is easy for identifying the
>> underlying
>>>>>>> project,
>>>>>>>> but  it will be difficult to maintain going forward if
>> underlying
>>>>>> project
>>>>>>>> versions changes. Since thirdparty module have its own releases,
>>>> each
>>>>>> of
>>>>>>>> those release can be mapped to specific version of underlying
>>>> project.
>>>>>>> Even
>>>>>>>> the binary artifact can include a MANIFEST with underlying
>> project
>>>>>>> details
>>>>>>>> as per Steve's suggestion on HADOOP-13363.
>>>>>>>> That said, if you still prefer to have project number in
>> artifact
>>>> id,
>>>>>> it
>>>>>>>> can be done.
>>>>>>>> 
>>>>>>>> The Hadoop project can make releases of  the thirdparty module:
>>>>>>>>> 
>>>>>>>>> <dependency>
>>>>>>>>> <groupId>org.apache.hadoop</groupId>
>>>>>>>>> <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>>>>>>>>> <version>1.0</version>
>>>>>>>>> </dependency>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Note that the version has to be the hadoop thirdparty release
>>>> number,
>>>>>>> which
>>>>>>>>> is part of why you need to have the underlying version in the
>>>>>> artifact
>>>>>>>>> name. These we can push to maven central as new releases from
>>>>>> Hadoop.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Exactly, same has been implemented in the PR. hadoop-thirdparty
>>>> module
>>>>>>> have
>>>>>>>> its own releases. But in HADOOP Jira, thirdparty versions can be
>>>>>>>> differentiated using prefix "thirdparty-".
>>>>>>>> 
>>>>>>>> Same solution is being followed in HBase. May be people involved
>>> in
>>>>>> HBase
>>>>>>>> can add some points here.
>>>>>>>> 
>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> .. Owen
>>>>>>>>> 
>>>>>>>>> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
>>>>>> vinayakumarb@apache.org
>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi All,
>>>>>>>>>> 
>>>>>>>>>>   I wanted to discuss about the separate repo for thirdparty
>>>>>>>> dependencies
>>>>>>>>>> which we need to shaded and include in Hadoop component's
>> jars.
>>>>>>>>>> 
>>>>>>>>>>   Apologies for the big text ahead, but this needs clear
>>>>>>> explanation!!
>>>>>>>>>> 
>>>>>>>>>>   Right now most needed such dependency is protobuf.
>> Protobuf
>>>>>>>> dependency
>>>>>>>>>> was not upgraded from 2.5.0 onwards with the fear that
>>> downstream
>>>>>>>> builds,
>>>>>>>>>> which depends on transitive dependency protobuf coming from
>>>>>> hadoop's
>>>>>>>> jars,
>>>>>>>>>> may fail with the upgrade. Apparently protobuf does not
>>> guarantee
>>>>>>> source
>>>>>>>>>> compatibility, though it guarantees wire compatibility
>> between
>>>>>>> versions.
>>>>>>>>>> Because of this behavior, version upgrade may cause breakage
>> in
>>>>>> known
>>>>>>>> and
>>>>>>>>>> unknown (private?) downstreams.
>>>>>>>>>> 
>>>>>>>>>>   So to tackle this, we came up the following proposal in
>>>>>>> HADOOP-13363.
>>>>>>>>>> 
>>>>>>>>>>   Luckily, as far as I know, no APIs, either public to users or
>>>>>>>>>> between Hadoop processes, directly use protobuf classes in their
>>>>>>>>>> signatures. (If any exist, please let us know.)
>>>>>>>>>> 
>>>>>>>>>>   Proposal:
>>>>>>>>>>   ------------
>>>>>>>>>> 
>>>>>>>>>>   1. Create a artifact(s) which contains shaded
>> dependencies.
>>>> All
>>>>>>> such
>>>>>>>>>> shading/relocation will be with known prefix
>>>>>>>>>> **org.apache.hadoop.thirdparty.**.
>>>>>>>>>>   2. Right now protobuf jar (ex:
>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf)
>>>>>>>>>> to start with, all **com.google.protobuf** classes will be
>>>>>> relocated
>>>>>>> as
>>>>>>>>>> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>>>>>>>>>>   3. Hadoop modules, which needs protobuf as dependency,
>> will
>>>> add
>>>>>>> this
>>>>>>>>>> shaded artifact as dependency (ex:
>>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf).
>>>>>>>>>>   4. All previous usages of "com.google.protobuf" will be
>>>>>> relocated
>>>>>>> to
>>>>>>>>>> "org.apache.hadoop.thirdparty.com.google.protobuf" in the
>> code
>>>> and
>>>>>>> will
>>>>>>>> be
>>>>>>>>>> committed. Please note, this replacement is One-Time directly
>>> in
>>>>>>> source
>>>>>>>>>> code, NOT during compile and package.
>>>>>>>>>>   5. Once all usages of "com.google.protobuf" is relocated,
>>> then
>>>>>>> hadoop
>>>>>>>>>> dont care about which version of original  "protobuf-java" is
>>> in
>>>>>>>>>> dependency.
>>>>>>>>>>   6. Just keep "protobuf-java:2.5.0" in dependency tree not
>> to
>>>>>> break
>>>>>>>> the
>>>>>>>>>> downstreams. But hadoop will be originally using the latest
>>>>>> protobuf
>>>>>>>>>> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>>>>>>>>>> 
>>>>>>>>>>   7. Coming back to separate repo, Following are most
>>>> appropriate
>>>>>>>> reasons
>>>>>>>>>> of keeping shaded dependency artifact in separate repo
>> instead
>>> of
>>>>>>>>>> submodule.
>>>>>>>>>> 
>>>>>>>>>>     7a. These artifacts need not be built all the time. It
>>> needs
>>>>>> to
>>>>>>> be
>>>>>>>>>> built only when there is a change in the dependency version
>> or
>>>> the
>>>>>>> build
>>>>>>>>>> process.
>>>>>>>>>>     7b. If added as a submodule in the Hadoop repo,
>>>>>>>>>> maven-shade-plugin:shade will execute only in the package phase. That
>>>>>>>>>> means "mvn compile" or "mvn test-compile" will fail: at that point
>>>>>>>>>> this artifact will not have the relocated classes, only the original
>>>>>>>>>> ones, resulting in compilation failure. The workaround is to build the
>>>>>>>>>> thirdparty submodule first and exclude the "thirdparty" submodule in
>>>>>>>>>> other executions. This would be a complex process compared to keeping
>>>>>>>>>> it in a separate repo.
>>>>>>>>>> 
>>>>>>>>>>     7c. Separate repo, will be a subproject of Hadoop, using
>>> the
>>>>>>> same
>>>>>>>>>> HADOOP jira project, with different versioning prefixed with
>>>>>>>> "thirdparty-"
>>>>>>>>>> (ex: thirdparty-1.0.0).
>>>>>>>>>>     7d. Separate will have same release process as Hadoop.
>>>>>>>>>> 
>>>>>>>>>>   HADOOP-13363 (
>>>>>> https://issues.apache.org/jira/browse/HADOOP-13363)
>>>>>>>> is
>>>>>>>>>> an
>>>>>>>>>> umbrella jira tracking the changes to protobuf upgrade.
>>>>>>>>>> 
>>>>>>>>>>   PR (https://github.com/apache/hadoop-thirdparty/pull/1)
>> has
>>>>>> been
>>>>>>>>>> raised
>>>>>>>>>> for separate repo creation in (HADOOP-16595 (
>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-16595)
>>>>>>>>>> 
>>>>>>>>>>   Please provide your inputs for the proposal and review the
>>> PR
>>>>>> to
>>>>>>>>>> proceed with the proposal.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>   -Thanks,
>>>>>>>>>>   Vinay
>>>>>>>>>> 
>>>>>>>>>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
>>>>>>>>>> vinodkv@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Moving the thread to the dev lists.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> +Vinod
>>>>>>>>>>> 
>>>>>>>>>>>> On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
>>>>>>>> vinayakumarb@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Marton,
>>>>>>>>>>>> 
>>>>>>>>>>>> Current created 'hadoop-thirdparty' repo is empty right
>>> now.
>>>>>>>>>>>> Whether to use that repo  for shaded artifact or not will
>>> be
>>>>>>>>>> monitored in
>>>>>>>>>>>> HADOOP-13363 umbrella jira. Please feel free to join the
>>>>>>> discussion.
>>>>>>>>>>>> 
>>>>>>>>>>>> There is no existing codebase is being moved out of
>> hadoop
>>>>>> repo.
>>>>>>> So
>>>>>>>> I
>>>>>>>>>>> think
>>>>>>>>>>>> right now we are good to go.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Vinay
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
>>>> elek@apache.org>
>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I am not sure if it's defined when is a vote required.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://www.apache.org/foundation/voting.html
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Personally I think it's a big enough change to send a
>>>>>>> notification
>>>>>>>> to
>>>>>>>>>>> the
>>>>>>>>>>>>> dev lists with a 'lazy consensus'  closure
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Marton
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2019/09/23 17:46:37, Vinayakumar B <
>>>>>> vinayakumarb@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As discussed in HADOOP-13363, protobuf 3.x jar (and may
>>> be
>>>>>> more
>>>>>>> in
>>>>>>>>>>>>> future)
>>>>>>>>>>>>>> will be kept as a shaded artifact in a separate repo,
>>> which
>>>>>> will
>>>>>>>> be
>>>>>>>>>>>>>> referred as dependency in hadoop modules.  This
>> approach
>>>>>> avoids
>>>>>>>>>> shading
>>>>>>>>>>>>> of
>>>>>>>>>>>>>> every submodule during build.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So question is does any VOTE required before asking to
>>>>>> create a
>>>>>>>> git
>>>>>>>>>>> repo?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On selfserve platform
>>>>>>>> https://gitbox.apache.org/setup/newrepo.html
>>>>>>>>>>>>>> I can access see that, requester should be PMC.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Wanted to confirm here first.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Vinay
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>> 
>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail:
>>>> private-unsubscribe@hadoop.apache.org
>>>>>>>>>>>>> For additional commands, e-mail:
>>>>>> private-help@hadoop.apache.org
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
> -- 
> 
> 
> 
> --Brahma Reddy Battula

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org


Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Ayush Saxena <ay...@gmail.com>.
Hi All,
FYI :
We will go ahead with the present approach and merge by tomorrow EOD, since no one has raised objections.
Thanks, everyone!

-Ayush

> On 07-Jan-2020, at 9:22 PM, Brahma Reddy Battula <br...@apache.org> wrote:
> 
> Hi Sree Vaddi, Owen, stack, Duo Zhang,
> 
> We can move forward based on your comments; we are just waiting for your
> reply. I hope all of your comments have been answered. (Unification can be
> discussed in a parallel thread, as Vinay mentioned.)
> 
> 
> 
> On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
> wrote:
> 
>> Hi Sree,
>> 
>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>> Project ? Or as a TLP ?
>>> Or as a new project definition ?
>> As already mentioned by Ayush, this will be a subproject of Hadoop.
>> Releases will be voted by Hadoop PMC as per ASF process.
>> 
>> 
>>> The effort to streamline and put in an accepted standard for the
>> dependencies that require shading,
>>> seems beyond the siloed efforts of hadoop, hbase, etc....
>> 
>>> I propose, we bring all the decision makers from all these artifacts in
>> one room and decide best course of action.
>>> I am looking at, no projects should ever had to shade any artifacts
>> except as an absolute necessary alternative.
>> 
>> This is the ideal proposal for any project. But unfortunately some projects
>> take their own course based on need.
>> 
>> In the current case of protobuf in Hadoop,
>>    Protobuf upgrade from 2.5.0 (which is already EOL) was not taken up to
>> avoid downstream failures. Since Hadoop is a platform, its dependencies
>> will get added to downstream projects' classpath. So any change in Hadoop's
>> dependencies will directly affect downstreams. Hadoop strictly follows
>> backward compatibility as far as possible.
>>    Though protobuf provides wire compatibility b/w versions, it doesnt
>> provide compatibility for generated sources.
>>    Now, to support ARM protobuf upgrade is mandatory. Using shading
>> technique, In Hadoop internally can upgrade to shaded protobuf 3.x and
>> still have 2.5.0 protobuf (deprecated) for downstreams.
>> 
>> This shading is necessary to support both versions of protobuf
>> (2.5.0 (non-shaded) on the downstream classpath and 3.x (shaded) for
>> Hadoop's internal usage).
>> All of this work is to be done before the 3.3.0 release.
>> 
>> So, though it's ideal to have a common approach for all projects, I suggest
>> that for Hadoop we go ahead with the current approach.
>> We can also start a parallel effort to address these problems in a
>> separate discussion/proposal. Once a solution is available, we can revisit
>> and adopt it in all such projects (ex: HBase, Hadoop,
>> Ratis).
>> 
>> -Vinay
>> 
>>> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:
>>> 
>>> Hey Sree
>>> 
>>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>>>> Project ? Or as a TLP ?
>>>> Or as a new project definition ?
>>>> 
>>> A sub project of Apache Hadoop, having its own independent release
>>> cycles.
>>> Maybe you can put this in the same column as Ozone, or as
>>> Submarine (a couple of months ago).
>>> 
>>> Unifying for all seems interesting, but each project is independent and
>>> has its own limitations and way of thinking. I don't think it would be an
>>> easy task to bring everyone to the same table and get them to agree on a
>>> common approach.
>>> 
>>> I guess this has been under discussion for quite a long time, and no
>>> other alternative has been suggested. Still, we can hold off for a week; if
>>> someone comes up with a better solution we can consider it, else we can
>>> continue in the present direction.
>>> 
>>> -Ayush
>>> 
>>> 
>>> 
>>> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com.invalid>
>>> wrote:
>>> 
>>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>>>> Project ? Or as a TLP ?
>>>> Or as a new project definition ?
>>>> 
>>>> The effort to streamline and put in place an accepted standard for the
>>>> dependencies that require shading seems beyond the siloed efforts of
>>>> Hadoop, HBase, etc.
>>>> 
>>>> I propose we bring all the decision makers from all these artifacts into
>>>> one room and decide the best course of action. My view is that no project
>>>> should ever have to shade any artifacts except as an absolutely necessary
>>>> last resort.
>>>> 
>>>> 
>>>> Thank you./Sree
>>>> 
>>>> 
>>>> 
>>>>    On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
>>>> vinayakumarb@apache.org> wrote:
>>>> 
>>>> Hi,
>>>> Sorry for the late reply.
>>>>> To be exact, how can we better use the thirdparty repo? Looking at
>>>>> HBase as an example, it looks like everything that is known to break a lot
>>>>> after an update gets shaded into the hbase-thirdparty artifact: guava,
>>>>> netty, ... etc.
>>>>> Is it the purpose to isolate these naughty dependencies?
>>>> Yes, shading is to isolate these naughty dependencies from the downstream
>>>> classpath and to have independent control over these upgrades without
>>>> breaking downstreams.
>>>> 
>>>> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, to create
>>>> the protobuf shaded jar is ready to merge.
>>>> 
>>>> Please take a look if anyone is interested; it will be merged after about
>>>> two days if there are no objections.
>>>> 
>>>> -Vinay
>>>> 
>>>> 
>>>> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
>>>> wrote:
>>>> 
>>>>> Hi I am late to this but I am keen to understand more.
>>>>> 
>>>>> To be exact, how can we better use the thirdparty repo? Looking at
>>>>> HBase as an example, it looks like everything that is known to break a lot
>>>>> after an update gets shaded into the hbase-thirdparty artifact: guava,
>>>>> netty, ... etc.
>>>>> Is it the purpose to isolate these naughty dependencies?
>>>>> 
>>>>> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakumarb@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com>'s
>>>>>> suggestions.
>>>>>> 
>>>>>>   i. Renamed the module to 'hadoop-shaded-protobuf37'
>>>>>>   ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
>>>>>> 
>>>>>> Please review!!
>>>>>> 
>>>>>> Thanks,
>>>>>> -Vinay
>>>>>> 
>>>>>> 
>>>>>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino219@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> For HBase we have a separate repo for hbase-thirdparty
>>>>>>> 
>>>>>>> https://github.com/apache/hbase-thirdparty
>>>>>>> 
>>>>>>> We will publish the artifacts to nexus so we do not need to include
>>>>>>> binaries in our git repo, just add a dependency in the pom.
>>>>>>> 
>>>>>>> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>>>>>>> 
>>>>>>> And it has its own release cycles, only when there are special
>>>>>>> requirements or we want to upgrade some of the dependencies. This is the
>>>>>>> vote thread for the newest release, where we want to provide a shaded
>>>>>>> gson for jdk7.
>>>>>>> 
>>>>>>> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> On Sat, Sep 28, 2019 at 1:28 AM, Vinayakumar B <vi...@apache.org> wrote:
>>>>>>> 
>>>>>>>> Please find replies inline.
>>>>>>>> 
>>>>>>>> -Vinay
>>>>>>>> 
>>>>>>>> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I'm very unhappy with this direction. In particular, I don't think git
>>>>>>>>> is a good place for distribution of binary artifacts. Furthermore, the
>>>>>>>>> PMC shouldn't be releasing anything without a release vote.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> The proposed solution doesn't release any binaries in git. It's actually a
>>>>>>>> complete sub-project which follows the entire release process, including a
>>>>>>>> VOTE in public. I have mentioned already that the release process is
>>>>>>>> similar to Hadoop's.
>>>>>>>> To be specific, it uses (almost) the same script used in Hadoop to
>>>>>>>> generate artifacts, sign them, and deploy to the staging repository.
>>>>>>>> Please let me know if I am conveying anything wrong.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> I'd propose that we make a third party module that contains the
>>>>>>>>> *source* of the pom files to build the relocated jars. This should
>>>>>>>>> absolutely be treated as a last resort for the mostly Google projects
>>>>>>>>> that regularly break binary compatibility (eg. Protobuf & Guava).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> The same has been implemented in the PR
>>>>>>>> https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
>>>>>>>> me know if I misunderstood. Yes, this is the last option we have AFAIK.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> In terms of naming, I'd propose something like:
>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.thirdparty.protobuf2_5
>>>>>>>>> org.apache.hadoop.thirdparty.guava28
>>>>>>>>> 
>>>>>>>>> In particular, I think we absolutely need to include the version of
>>>>>>>>> the underlying project. On the other hand, since we should not be
>>>>>>>>> shading *everything* we can drop the leading com.google.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> IMO, this naming convention makes it easy to identify the underlying
>>>>>>>> project, but it will be difficult to maintain going forward when the
>>>>>>>> underlying project's version changes. Since the thirdparty module has its
>>>>>>>> own releases, each of those releases can be mapped to a specific version
>>>>>>>> of the underlying project. The binary artifact can even include a
>>>>>>>> MANIFEST with the underlying project details, as per Steve's suggestion
>>>>>>>> on HADOOP-13363.
>>>>>>>> That said, if you still prefer to have the project version number in the
>>>>>>>> artifact id, it can be done.
>>>>>>>> 
>>>>>>>>> The Hadoop project can make releases of the thirdparty module:
>>>>>>>>> 
>>>>>>>>> <dependency>
>>>>>>>>>   <groupId>org.apache.hadoop</groupId>
>>>>>>>>>   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>>>>>>>>>   <version>1.0</version>
>>>>>>>>> </dependency>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Note that the version has to be the hadoop thirdparty release number,
>>>>>>>>> which is part of why you need to have the underlying version in the
>>>>>>>>> artifact name. These we can push to maven central as new releases from
>>>>>>>>> Hadoop.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Exactly, the same has been implemented in the PR. The hadoop-thirdparty
>>>>>>>> module has its own releases. But in the HADOOP Jira, thirdparty versions
>>>>>>>> can be differentiated using the prefix "thirdparty-".
>>>>>>>> 
>>>>>>>> The same solution is being followed in HBase. Maybe people involved in
>>>>>>>> HBase can add some points here.
>>>>>>>> 
>>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> .. Owen
>>>>>>>>> 
>>>>>>>>> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi All,
>>>>>>>>>> 
>>>>>>>>>>   I wanted to discuss the separate repo for thirdparty dependencies
>>>>>>>>>> which we need to shade and include in Hadoop components' jars.
>>>>>>>>>> 
>>>>>>>>>>   Apologies for the big text ahead, but this needs clear explanation!!
>>>>>>>>>> 
>>>>>>>>>>   Right now the most needed such dependency is protobuf. The protobuf
>>>>>>>>>> dependency was not upgraded beyond 2.5.0 for fear that downstream
>>>>>>>>>> builds, which depend on the transitive protobuf dependency coming from
>>>>>>>>>> hadoop's jars, may fail with the upgrade. Apparently protobuf does not
>>>>>>>>>> guarantee source compatibility, though it guarantees wire compatibility
>>>>>>>>>> between versions. Because of this behavior, a version upgrade may cause
>>>>>>>>>> breakage in known and unknown (private?) downstreams.
>>>>>>>>>> 
>>>>>>>>>>   So to tackle this, we came up with the following proposal in
>>>>>>>>>> HADOOP-13363.
>>>>>>>>>> 
>>>>>>>>>>   Luckily, as far as I know, no APIs, either public to users or between
>>>>>>>>>> Hadoop processes, directly use protobuf classes in signatures. (If
>>>>>>>>>> any exist, please let us know).
>>>>>>>>>> 
>>>>>>>>>>   Proposal:
>>>>>>>>>>   ------------
>>>>>>>>>> 
>>>>>>>>>>   1. Create artifact(s) which contain shaded dependencies. All such
>>>>>>>>>> shading/relocation will use the known prefix
>>>>>>>>>> **org.apache.hadoop.thirdparty.**.
>>>>>>>>>>   2. To start with, there will be a protobuf jar (ex:
>>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf) in which all
>>>>>>>>>> **com.google.protobuf** classes will be relocated as
>>>>>>>>>> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>>>>>>>>>>   3. Hadoop modules which need protobuf as a dependency will add this
>>>>>>>>>> shaded artifact as a dependency (ex:
>>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf).
>>>>>>>>>>   4. All previous usages of "com.google.protobuf" will be changed to
>>>>>>>>>> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
>>>>>>>>>> committed. Please note, this replacement is one-time, directly in
>>>>>>>>>> source code, NOT during compile and package.
>>>>>>>>>>   5. Once all usages of "com.google.protobuf" are relocated,
>>>>>>>>>> hadoop doesn't care which version of the original "protobuf-java" is in
>>>>>>>>>> the dependency tree.
>>>>>>>>>>   6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
>>>>>>>>>> break the downstreams. But hadoop will actually be using the latest
>>>>>>>>>> protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>>>>>>>>>> 
>>>>>>>>>>   7. Coming back to the separate repo, the following are the main reasons
>>>>>>>>>> for keeping the shaded dependency artifact in a separate repo instead of
>>>>>>>>>> a submodule.
>>>>>>>>>> 
>>>>>>>>>>     7a. These artifacts need not be built all the time. They need to be
>>>>>>>>>> built only when there is a change in the dependency version or the build
>>>>>>>>>> process.
>>>>>>>>>>     7b. If added as a "submodule in Hadoop repo", maven-shade-plugin:shade
>>>>>>>>>> will execute only in the package phase. That means "mvn compile" or "mvn
>>>>>>>>>> test-compile" will fail, because at that point this artifact will not
>>>>>>>>>> have relocated classes; instead it will have the original classes,
>>>>>>>>>> resulting in compilation failure. The workaround is to build the
>>>>>>>>>> thirdparty submodule first and exclude the "thirdparty" submodule in
>>>>>>>>>> other executions. This will be a complex process compared to keeping it
>>>>>>>>>> in a separate repo.
>>>>>>>>>> 
>>>>>>>>>>     7c. The separate repo will be a subproject of Hadoop, using the same
>>>>>>>>>> HADOOP jira project, with different versioning prefixed with
>>>>>>>>>> "thirdparty-" (ex: thirdparty-1.0.0).
>>>>>>>>>>     7d. The separate repo will have the same release process as Hadoop.
>>>>>>>>>> 
>>>>>>>>>>   HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is
>>>>>>>>>> an umbrella jira tracking the changes for the protobuf upgrade.
>>>>>>>>>> 
>>>>>>>>>>   PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
>>>>>>>>>> raised for separate repo creation in HADOOP-16595 (
>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-16595).
>>>>>>>>>> 
>>>>>>>>>>   Please provide your inputs on the proposal and review the PR to
>>>>>>>>>> proceed with the proposal.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>   -Thanks,
>>>>>>>>>>   Vinay
>>>>>>>>>> 
>>>>>>>>>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
>>>>>>>>>> vinodkv@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Moving the thread to the dev lists.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> +Vinod
>>>>>>>>>>> 
>>>>>>>>>>>> On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
>>>>>>>> vinayakumarb@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Marton,
>>>>>>>>>>>> 
>>>>>>>>>>>> The currently created 'hadoop-thirdparty' repo is empty right now.
>>>>>>>>>>>> Whether to use that repo for the shaded artifact or not will be tracked
>>>>>>>>>>>> in the HADOOP-13363 umbrella jira. Please feel free to join the
>>>>>>>>>>>> discussion.
>>>>>>>>>>>> 
>>>>>>>>>>>> No existing codebase is being moved out of the hadoop repo. So I
>>>>>>>>>>>> think right now we are good to go.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Vinay
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
>>>> elek@apache.org>
>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I am not sure if it's defined when a vote is required.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://www.apache.org/foundation/voting.html
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Personally I think it's a big enough change to send a notification
>>>>>>>>>>>>> to the dev lists with a 'lazy consensus' closure
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Marton
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2019/09/23 17:46:37, Vinayakumar B <
>>>>>> vinayakumarb@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more in
>>>>>>>>>>>>>> future) will be kept as a shaded artifact in a separate repo, which
>>>>>>>>>>>>>> will be referred to as a dependency in hadoop modules. This approach
>>>>>>>>>>>>>> avoids shading of every submodule during build.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So the question is: is any VOTE required before asking to create a git
>>>>>>>>>>>>>> repo?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On the selfserve platform https://gitbox.apache.org/setup/newrepo.html
>>>>>>>>>>>>>> I can see that the requester should be a PMC member.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Wanted to confirm here first.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Vinay
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
> -- 
> 
> 
> 
> --Brahma Reddy Battula

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org


Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Ayush Saxena <ay...@gmail.com>.
Hi All,
FYI :
We will be going ahead with the present approach, will merge by tomorrow EOD. Considering no one has objections.
Thanx Everyone!!!

-Ayush

> On 07-Jan-2020, at 9:22 PM, Brahma Reddy Battula <br...@apache.org> wrote:
> 
> Hi Sree vaddi,Owen,stack,Duo Zhang,
> 
> We can move forward based on your comments, just waiting for your
> reply.Hope all of your comments answered..(unification we can think
> parallel thread as Vinay mentioned).
> 
> 
> 
> On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
> wrote:
> 
>> Hi Sree,
>> 
>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>> Project ? Or as a TLP ?
>>> Or as a new project definition ?
>> As already mentioned by Ayush, this will be a subproject of Hadoop.
>> Releases will be voted by Hadoop PMC as per ASF process.
>> 
>> 
>>> The effort to streamline and put in an accepted standard for the
>> dependencies that require shading,
>>> seems beyond the siloed efforts of hadoop, hbase, etc....
>> 
>>> I propose, we bring all the decision makers from all these artifacts in
>> one room and decide best course of action.
>>> I am looking at, no projects should ever had to shade any artifacts
>> except as an absolute necessary alternative.
>> 
>> This is the ideal proposal for any project. But unfortunately some projects
>> takes their own course based on need.
>> 
>> In the current case of protobuf in Hadoop,
>>    Protobuf upgrade from 2.5.0 (which is already EOL) was not taken up to
>> avoid downstream failures. Since Hadoop is a platform, its dependencies
>> will get added to downstream projects' classpath. So any change in Hadoop's
>> dependencies will directly affect downstreams. Hadoop strictly follows
>> backward compatibility as far as possible.
>>    Though protobuf provides wire compatibility b/w versions, it doesnt
>> provide compatibility for generated sources.
>>    Now, to support ARM protobuf upgrade is mandatory. Using shading
>> technique, In Hadoop internally can upgrade to shaded protobuf 3.x and
>> still have 2.5.0 protobuf (deprecated) for downstreams.
>> 
>> This shading is necessary to have both versions of protobuf supported.
>> (2.5.0 (non-shaded) for downstream's classpath and 3.x (shaded) for
>> hadoop's internal usage).
>> And this entire work to be done before 3.3.0 release.
>> 
>> So, though its ideal to make a common approach for all projects, I suggest
>> for Hadoop we can go ahead as per current approach.
>> We can also start the parallel effort to address these problems in a
>> separate discussion/proposal. Once the solution is available we can revisit
>> and adopt new solution accordingly in all such projects (ex: HBase, Hadoop,
>> Ratis).
>> 
>> -Vinay
>> 
>>> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:
>>> 
>>> Hey Sree
>>> 
>>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>>>> Project ? Or as a TLP ?
>>>> Or as a new project definition ?
>>>> 
>>> A sub project of Apache Hadoop, having its own independent release
>> cycles.
>>> May be you can put this into the same column as ozone or as
>>> submarine(couple of months ago).
>>> 
>>> Unifying for all, seems interesting but each project is independent and
>> has
>>> its own limitations and way of thinking, I don't think it would be an
>> easy
>>> task to bring all on the same table and get them agree to a common stuff.
>>> 
>>> I guess this has been into discussion since quite long, and there hasn't
>>> been any other alternative suggested. Still we can hold up for a week, if
>>> someone comes up with a better solution, else we can continue in the
>>> present direction.
>>> 
>>> -Ayush
>>> 
>>> 
>>> 
>>> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com
>> .invalid>
>>> wrote:
>>> 
>>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>>>> Project ? Or as a TLP ?
>>>> Or as a new project definition ?
>>>> 
>>>> The effort to streamline and put in an accepted standard for the
>>>> dependencies that require shading,seems beyond the siloed efforts of
>>>> hadoop, hbase, etc....
>>>> 
>>>> I propose, we bring all the decision makers from all these artifacts in
>>>> one room and decide best course of action.I am looking at, no projects
>>>> should ever had to shade any artifacts except as an absolute necessary
>>>> alternative.
>>>> 
>>>> 
>>>> Thank you./Sree
>>>> 
>>>> 
>>>> 
>>>>    On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
>>>> vinayakumarb@apache.org> wrote:
>>>> 
>>>> Hi,
>>>> Sorry for the late reply,.
>>>>>>> To be exact, how can we better use the thirdparty repo? Looking at
>>>> HBase as an example, it looks like everything that are known to break a
>>> lot
>>>> after an update get shaded into the hbase-thirdparty artifact: guava,
>>>> netty, ... etc.
>>>> Is it the purpose to isolate these naughty dependencies?
>>>> Yes, shading is to isolate these naughty dependencies from downstream
>>>> classpath and have independent control on these upgrades without
>> breaking
>>>> downstreams.
>>>> 
>>>> First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create
>>> the
>>>> protobuf shaded jar is ready to merge.
>>>> 
>>>> Please take a look if anyone interested, will be merged may be after
>> two
>>>> days if no objections.
>>>> 
>>>> -Vinay
>>>> 
>>>> 
>>>> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
>>>> wrote:
>>>> 
>>>>> Hi I am late to this but I am keen to understand more.
>>>>> 
>>>>> To be exact, how can we better use the thirdparty repo? Looking at
>>> HBase
>>>>> as an example, it looks like everything that are known to break a lot
>>>> after
>>>>> an update get shaded into the hbase-thirdparty artifact: guava,
>> netty,
>>>> ...
>>>>> etc.
>>>>> Is it the purpose to isolate these naughty dependencies?
>>>>> 
>>>>> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <
>> vinayakumarb@apache.org
>>>> 
>>>>> wrote:
>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com
>>> 
>>>>>> 's suggestions.
>>>>>> 
>>>>>>   i. Renamed the module to 'hadoop-shaded-protobuf37'
>>>>>>   ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
>>>>>> 
>>>>>> Please review!!
>>>>>> 
>>>>>> Thanks,
>>>>>> -Vinay
>>>>>> 
>>>>>> 
>>>>>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <
>> palomino219@gmail.com
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> For HBase we have a separated repo for hbase-thirdparty
>>>>>>> 
>>>>>>> https://github.com/apache/hbase-thirdparty
>>>>>>> 
>>>>>>> We will publish the artifacts to nexus so we do not need to
>> include
>>>>>>> binaries in our git repo, just add a dependency in the pom.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>>>>>>> 
>>>>>>> 
>>>>>>> And it has its own release cycles, only when there are special
>>>>>> requirements
>>>>>>> or we want to upgrade some of the dependencies. This is the vote
>>>> thread
>>>>>> for
>>>>>>> the newest release, where we want to provide a shaded gson for
>> jdk7.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
>>>>>>> 
>>>>>>>> Please find replies inline.
>>>>>>>> 
>>>>>>>> -Vinay
>>>>>>>> 
>>>>>>>> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
>>>>>> owen.omalley@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I'm very unhappy with this direction. In particular, I don't
>>> think
>>>>>> git
>>>>>>> is
>>>>>>>>> a good place for distribution of binary artifacts.
>> Furthermore,
>>>> the
>>>>>> PMC
>>>>>>>>> shouldn't be releasing anything without a release vote.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Proposed solution doesnt release any binaries in git. Its
>>> actually a
>>>>>>>> complete sub-project which follows entire release process,
>>> including
>>>>>> VOTE
>>>>>>>> in public. I have mentioned already that release process is
>>> similar
>>>> to
>>>>>>>> hadoop.
>>>>>>>> To be specific, using the (almost) same script used in hadoop to
>>>>>> generate
>>>>>>>> artifacts, sign and deploy to staging repository. Please let me
>>> know
>>>>>> If I
>>>>>>>> am conveying anything wrong.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> I'd propose that we make a third party module that contains
>> the
>>>>>>> *source*
>>>>>>>>> of the pom files to build the relocated jars. This should
>>>>>> absolutely be
>>>>>>>>> treated as a last resort for the mostly Google projects that
>>>>>> regularly
>>>>>>>>> break binary compatibility (eg. Protobuf & Guava).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Same has been implemented in the PR
>>>>>>>> https://github.com/apache/hadoop-thirdparty/pull/1. Please
>> check
>>>> and
>>>>>> let
>>>>>>>> me
>>>>>>>> know If I misunderstood. Yes, this is the last option we have
>>> AFAIK.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> In terms of naming, I'd propose something like:
>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.thirdparty.protobuf2_5
>>>>>>>>> org.apache.hadoop.thirdparty.guava28
>>>>>>>>> 
>>>>>>>>> In particular, I think we absolutely need to include the
>> version
>>>> of
>>>>>> the
>>>>>>>>> underlying project. On the other hand, since we should not be
>>>>>> shading
>>>>>>>>> *everything* we can drop the leading com.google.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> IMO, This naming convention is easy for identifying the
>> underlying
>>>>>>> project,
>>>>>>>> but  it will be difficult to maintain going forward if
>> underlying
>>>>>> project
>>>>>>>> versions changes. Since thirdparty module have its own releases,
>>>> each
>>>>>> of
>>>>>>>> those release can be mapped to specific version of underlying
>>>> project.
>>>>>>> Even
>>>>>>>> the binary artifact can include a MANIFEST with underlying
>> project
>>>>>>> details
>>>>>>>> as per Steve's suggestion on HADOOP-13363.
>>>>>>>> That said, if you still prefer to have project number in
>> artifact
>>>> id,
>>>>>> it
>>>>>>>> can be done.
>>>>>>>> 
>>>>>>>> The Hadoop project can make releases of  the thirdparty module:
>>>>>>>>> 
>>>>>>>>> <dependency>
>>>>>>>>> <groupId>org.apache.hadoop</groupId>
>>>>>>>>> <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>>>>>>>>> <version>1.0</version>
>>>>>>>>> </dependency>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Note that the version has to be the hadoop thirdparty release
>>>> number,
>>>>>>> which
>>>>>>>>> is part of why you need to have the underlying version in the
>>>>>> artifact
>>>>>>>>> name. These we can push to maven central as new releases from
>>>>>> Hadoop.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Exactly, same has been implemented in the PR. hadoop-thirdparty
>>>> module
>>>>>>> have
>>>>>>>> its own releases. But in HADOOP Jira, thirdparty versions can be
>>>>>>>> differentiated using prefix "thirdparty-".
>>>>>>>> 
>>>>>>>> Same solution is being followed in HBase. May be people involved
>>> in
>>>>>> HBase
>>>>>>>> can add some points here.
>>>>>>>> 
>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> .. Owen
>>>>>>>>> 
>>>>>>>>> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
>>>>>> vinayakumarb@apache.org
>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi All,
>>>>>>>>>> 
>>>>>>>>>>   I wanted to discuss about the separate repo for thirdparty
>>>>>>>> dependencies
>>>>>>>>>> which we need to shaded and include in Hadoop component's
>> jars.
>>>>>>>>>> 
>>>>>>>>>>   Apologies for the big text ahead, but this needs clear
>>>>>>> explanation!!
>>>>>>>>>> 
>>>>>>>>>>   Right now most needed such dependency is protobuf.
>> Protobuf
>>>>>>>> dependency
>>>>>>>>>> was not upgraded from 2.5.0 onwards with the fear that
>>> downstream
>>>>>>>> builds,
>>>>>>>>>> which depends on transitive dependency protobuf coming from
>>>>>> hadoop's
>>>>>>>> jars,
>>>>>>>>>> may fail with the upgrade. Apparently protobuf does not
>>> guarantee
>>>>>>> source
>>>>>>>>>> compatibility, though it guarantees wire compatibility
>> between
>>>>>>> versions.
>>>>>>>>>> Because of this behavior, version upgrade may cause breakage
>> in
>>>>>> known
>>>>>>>> and
>>>>>>>>>> unknown (private?) downstreams.
>>>>>>>>>> 
>>>>>>>>>>   So to tackle this, we came up the following proposal in
>>>>>>> HADOOP-13363.
>>>>>>>>>> 
>>>>>>>>>>   Luckily, As far as I know, no APIs, either public to user
>> or
>>>>>>> between
>>>>>>>>>> Hadoop processes, is not directly using protobuf classes in
>>>>>>> signatures.
>>>>>>>>>> (If
>>>>>>>>>> any exist, please let us know).
>>>>>>>>>> 
>>>>>>>>>>   Proposal:
>>>>>>>>>>   ------------
>>>>>>>>>> 
>>>>>>>>>>   1. Create a artifact(s) which contains shaded
>> dependencies.
>>>> All
>>>>>>> such
>>>>>>>>>> shading/relocation will be with known prefix
>>>>>>>>>> **org.apache.hadoop.thirdparty.**.
>>>>>>>>>>   2. Right now protobuf jar (ex:
>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf)
>>>>>>>>>> to start with, all **com.google.protobuf** classes will be
>>>>>> relocated
>>>>>>> as
>>>>>>>>>> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>>>>>>>>>>   3. Hadoop modules, which needs protobuf as dependency,
>> will
>>>> add
>>>>>>> this
>>>>>>>>>> shaded artifact as dependency (ex:
>>>>>>>>>> o.a.h.thirdparty:hadoop-shaded-protobuf).
>>>>>>>>>>   4. All previous usages of "com.google.protobuf" will be
>>>>>> relocated
>>>>>>> to
>>>>>>>>>> "org.apache.hadoop.thirdparty.com.google.protobuf" in the
>> code
>>>> and
>>>>>>> will
>>>>>>>> be
>>>>>>>>>> committed. Please note, this replacement is One-Time directly
>>> in
>>>>>>> source
>>>>>>>>>> code, NOT during compile and package.
>>>>>>>>>>   5. Once all usages of "com.google.protobuf" is relocated,
>>> then
>>>>>>> hadoop
>>>>>>>>>> dont care about which version of original  "protobuf-java" is
>>> in
>>>>>>>>>> dependency.
>>>>>>>>>>   6. Just keep "protobuf-java:2.5.0" in dependency tree not
>> to
>>>>>> break
>>>>>>>> the
>>>>>>>>>> downstreams. But hadoop will be originally using the latest
>>>>>> protobuf
>>>>>>>>>> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>>>>>>>>>> 
>>>>>>>>>>   7. Coming back to the separate repo, the following are the main
>>>>>>>>>> reasons for keeping the shaded dependency artifact in a separate
>>>>>>>>>> repo instead of a submodule.
>>>>>>>>>> 
>>>>>>>>>>     7a. These artifacts need not be built all the time. They need
>>>>>>>>>> to be built only when there is a change in the dependency version
>>>>>>>>>> or the build process.
>>>>>>>>>>     7b. If added as a "submodule in Hadoop repo",
>>>>>>>>>> maven-shade-plugin:shade will execute only in the package phase.
>>>>>>>>>> That means "mvn compile" or "mvn test-compile" will fail: at that
>>>>>>>>>> point this artifact will not yet have the relocated classes, only
>>>>>>>>>> the original classes, so modules importing the relocated
>>>>>>>>>> packages will not compile. The workaround is to build the
>>>>>>>>>> thirdparty submodule first and exclude the "thirdparty" submodule
>>>>>>>>>> in other executions. This will be a complex process compared to
>>>>>>>>>> keeping it in a separate repo.
>>>>>>>>>> 
>>>>>>>>>>     7c. The separate repo will be a subproject of Hadoop, using
>>>>>>>>>> the same HADOOP jira project, with different versioning prefixed
>>>>>>>>>> with "thirdparty-" (ex: thirdparty-1.0.0).
>>>>>>>>>>     7d. The separate repo will have the same release process as
>>>>>>>>>> Hadoop.
>>>>>>>>>> 
>>>>>>>>>>   HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
>>>>>>>>>> is an umbrella jira tracking the changes for the protobuf upgrade.
>>>>>>>>>> 
>>>>>>>>>>   A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
>>>>>>>>>> been raised for the separate repo creation in HADOOP-16595
>>>>>>>>>> (https://issues.apache.org/jira/browse/HADOOP-16595).
>>>>>>>>>> 
>>>>>>>>>>   Please provide your inputs on the proposal and review the PR so
>>>>>>>>>> that we can proceed.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>   -Thanks,
>>>>>>>>>>   Vinay
>>>>>>>>>> 
>>>>>>>>>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
>>>>>>>>>> vinodkv@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Moving the thread to the dev lists.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> +Vinod
>>>>>>>>>>> 
>>>>>>>>>>>> On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
>>>>>>>> vinayakumarb@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Marton,
>>>>>>>>>>>> 
>>>>>>>>>>>> The newly created 'hadoop-thirdparty' repo is empty right now.
>>>>>>>>>>>> Whether to use that repo for the shaded artifact or not will be
>>>>>>>>>>>> tracked in the HADOOP-13363 umbrella jira. Please feel free to
>>>>>>>>>>>> join the discussion.
>>>>>>>>>>>> 
>>>>>>>>>>>> No existing codebase is being moved out of the hadoop repo, so I
>>>>>>>>>>>> think right now we are good to go.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Vinay
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
>>>> elek@apache.org>
>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I am not sure if it's defined when a vote is required.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://www.apache.org/foundation/voting.html
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Personally I think it's a big enough change to send a
>>>>>>>>>>>>> notification to the dev lists with a 'lazy consensus' closure.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Marton
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2019/09/23 17:46:37, Vinayakumar B <
>>>>>> vinayakumarb@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe
>>>>>>>>>>>>>> more in future) will be kept as a shaded artifact in a separate
>>>>>>>>>>>>>> repo, which will be referred to as a dependency in hadoop
>>>>>>>>>>>>>> modules. This approach avoids shading every submodule during
>>>>>>>>>>>>>> the build.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So the question is: is any VOTE required before asking to create
>>>>>>>>>>>>>> a git repo?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On the self-serve platform
>>>>>>>>>>>>>> https://gitbox.apache.org/setup/newrepo.html I can see that the
>>>>>>>>>>>>>> requester should be a PMC member.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Wanted to confirm here first.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Vinay
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>> 
>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail:
>>>> private-unsubscribe@hadoop.apache.org
>>>>>>>>>>>>> For additional commands, e-mail:
>>>>>> private-help@hadoop.apache.org
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
> -- 
> 
> 
> 
> --Brahma Reddy Battula

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org


Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Brahma Reddy Battula <br...@apache.org>.
Hi Sree Vaddi, Owen, stack, Duo Zhang,

We can move forward based on your comments; we are just waiting for your
replies. Hope all of your comments have been answered. (Unification can be
taken up in a parallel thread, as Vinay mentioned.)



On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
wrote:

> Hi Sree,
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > Project ? Or as a TLP ?
> > Or as a new project definition ?
> As already mentioned by Ayush, this will be a subproject of Hadoop.
> Releases will be voted on by the Hadoop PMC as per the ASF process.
>
>
> > The effort to streamline and put in an accepted standard for the
> > dependencies that require shading,
> > seems beyond the siloed efforts of hadoop, hbase, etc....
>
> > I propose, we bring all the decision makers from all these artifacts in
> > one room and decide best course of action.
> > I am looking at, no projects should ever had to shade any artifacts
> > except as an absolute necessary alternative.
>
> This is the ideal proposal for any project. But unfortunately some projects
> take their own course based on need.
>
> In the current case of protobuf in Hadoop:
>     The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up,
> to avoid downstream failures. Since Hadoop is a platform, its dependencies
> get added to downstream projects' classpaths, so any change in Hadoop's
> dependencies directly affects downstreams. Hadoop strictly follows
> backward compatibility as far as possible.
>     Though protobuf provides wire compatibility between versions, it doesn't
> provide compatibility for generated sources.
>     Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
> technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
> still keep protobuf 2.5.0 (deprecated) for downstreams.
>
> This shading is necessary to have both versions of protobuf supported
> (2.5.0 (non-shaded) for the downstreams' classpath and 3.x (shaded) for
> Hadoop's internal usage).
> And this entire work is to be done before the 3.3.0 release.
>
> So, though it's ideal to have a common approach for all projects, I suggest
> that for Hadoop we go ahead as per the current approach.
> We can also start a parallel effort to address these problems in a
> separate discussion/proposal. Once a solution is available we can revisit
> and adopt it accordingly in all such projects (ex: HBase, Hadoop,
> Ratis).
>
> -Vinay
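
For readers following along, the arrangement described in the reply above
(unshaded protobuf 2.5.0 kept for downstreams, a relocated protobuf 3.x used
by Hadoop's own code) could look roughly like this in a Hadoop module's
pom.xml. This is only a sketch: the shaded artifact's coordinates follow the
names floated in this thread (o.a.h.thirdparty:hadoop-shaded-protobuf and the
example version thirdparty-1.0.0) and are assumptions, not confirmed release
coordinates.

```xml
<dependencies>
  <!-- Assumed coordinates, taken from the names proposed in this thread.
       This jar would carry protobuf 3.x classes under a relocated
       org.apache.hadoop.thirdparty.* package for Hadoop's internal use. -->
  <dependency>
    <groupId>org.apache.hadoop.thirdparty</groupId>
    <artifactId>hadoop-shaded-protobuf</artifactId>
    <version>1.0.0</version>
  </dependency>

  <!-- Original, unshaded protobuf kept in the dependency tree only so that
       downstream projects relying on it transitively do not break. -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>
</dependencies>
```

Because the two jars expose different package names, both can sit on the same
classpath without their classes colliding.
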
>
> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:
>
> > Hey Sree
> >
> > > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > > Project ? Or as a TLP ?
> > > Or as a new project definition ?
> > >
> > A sub project of Apache Hadoop, having its own independent release
> > cycles. Maybe you can put this into the same column as ozone or as
> > submarine (couple of months ago).
> >
> > Unifying for all seems interesting, but each project is independent and
> > has its own limitations and way of thinking. I don't think it would be an
> > easy task to bring all to the same table and get them to agree on a
> > common approach.
> >
> > I guess this has been under discussion for quite long, and there hasn't
> > been any other alternative suggested. Still, we can hold off for a week;
> > if someone comes up with a better solution, fine, else we can continue in
> > the present direction.
> >
> > -Ayush
> >
> >
> >
> > On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com
> .invalid>
> > wrote:
> >
> > > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > > Project ? Or as a TLP ?
> > > Or as a new project definition ?
> > >
> > > The effort to streamline and put in an accepted standard for the
> > > dependencies that require shading,seems beyond the siloed efforts of
> > > hadoop, hbase, etc....
> > >
> > > I propose, we bring all the decision makers from all these artifacts in
> > > one room and decide best course of action.I am looking at, no projects
> > > should ever had to shade any artifacts except as an absolute necessary
> > > alternative.
> > >
> > >
> > > Thank you./Sree
> > >
> > >
> > >
> > >     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> > > vinayakumarb@apache.org> wrote:
> > >
> > >  Hi,
> > > Sorry for the late reply.
> > > >>> To be exact, how can we better use the thirdparty repo? Looking at
> > > >>> HBase as an example, it looks like everything that are known to
> > > >>> break a lot after an update get shaded into the hbase-thirdparty
> > > >>> artifact: guava, netty, ... etc.
> > > >>> Is it the purpose to isolate these naughty dependencies?
> > > Yes, shading is to isolate these naughty dependencies from the
> > > downstream classpath and to have independent control over these
> > > upgrades without breaking downstreams.
> > >
> > > The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, to
> > > create the protobuf shaded jar is ready to merge.
> > >
> > > Please take a look if anyone is interested; it will be merged maybe
> > > after two days if there are no objections.
> > >
> > > -Vinay
> > >
> > >
> > > On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> > > wrote:
> > >
> > > > Hi I am late to this but I am keen to understand more.
> > > >
> > > > To be exact, how can we better use the thirdparty repo? Looking at
> > > > HBase as an example, it looks like everything that are known to break
> > > > a lot after an update get shaded into the hbase-thirdparty artifact:
> > > > guava, netty, ... etc.
> > > > Is it the purpose to isolate these naughty dependencies?
> > > >
> > > > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <
> vinayakumarb@apache.org
> > >
> > > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >> I have updated the PR as per @Owen O'Malley
> > > >> <owen.omalley@gmail.com>'s suggestions.
> > > >>
> > > >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> > > >>    ii. Set the shaded package to 'o.a.h.thirdparty.protobuf37'
> > > >>
> > > >> Please review!!
> > > >>
> > > >> Thanks,
> > > >> -Vinay
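
A relocation like the one described in the update above is typically
expressed through maven-shade-plugin's <relocations> configuration. The
fragment below is a sketch under stated assumptions, not the contents of the
actual PR: it assumes 'o.a.h.thirdparty.protobuf37' expands to
org.apache.hadoop.thirdparty.protobuf37, and it leaves out the plugin version
and any filters or transformers the real build may need.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- The shade goal runs in the package phase, which is why relocated
           classes are not available to a plain "mvn compile". -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Original protobuf package to rewrite. -->
            <pattern>com.google.protobuf</pattern>
            <!-- Assumed expansion of the shaded package named in the
                 thread. -->
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With such a relocation in place, code built against the shaded jar would
import classes from the relocated package rather than from
com.google.protobuf.
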
> > > >>
> > > >>
> > > >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <
> palomino219@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >> > For HBase we have a separated repo for hbase-thirdparty
> > > >> >
> > > >> > https://github.com/apache/hbase-thirdparty
> > > >> >
> > > >> > We will publish the artifacts to nexus so we do not need to
> > > >> > include binaries in our git repo, just add a dependency in the
> > > >> > pom.
> > > >> >
> > > >> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> > > >> >
> > > >> > And it has its own release cycles, only when there are special
> > > >> > requirements or we want to upgrade some of the dependencies. This
> > > >> > is the vote thread for the newest release, where we want to
> > > >> > provide a shaded gson for jdk7.
> > > >> >
> > > >> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> > > >> >
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> > > >> >
> > > >> > > Please find replies inline.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> > > >> owen.omalley@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > I'm very unhappy with this direction. In particular, I don't
> > > >> > > > think git is a good place for distribution of binary
> > > >> > > > artifacts. Furthermore, the PMC shouldn't be releasing
> > > >> > > > anything without a release vote.
> > > >> > > >
> > > >> > > >
> > > >> > > The proposed solution doesn't release any binaries in git. It's
> > > >> > > actually a complete sub-project which follows the entire release
> > > >> > > process, including a VOTE in public. I have mentioned already
> > > >> > > that the release process is similar to hadoop's.
> > > >> > > To be specific, it uses (almost) the same script used in hadoop
> > > >> > > to generate artifacts, sign them, and deploy them to the staging
> > > >> > > repository. Please let me know if I am conveying anything wrong.
> > > >> > >
> > > >> > >
> > > >> > > > I'd propose that we make a third party module that contains
> > > >> > > > the *source* of the pom files to build the relocated jars.
> > > >> > > > This should absolutely be treated as a last resort for the
> > > >> > > > mostly Google projects that regularly break binary
> > > >> > > > compatibility (eg. Protobuf & Guava).
> > > >> > > >
> > > >> > > >
> > > >> > > The same has been implemented in the PR
> > > >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check
> > > >> > > and let me know if I misunderstood. Yes, this is the last option
> > > >> > > we have, AFAIK.
> > > >> > >
> > > >> > >
> > > >> > > > In terms of naming, I'd propose something like:
> > > >> > > >
> > > >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > > >> > > > org.apache.hadoop.thirdparty.guava28
> > > >> > > >
> > > >> > > > In particular, I think we absolutely need to include the
> > > >> > > > version of the underlying project. On the other hand, since we
> > > >> > > > should not be shading *everything* we can drop the leading
> > > >> > > > com.google.
> > > >> > > >
> > > >> > > >
> > > >> > > IMO, this naming convention makes it easy to identify the
> > > >> > > underlying project, but it will be difficult to maintain going
> > > >> > > forward if the underlying project's version changes. Since the
> > > >> > > thirdparty module has its own releases, each of those releases
> > > >> > > can be mapped to a specific version of the underlying project.
> > > >> > > The binary artifact can even include a MANIFEST with the
> > > >> > > underlying project's details, as per Steve's suggestion on
> > > >> > > HADOOP-13363.
> > > >> > > That said, if you still prefer to have the version number in the
> > > >> > > artifact id, it can be done.
> > > >> > >
> > > >> > > > The Hadoop project can make releases of the thirdparty module:
> > > >> > > >
> > > >> > > > <dependency>
> > > >> > > >  <groupId>org.apache.hadoop</groupId>
> > > >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > > >> > > >  <version>1.0</version>
> > > >> > > > </dependency>
> > > >> > > >
> > > >> > > >
> > > >> > > > Note that the version has to be the hadoop thirdparty release
> > > >> > > > number, which is part of why you need to have the underlying
> > > >> > > > version in the artifact name. These we can push to maven
> > > >> > > > central as new releases from Hadoop.
> > > >> > > >
> > > >> > > >
> > > >> > > Exactly, the same has been implemented in the PR. The
> > > >> > > hadoop-thirdparty module has its own releases. But in the HADOOP
> > > >> > > Jira, thirdparty versions can be differentiated using the prefix
> > > >> > > "thirdparty-".
> > > >> > >
> > > >> > > The same solution is being followed in HBase. Maybe people
> > > >> > > involved in HBase can add some points here.
> > > >> > >
> > > >> > > > Thoughts?
> > > >> > > >
> > > >> > > > .. Owen
> > > >> > > >
> > > >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> > > >> vinayakumarb@apache.org
> > > >> > >
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > >> Hi All,
> > > >> > > >>
> > > >> > > >>    I wanted to discuss about the separate repo for thirdparty
> > > >> > > dependencies
> > > >> > > >> which we need to shaded and include in Hadoop component's
> jars.
> > > >> > > >>
> > > >> > > >>    Apologies for the big text ahead, but this needs clear
> > > >> > explanation!!
> > > >> > > >>
> > > >> > > >>    Right now most needed such dependency is protobuf.
> Protobuf
> > > >> > > dependency
> > > >> > > >> was not upgraded from 2.5.0 onwards with the fear that
> > downstream
> > > >> > > builds,
> > > >> > > >> which depends on transitive dependency protobuf coming from
> > > >> hadoop's
> > > >> > > jars,
> > > >> > > >> may fail with the upgrade. Apparently protobuf does not
> > guarantee
> > > >> > source
> > > >> > > >> compatibility, though it guarantees wire compatibility
> between
> > > >> > versions.
> > > >> > > >> Because of this behavior, version upgrade may cause breakage
> in
> > > >> known
> > > >> > > and
> > > >> > > >> unknown (private?) downstreams.
> > > >> > > >>
> > > >> > > >>    So to tackle this, we came up the following proposal in
> > > >> > HADOOP-13363.
> > > >> > > >>
> > > >> > > >>    Luckily, As far as I know, no APIs, either public to user
> or
> > > >> > between
> > > >> > > >> Hadoop processes, is not directly using protobuf classes in
> > > >> > signatures.
> > > >> > > >> (If
> > > >> > > >> any exist, please let us know).
> > > >> > > >>
> > > >> > > >>    Proposal:
> > > >> > > >>    ------------
> > > >> > > >>
> > > >> > > >>    1. Create artifact(s) containing the shaded dependencies.
> > > >> > > >> All such shading/relocation will use the known prefix
> > > >> > > >> **org.apache.hadoop.thirdparty.**.
> > > >> > > >>    2. Start with the protobuf jar (ex:
> > > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf): all
> > > >> > > >> **com.google.protobuf** classes will be relocated as
> > > >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > > >> > > >>    3. Hadoop modules that need protobuf as a dependency will
> > > >> > > >> add this shaded artifact as a dependency (ex:
> > > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > > >> > > >>    4. All previous usages of "com.google.protobuf" will be
> > > >> > > >> changed to "org.apache.hadoop.thirdparty.com.google.protobuf"
> > > >> > > >> in the code and committed. Please note, this replacement is a
> > > >> > > >> one-time change made directly in the source code, NOT during
> > > >> > > >> compile and package.
> > > >> > > >>    5. Once all usages of "com.google.protobuf" are relocated,
> > > >> > > >> Hadoop no longer cares which version of the original
> > > >> > > >> "protobuf-java" is in the dependency tree.
> > > >> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so
> > > >> > > >> as not to break the downstreams. Hadoop itself will actually
> > > >> > > >> be using the latest protobuf present in
> > > >> > > >> "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > > >> > > >>
> > > >> > > >>    7. Coming back to the separate repo, the following are the
> > > >> > > >> main reasons for keeping the shaded dependency artifact in a
> > > >> > > >> separate repo instead of a submodule.
> > > >> > > >>
> > > >> > > >>      7a. These artifacts need not be built all the time. They
> > > >> > > >> need to be built only when there is a change in the dependency
> > > >> > > >> version or the build process.
> > > >> > > >>      7b. If added as a "submodule in Hadoop repo",
> > > >> > > >> maven-shade-plugin:shade will execute only in the package
> > > >> > > >> phase. That means "mvn compile" or "mvn test-compile" will
> > > >> > > >> fail: at that point this artifact will not yet have the
> > > >> > > >> relocated classes, only the original classes, so modules
> > > >> > > >> importing the relocated packages will not compile. The
> > > >> > > >> workaround is to build the thirdparty submodule first and
> > > >> > > >> exclude the "thirdparty" submodule in other executions. This
> > > >> > > >> will be a complex process compared to keeping it in a separate
> > > >> > > >> repo.
> > > >> > > >>
> > > >> > > >>      7c. The separate repo will be a subproject of Hadoop, using
> > > >> > > >> the same HADOOP jira project, with different versioning
> > > >> > > >> prefixed with "thirdparty-" (ex: thirdparty-1.0.0).
> > > >> > > >>      7d. The separate repo will have the same release process
> > > >> > > >> as Hadoop.
> > > >> > > >>
> > > >> > > >>    HADOOP-13363 (
> > > >> https://issues.apache.org/jira/browse/HADOOP-13363)
> > > >> > > is
> > > >> > > >> an
> > > >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> > > >> > > >>
> > > >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1)
> has
> > > >> been
> > > >> > > >> raised
> > > >> > > >> for separate repo creation in (HADOOP-16595 (
> > > >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > > >> > > >>
> > > >> > > >>    Please provide your inputs for the proposal and review the
> > PR
> > > >> to
> > > >> > > >> proceed with the proposal.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >    -Thanks,
> > > >> > > >>    Vinay
> > > >> > > >>
> > > >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > > >> > > >> vinodkv@apache.org>
> > > >> > > >> wrote:
> > > >> > > >>
> > > >> > > >> > Moving the thread to the dev lists.
> > > >> > > >> >
> > > >> > > >> > Thanks
> > > >> > > >> > +Vinod
> > > >> > > >> >
> > > >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> > > >> > > vinayakumarb@apache.org>
> > > >> > > >> > wrote:
> > > >> > > >> > >
> > > >> > > >> > > Thanks Marton,
> > > >> > > >> > >
> > > >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right
> > now.
> > > >> > > >> > > Whether to use that repo  for shaded artifact or not will
> > be
> > > >> > > >> monitored in
> > > >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> > > >> > discussion.
> > > >> > > >> > >
> > > >> > > >> > > There is no existing codebase is being moved out of
> hadoop
> > > >> repo.
> > > >> > So
> > > >> > > I
> > > >> > > >> > think
> > > >> > > >> > > right now we are good to go.
> > > >> > > >> > >
> > > >> > > >> > > -Vinay
> > > >> > > >> > >
> > > >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> > > elek@apache.org>
> > > >> > > wrote:
> > > >> > > >> > >
> > > >> > > >> > >>
> > > >> > > >> > >> I am not sure if it's defined when is a vote required.
> > > >> > > >> > >>
> > > >> > > >> > >> https://www.apache.org/foundation/voting.html
> > > >> > > >> > >>
> > > >> > > >> > >> Personally I think it's a big enough change to send a
> > > >> > notification
> > > >> > > to
> > > >> > > >> > the
> > > >> > > >> > >> dev lists with a 'lazy consensus'  closure
> > > >> > > >> > >>
> > > >> > > >> > >> Marton
> > > >> > > >> > >>
> > > >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> > > >> vinayakumarb@apache.org>
> > > >> > > >> wrote:
> > > >> > > >> > >>> Hi,
> > > >> > > >> > >>>
> > > >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may
> > be
> > > >> more
> > > >> > in
> > > >> > > >> > >> future)
> > > >> > > >> > >>> will be kept as a shaded artifact in a separate repo,
> > which
> > > >> will
> > > >> > > be
> > > >> > > >> > >>> referred as dependency in hadoop modules.  This
> approach
> > > >> avoids
> > > >> > > >> shading
> > > >> > > >> > >> of
> > > >> > > >> > >>> every submodule during build.
> > > >> > > >> > >>>
> > > >> > > >> > >>> So question is does any VOTE required before asking to
> > > >> create a
> > > >> > > git
> > > >> > > >> > repo?
> > > >> > > >> > >>>
> > > >> > > >> > >>> On selfserve platform
> > > >> > > https://gitbox.apache.org/setup/newrepo.html
> > > >> > > >> > >>> I can access see that, requester should be PMC.
> > > >> > > >> > >>>
> > > >> > > >> > >>> Wanted to confirm here first.
> > > >> > > >> > >>>
> > > >> > > >> > >>> -Vinay
> > > >> > > >> > >>>
> > > >> > > >> > >>
> > > >> > > >> > >>
> > > >> > >
> > > ---------------------------------------------------------------------
> > > >> > > >> > >> To unsubscribe, e-mail:
> > > private-unsubscribe@hadoop.apache.org
> > > >> > > >> > >> For additional commands, e-mail:
> > > >> private-help@hadoop.apache.org
> > > >> > > >> > >>
> > > >> > > >> > >>
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> >
>
-- 



--Brahma Reddy Battula

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Brahma Reddy Battula <br...@apache.org>.
Hi Sree Vaddi, Owen, stack, Duo Zhang,

We can move forward based on your comments; we are just waiting for your
replies. Hope all of your comments have been answered. (Unification can be
taken up in a parallel thread, as Vinay mentioned.)



On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
wrote:

> Hi Sree,
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > Project ? Or as a TLP ?
> > Or as a new project definition ?
> As already mentioned by Ayush, this will be a subproject of Hadoop.
> Releases will be voted on by the Hadoop PMC as per the ASF process.
>
>
> > The effort to streamline and put in an accepted standard for the
> > dependencies that require shading,
> > seems beyond the siloed efforts of hadoop, hbase, etc....
>
> > I propose, we bring all the decision makers from all these artifacts in
> > one room and decide best course of action.
> > I am looking at, no projects should ever had to shade any artifacts
> > except as an absolute necessary alternative.
>
> This is the ideal proposal for any project. But unfortunately some projects
> take their own course based on need.
>
> In the current case of protobuf in Hadoop:
>     The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up,
> to avoid downstream failures. Since Hadoop is a platform, its dependencies
> get added to downstream projects' classpaths, so any change in Hadoop's
> dependencies directly affects downstreams. Hadoop strictly follows
> backward compatibility as far as possible.
>     Though protobuf provides wire compatibility between versions, it doesn't
> provide compatibility for generated sources.
>     Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
> technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
> still keep protobuf 2.5.0 (deprecated) for downstreams.
>
> This shading is necessary to have both versions of protobuf supported
> (2.5.0 (non-shaded) for the downstreams' classpath and 3.x (shaded) for
> Hadoop's internal usage).
> And this entire work is to be done before the 3.3.0 release.
>
> So, though it's ideal to have a common approach for all projects, I suggest
> that for Hadoop we go ahead as per the current approach.
> We can also start a parallel effort to address these problems in a
> separate discussion/proposal. Once a solution is available we can revisit
> and adopt it accordingly in all such projects (ex: HBase, Hadoop,
> Ratis).
>
> -Vinay
>
> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:
>
> > Hey Sree
> >
> > > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > > Project ? Or as a TLP ?
> > > Or as a new project definition ?
> > >
> > A sub project of Apache Hadoop, having its own independent release
> > cycles. Maybe you can put this into the same column as ozone or as
> > submarine (couple of months ago).
> >
> > Unifying for all seems interesting, but each project is independent and
> > has its own limitations and way of thinking. I don't think it would be an
> > easy task to bring all to the same table and get them to agree on a
> > common approach.
> >
> > I guess this has been under discussion for quite long, and there hasn't
> > been any other alternative suggested. Still, we can hold off for a week;
> > if someone comes up with a better solution, fine, else we can continue in
> > the present direction.
> >
> > -Ayush
> >
> >
> >
> > On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com
> .invalid>
> > wrote:
> >
> > > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > > Project ? Or as a TLP ?
> > > Or as a new project definition ?
> > >
> > > The effort to streamline and put in an accepted standard for the
> > > dependencies that require shading,seems beyond the siloed efforts of
> > > hadoop, hbase, etc....
> > >
> > > I propose, we bring all the decision makers from all these artifacts in
> > > one room and decide best course of action.I am looking at, no projects
> > > should ever had to shade any artifacts except as an absolute necessary
> > > alternative.
> > >
> > >
> > > Thank you./Sree
> > >
> > >
> > >
> > >     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> > > vinayakumarb@apache.org> wrote:
> > >
> > >  Hi,
> > > Sorry for the late reply.
> > > >>> To be exact, how can we better use the thirdparty repo? Looking at
> > > >>> HBase as an example, it looks like everything that are known to
> > > >>> break a lot after an update get shaded into the hbase-thirdparty
> > > >>> artifact: guava, netty, ... etc.
> > > >>> Is it the purpose to isolate these naughty dependencies?
> > > Yes, shading is to isolate these naughty dependencies from the
> > > downstream classpath and to have independent control over these
> > > upgrades without breaking downstreams.
> > >
> > > The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, to
> > > create the protobuf shaded jar is ready to merge.
> > >
> > > Please take a look if anyone is interested; it will be merged maybe
> > > after two days if there are no objections.
> > >
> > > -Vinay
> > >
> > >
> > > On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> > > wrote:
> > >
> > > > Hi I am late to this but I am keen to understand more.
> > > >
> > > > To be exact, how can we better use the thirdparty repo? Looking at
> > HBase
> > > > as an example, it looks like everything that are known to break a lot
> > > after
> > > > an update get shaded into the hbase-thirdparty artifact: guava,
> netty,
> > > ...
> > > > etc.
> > > > Is it the purpose to isolate these naughty dependencies?
> > > >
> > > > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <
> vinayakumarb@apache.org
> > >
> > > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >> I have updated the PR as per @Owen O'Malley
> > > >> <owen.omalley@gmail.com>'s suggestions.
> > > >>
> > > >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> > > >>    ii. Set the shaded package to 'o.a.h.thirdparty.protobuf37'
> > > >>
> > > >> Please review!!
> > > >>
> > > >> Thanks,
> > > >> -Vinay
> > > >>
> > > >>
> > > >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <
> palomino219@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >> > For HBase we have a separated repo for hbase-thirdparty
> > > >> >
> > > >> > https://github.com/apache/hbase-thirdparty
> > > >> >
> > > >> > We will publish the artifacts to nexus so we do not need to
> include
> > > >> > binaries in our git repo, just add a dependency in the pom.
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> > > >> >
> > > >> >
> > > >> > And it has its own release cycles, only when there are special
> > > >> requirements
> > > >> > or we want to upgrade some of the dependencies. This is the vote
> > > thread
> > > >> for
> > > >> > the newest release, where we want to provide a shaded gson for
> jdk7.
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> > > >> >
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > > >> > On Sat, Sep 28, 2019 at 1:28 AM Vinayakumar B <vi...@apache.org> wrote:
> > > >> >
> > > >> > > Please find replies inline.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> > > >> owen.omalley@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > I'm very unhappy with this direction. In particular, I don't
> > think
> > > >> git
> > > >> > is
> > > >> > > > a good place for distribution of binary artifacts.
> Furthermore,
> > > the
> > > >> PMC
> > > >> > > > shouldn't be releasing anything without a release vote.
> > > >> > > >
> > > >> > > >
> > > >> > > Proposed solution doesnt release any binaries in git. Its
> > actually a
> > > >> > > complete sub-project which follows entire release process,
> > including
> > > >> VOTE
> > > >> > > in public. I have mentioned already that release process is
> > similar
> > > to
> > > >> > > hadoop.
> > > >> > > To be specific, using the (almost) same script used in hadoop to
> > > >> generate
> > > >> > > artifacts, sign and deploy to staging repository. Please let me
> > know
> > > >> If I
> > > >> > > am conveying anything wrong.
> > > >> > >
> > > >> > >
> > > >> > > > I'd propose that we make a third party module that contains
> the
> > > >> > *source*
> > > >> > > > of the pom files to build the relocated jars. This should
> > > >> absolutely be
> > > >> > > > treated as a last resort for the mostly Google projects that
> > > >> regularly
> > > >> > > > break binary compatibility (eg. Protobuf & Guava).
> > > >> > > >
> > > >> > > >
> > > >> > > Same has been implemented in the PR
> > > >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please
> check
> > > and
> > > >> let
> > > >> > > me
> > > >> > > know If I misunderstood. Yes, this is the last option we have
> > AFAIK.
> > > >> > >
> > > >> > >
> > > >> > > > In terms of naming, I'd propose something like:
> > > >> > > >
> > > >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > > >> > > > org.apache.hadoop.thirdparty.guava28
> > > >> > > >
> > > >> > > > In particular, I think we absolutely need to include the
> version
> > > of
> > > >> the
> > > >> > > > underlying project. On the other hand, since we should not be
> > > >> shading
> > > >> > > > *everything* we can drop the leading com.google.
> > > >> > > >
> > > >> > > >
> > > >> > > IMO, This naming convention is easy for identifying the
> underlying
> > > >> > project,
> > > >> > > but  it will be difficult to maintain going forward if
> underlying
> > > >> project
> > > >> > > versions changes. Since thirdparty module have its own releases,
> > > each
> > > >> of
> > > >> > > those release can be mapped to specific version of underlying
> > > project.
> > > >> > Even
> > > >> > > the binary artifact can include a MANIFEST with underlying
> project
> > > >> > details
> > > >> > > as per Steve's suggestion on HADOOP-13363.
> > > >> > > That said, if you still prefer to have project number in
> artifact
> > > id,
> > > >> it
> > > >> > > can be done.
> > > >> > >
> > > > >> > > > The Hadoop project can make releases of the thirdparty module:
> > > > >> > > >
> > > > >> > > > <dependency>
> > > > >> > > >  <groupId>org.apache.hadoop</groupId>
> > > > >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > > > >> > > >  <version>1.0</version>
> > > > >> > > > </dependency>
> > > > >> > > >
> > > > >> > > > Note that the version has to be the hadoop thirdparty release number,
> > > > >> > > > which is part of why you need to have the underlying version in the
> > > > >> > > > artifact name. These we can push to maven central as new releases
> > > > >> > > > from Hadoop.
> > > > >> > > >
> > > >> > > >
> > > >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> > > module
> > > >> > have
> > > >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> > > >> > > differentiated using prefix "thirdparty-".
> > > >> > >
> > > >> > > Same solution is being followed in HBase. May be people involved
> > in
> > > >> HBase
> > > >> > > can add some points here.
> > > >> > >
> > > >> > > Thoughts?
> > > >> > > >
> > > >> > > > .. Owen
> > > >> > > >
> > > >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> > > >> vinayakumarb@apache.org
> > > >> > >
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > >> Hi All,
> > > >> > > >>
> > > >> > > >>    I wanted to discuss about the separate repo for thirdparty
> > > >> > > dependencies
> > > >> > > >> which we need to shaded and include in Hadoop component's
> jars.
> > > >> > > >>
> > > >> > > >>    Apologies for the big text ahead, but this needs clear
> > > >> > explanation!!
> > > >> > > >>
> > > >> > > >>    Right now most needed such dependency is protobuf.
> Protobuf
> > > >> > > dependency
> > > >> > > >> was not upgraded from 2.5.0 onwards with the fear that
> > downstream
> > > >> > > builds,
> > > >> > > >> which depends on transitive dependency protobuf coming from
> > > >> hadoop's
> > > >> > > jars,
> > > >> > > >> may fail with the upgrade. Apparently protobuf does not
> > guarantee
> > > >> > source
> > > >> > > >> compatibility, though it guarantees wire compatibility
> between
> > > >> > versions.
> > > >> > > >> Because of this behavior, version upgrade may cause breakage
> in
> > > >> known
> > > >> > > and
> > > >> > > >> unknown (private?) downstreams.
> > > >> > > >>
> > > >> > > >>    So to tackle this, we came up the following proposal in
> > > >> > HADOOP-13363.
> > > >> > > >>
> > > > >> > > >>    Luckily, as far as I know, no APIs, either public to users or
> > > > >> > > >> between Hadoop processes, directly use protobuf classes in their
> > > > >> > > >> signatures. (If any exist, please let us know.)
> > > >> > > >>
> > > >> > > >>    Proposal:
> > > >> > > >>    ------------
> > > >> > > >>
> > > >> > > >>    1. Create a artifact(s) which contains shaded
> dependencies.
> > > All
> > > >> > such
> > > >> > > >> shading/relocation will be with known prefix
> > > >> > > >> **org.apache.hadoop.thirdparty.**.
> > > >> > > >>    2. Right now protobuf jar (ex:
> > > >> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> > > >> > > >> to start with, all **com.google.protobuf** classes will be
> > > >> relocated
> > > >> > as
> > > >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > > >> > > >>    3. Hadoop modules, which needs protobuf as dependency,
> will
> > > add
> > > >> > this
> > > >> > > >> shaded artifact as dependency (ex:
> > > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > > >> > > >>    4. All previous usages of "com.google.protobuf" will be
> > > >> relocated
> > > >> > to
> > > >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the
> code
> > > and
> > > >> > will
> > > >> > > be
> > > >> > > >> committed. Please note, this replacement is One-Time directly
> > in
> > > >> > source
> > > >> > > >> code, NOT during compile and package.
> > > >> > > >>    5. Once all usages of "com.google.protobuf" is relocated,
> > then
> > > >> > hadoop
> > > >> > > >> dont care about which version of original  "protobuf-java" is
> > in
> > > >> > > >> dependency.
> > > >> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not
> to
> > > >> break
> > > >> > > the
> > > >> > > >> downstreams. But hadoop will be originally using the latest
> > > >> protobuf
> > > >> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > > >> > > >>
> > > >> > > >>    7. Coming back to separate repo, Following are most
> > > appropriate
> > > >> > > reasons
> > > >> > > >> of keeping shaded dependency artifact in separate repo
> instead
> > of
> > > >> > > >> submodule.
> > > >> > > >>
> > > >> > > >>      7a. These artifacts need not be built all the time. It
> > needs
> > > >> to
> > > >> > be
> > > >> > > >> built only when there is a change in the dependency version
> or
> > > the
> > > >> > build
> > > >> > > >> process.
> > > > >> > > >>      7b. If added as "submodule in Hadoop repo",
> > > > >> > > >> maven-shade-plugin:shade will execute only in package phase. That
> > > > >> > > >> means "mvn compile" or "mvn test-compile" will fail, because at
> > > > >> > > >> that point this artifact will not have relocated classes; it will
> > > > >> > > >> have the original classes instead, resulting in compilation
> > > > >> > > >> failure. Workaround: build the thirdparty submodule first and
> > > > >> > > >> exclude the "thirdparty" submodule in other executions. This will
> > > > >> > > >> be a complex process compared to keeping it in a separate repo.
> > > >> > > >>
> > > >> > > >>      7c. Separate repo, will be a subproject of Hadoop, using
> > the
> > > >> > same
> > > >> > > >> HADOOP jira project, with different versioning prefixed with
> > > >> > > "thirdparty-"
> > > >> > > >> (ex: thirdparty-1.0.0).
> > > >> > > >>      7d. Separate will have same release process as Hadoop.
> > > >> > > >>
> > > >> > > >>    HADOOP-13363 (
> > > >> https://issues.apache.org/jira/browse/HADOOP-13363)
> > > >> > > is
> > > >> > > >> an
> > > >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> > > >> > > >>
> > > >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1)
> has
> > > >> been
> > > >> > > >> raised
> > > >> > > >> for separate repo creation in (HADOOP-16595 (
> > > >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > > >> > > >>
> > > >> > > >>    Please provide your inputs for the proposal and review the
> > PR
> > > >> to
> > > >> > > >> proceed with the proposal.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >    -Thanks,
> > > >> > > >>    Vinay
> > > >> > > >>
> > > >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > > >> > > >> vinodkv@apache.org>
> > > >> > > >> wrote:
> > > >> > > >>
> > > >> > > >> > Moving the thread to the dev lists.
> > > >> > > >> >
> > > >> > > >> > Thanks
> > > >> > > >> > +Vinod
> > > >> > > >> >
> > > >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> > > >> > > vinayakumarb@apache.org>
> > > >> > > >> > wrote:
> > > >> > > >> > >
> > > >> > > >> > > Thanks Marton,
> > > >> > > >> > >
> > > >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right
> > now.
> > > >> > > >> > > Whether to use that repo  for shaded artifact or not will
> > be
> > > >> > > >> monitored in
> > > >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> > > >> > discussion.
> > > >> > > >> > >
> > > >> > > >> > > There is no existing codebase is being moved out of
> hadoop
> > > >> repo.
> > > >> > So
> > > >> > > I
> > > >> > > >> > think
> > > >> > > >> > > right now we are good to go.
> > > >> > > >> > >
> > > >> > > >> > > -Vinay
> > > >> > > >> > >
> > > >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> > > elek@apache.org>
> > > >> > > wrote:
> > > >> > > >> > >
> > > >> > > >> > >>
> > > >> > > >> > >> I am not sure if it's defined when is a vote required.
> > > >> > > >> > >>
> > > >> > > >> > >> https://www.apache.org/foundation/voting.html
> > > >> > > >> > >>
> > > >> > > >> > >> Personally I think it's a big enough change to send a
> > > >> > notification
> > > >> > > to
> > > >> > > >> > the
> > > >> > > >> > >> dev lists with a 'lazy consensus'  closure
> > > >> > > >> > >>
> > > >> > > >> > >> Marton
> > > >> > > >> > >>
> > > >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> > > >> vinayakumarb@apache.org>
> > > >> > > >> wrote:
> > > >> > > >> > >>> Hi,
> > > >> > > >> > >>>
> > > >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may
> > be
> > > >> more
> > > >> > in
> > > >> > > >> > >> future)
> > > >> > > >> > >>> will be kept as a shaded artifact in a separate repo,
> > which
> > > >> will
> > > >> > > be
> > > >> > > >> > >>> referred as dependency in hadoop modules.  This
> approach
> > > >> avoids
> > > >> > > >> shading
> > > >> > > >> > >> of
> > > >> > > >> > >>> every submodule during build.
> > > >> > > >> > >>>
> > > >> > > >> > >>> So question is does any VOTE required before asking to
> > > >> create a
> > > >> > > git
> > > >> > > >> > repo?
> > > >> > > >> > >>>
> > > >> > > >> > >>> On selfserve platform
> > > >> > > https://gitbox.apache.org/setup/newrepo.html
> > > >> > > >> > >>> I can access see that, requester should be PMC.
> > > >> > > >> > >>>
> > > >> > > >> > >>> Wanted to confirm here first.
> > > >> > > >> > >>>
> > > >> > > >> > >>> -Vinay
> > > >> > > >> > >>>
> > > >> > > >> > >>
> > > >> > > >> > >>
> > > >> > >
> > > ---------------------------------------------------------------------
> > > >> > > >> > >> To unsubscribe, e-mail:
> > > private-unsubscribe@hadoop.apache.org
> > > >> > > >> > >> For additional commands, e-mail:
> > > >> private-help@hadoop.apache.org
> > > >> > > >> > >>
> > > >> > > >> > >>
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> >
>
-- 



--Brahma Reddy Battula

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Brahma Reddy Battula <br...@apache.org>.
Hi Sree Vaddi, Owen, stack, Duo Zhang,

We can move forward based on your comments; we are just waiting for your
replies. I hope all of your comments have been answered. (Unification can be
discussed in a parallel thread, as Vinay mentioned.)



On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
wrote:

> Hi Sree,
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> > Or as a new project definition ?
> As already mentioned by Ayush, this will be a subproject of Hadoop.
> Releases will be voted on by the Hadoop PMC as per the ASF process.
>
>
> > The effort to streamline and put in an accepted standard for the
> dependencies that require shading,
> > seems beyond the siloed efforts of hadoop, hbase, etc....
>
> >I propose, we bring all the decision makers from all these artifacts in
> one room and decide best course of action.
> > I am looking at, no projects should ever had to shade any artifacts
> except as an absolute necessary alternative.
>
> This would be the ideal for any project, but unfortunately some projects
> take their own course based on their needs.
>
> In the current case of protobuf in Hadoop:
>     The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up,
> to avoid downstream failures. Since Hadoop is a platform, its dependencies
> get added to downstream projects' classpaths, so any change in Hadoop's
> dependencies directly affects downstreams. Hadoop maintains backward
> compatibility as far as possible.
>     Though protobuf provides wire compatibility between versions, it doesn't
> provide compatibility for generated sources.
>     Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
> technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
> still keep protobuf 2.5.0 (deprecated) for downstreams.
>
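
A minimal sketch of the relocation described above, assuming a
maven-shade-plugin setup in the thirdparty module's pom. The shaded package
name follows the 'o.a.h.thirdparty.protobuf37' naming from earlier in this
thread; the actual configuration in the PR may differ:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase, as noted in point 7b -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- bundle only protobuf-java into the shaded jar -->
        <artifactSet>
          <includes>
            <include>com.google.protobuf:protobuf-java</include>
          </includes>
        </artifactSet>
        <!-- rewrite the package so it cannot clash with an unshaded protobuf -->
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```
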
> This shading is necessary to support both versions of protobuf:
> 2.5.0 (non-shaded) for the downstream classpath and 3.x (shaded) for
> Hadoop's internal usage.
> This entire work is to be done before the 3.3.0 release.
>
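
On the consuming side, a Hadoop module's pom could, as a sketch, carry both
dependencies side by side. The coordinates follow the naming used in this
thread; the thirdparty version below is a placeholder, not taken from the PR:

```xml
<dependencies>
  <!-- Shaded protobuf 3.x, used only by Hadoop's own code -->
  <dependency>
    <groupId>org.apache.hadoop.thirdparty</groupId>
    <artifactId>hadoop-shaded-protobuf37</artifactId>
    <version>1.0.0</version> <!-- placeholder thirdparty release number -->
  </dependency>
  <!-- Unshaded protobuf 2.5.0, kept in the dependency tree for downstreams -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>
</dependencies>
```

Hadoop sources would then import from org.apache.hadoop.thirdparty.protobuf37
instead of com.google.protobuf, so the two versions should not collide on the
classpath.
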
> So, though it's ideal to have a common approach for all projects, I suggest
> that for Hadoop we go ahead with the current approach.
> We can also start a parallel effort to address these problems in a
> separate discussion/proposal. Once a solution is available, we can revisit
> and adopt it accordingly in all such projects (ex: HBase, Hadoop,
> Ratis).
>
> -Vinay
>
> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:
>
> > Hey Sree
> >
> > > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > > Project ? Or as a TLP ?
> > > Or as a new project definition ?
> > >
> > A subproject of Apache Hadoop, with its own independent release cycles.
> > Maybe you can put this in the same column as Ozone, or as Submarine
> > (as it was a couple of months ago).
> >
> > Unifying this for everyone seems interesting, but each project is
> > independent and has its own limitations and way of thinking. I don't think
> > it would be an easy task to bring them all to the same table and get them
> > to agree on a common approach.
> >
> > I guess this has been under discussion for quite a long time, and no other
> > alternative has been suggested. Still, we can hold off for a week in case
> > someone comes up with a better solution; otherwise we can continue in the
> > present direction.
> >
> > -Ayush
> >
> >
> >
> > On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com
> .invalid>
> > wrote:
> >
> > > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > > Project ? Or as a TLP ?
> > > Or as a new project definition ?
> > >
> > > The effort to streamline and put in an accepted standard for the
> > > dependencies that require shading seems beyond the siloed efforts of
> > > Hadoop, HBase, etc.
> > >
> > > I propose we bring all the decision makers from all these artifacts into
> > > one room and decide the best course of action. My view is that no project
> > > should ever have to shade any artifacts except as an absolutely necessary
> > > alternative.
> > >
> > >
> > > Thank you./Sree
> > >
> > >
> > >
> > >     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> > > vinayakumarb@apache.org> wrote:
> > >
> > >  Hi,
> > > Sorry for the late reply.
> > > >>> To be exact, how can we better use the thirdparty repo? Looking at
> > > >>> HBase as an example, it looks like everything that are known to break
> > > >>> a lot after an update get shaded into the hbase-thirdparty artifact:
> > > >>> guava, netty, ... etc.
> > > >>> Is it the purpose to isolate these naughty dependencies?
> > > Yes, shading is meant to isolate these naughty dependencies from the
> > > downstream classpath and to give us independent control over these
> > > upgrades without breaking downstreams.
> > >
> > > The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, which
> > > creates the shaded protobuf jar, is ready to merge.
> > >
> > > Please take a look if you are interested. It will be merged in about two
> > > days if there are no objections.
> > >
> > > -Vinay
> > >
> > >
> > > On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> > > wrote:
> > >
> > > > Hi I am late to this but I am keen to understand more.
> > > >
> > > > To be exact, how can we better use the thirdparty repo? Looking at
> > HBase
> > > > as an example, it looks like everything that are known to break a lot
> > > after
> > > > an update get shaded into the hbase-thirdparty artifact: guava,
> netty,
> > > ...
> > > > etc.
> > > > Is it the purpose to isolate these naughty dependencies?
> > > >
> > > > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <
> vinayakumarb@apache.org
> > >
> > > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com
> >
> > > >> 's suggestions.
> > > >>
> > > >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> > > >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> > > >>
> > > >> Please review!!
> > > >>
> > > >> Thanks,
> > > >> -Vinay
> > > >>
> > > >>
> > > >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <
> palomino219@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >> > For HBase we have a separated repo for hbase-thirdparty
> > > >> >
> > > >> > https://github.com/apache/hbase-thirdparty
> > > >> >
> > > >> > We will publish the artifacts to nexus so we do not need to
> include
> > > >> > binaries in our git repo, just add a dependency in the pom.
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> > > >> >
> > > >> >
> > > >> > And it has its own release cycles, only when there are special
> > > >> requirements
> > > >> > or we want to upgrade some of the dependencies. This is the vote
> > > thread
> > > >> for
> > > >> > the newest release, where we want to provide a shaded gson for
> jdk7.
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> > > >> >
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > > >> > On Sat, Sep 28, 2019 at 1:28 AM Vinayakumar B <vi...@apache.org> wrote:
> > > >> >
> > > >> > > Please find replies inline.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> > > >> owen.omalley@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > I'm very unhappy with this direction. In particular, I don't
> > think
> > > >> git
> > > >> > is
> > > >> > > > a good place for distribution of binary artifacts.
> Furthermore,
> > > the
> > > >> PMC
> > > >> > > > shouldn't be releasing anything without a release vote.
> > > >> > > >
> > > >> > > >
> > > >> > > Proposed solution doesnt release any binaries in git. Its
> > actually a
> > > >> > > complete sub-project which follows entire release process,
> > including
> > > >> VOTE
> > > >> > > in public. I have mentioned already that release process is
> > similar
> > > to
> > > >> > > hadoop.
> > > >> > > To be specific, using the (almost) same script used in hadoop to
> > > >> generate
> > > >> > > artifacts, sign and deploy to staging repository. Please let me
> > know
> > > >> If I
> > > >> > > am conveying anything wrong.
> > > >> > >
> > > >> > >
> > > >> > > > I'd propose that we make a third party module that contains
> the
> > > >> > *source*
> > > >> > > > of the pom files to build the relocated jars. This should
> > > >> absolutely be
> > > >> > > > treated as a last resort for the mostly Google projects that
> > > >> regularly
> > > >> > > > break binary compatibility (eg. Protobuf & Guava).
> > > >> > > >
> > > >> > > >
> > > >> > > Same has been implemented in the PR
> > > >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please
> check
> > > and
> > > >> let
> > > >> > > me
> > > >> > > know If I misunderstood. Yes, this is the last option we have
> > AFAIK.
> > > >> > >
> > > >> > >
> > > >> > > > In terms of naming, I'd propose something like:
> > > >> > > >
> > > >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > > >> > > > org.apache.hadoop.thirdparty.guava28
> > > >> > > >
> > > >> > > > In particular, I think we absolutely need to include the
> version
> > > of
> > > >> the
> > > >> > > > underlying project. On the other hand, since we should not be
> > > >> shading
> > > >> > > > *everything* we can drop the leading com.google.
> > > >> > > >
> > > >> > > >
> > > >> > > IMO, This naming convention is easy for identifying the
> underlying
> > > >> > project,
> > > >> > > but  it will be difficult to maintain going forward if
> underlying
> > > >> project
> > > >> > > versions changes. Since thirdparty module have its own releases,
> > > each
> > > >> of
> > > >> > > those release can be mapped to specific version of underlying
> > > project.
> > > >> > Even
> > > >> > > the binary artifact can include a MANIFEST with underlying
> project
> > > >> > details
> > > >> > > as per Steve's suggestion on HADOOP-13363.
> > > >> > > That said, if you still prefer to have project number in
> artifact
> > > id,
> > > >> it
> > > >> > > can be done.
> > > >> > >
> > > > >> > > > The Hadoop project can make releases of the thirdparty module:
> > > > >> > > >
> > > > >> > > > <dependency>
> > > > >> > > >  <groupId>org.apache.hadoop</groupId>
> > > > >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > > > >> > > >  <version>1.0</version>
> > > > >> > > > </dependency>
> > > > >> > > >
> > > > >> > > > Note that the version has to be the hadoop thirdparty release number,
> > > > >> > > > which is part of why you need to have the underlying version in the
> > > > >> > > > artifact name. These we can push to maven central as new releases
> > > > >> > > > from Hadoop.
> > > > >> > > >
> > > >> > > >
> > > >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> > > module
> > > >> > have
> > > >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> > > >> > > differentiated using prefix "thirdparty-".
> > > >> > >
> > > >> > > Same solution is being followed in HBase. May be people involved
> > in
> > > >> HBase
> > > >> > > can add some points here.
> > > >> > >
> > > >> > > Thoughts?
> > > >> > > >
> > > >> > > > .. Owen
> > > >> > > >
> > > >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> > > >> vinayakumarb@apache.org
> > > >> > >
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > >> Hi All,
> > > >> > > >>
> > > >> > > >>    I wanted to discuss about the separate repo for thirdparty
> > > >> > > dependencies
> > > >> > > >> which we need to shaded and include in Hadoop component's
> jars.
> > > >> > > >>
> > > >> > > >>    Apologies for the big text ahead, but this needs clear
> > > >> > explanation!!
> > > >> > > >>
> > > >> > > >>    Right now most needed such dependency is protobuf.
> Protobuf
> > > >> > > dependency
> > > >> > > >> was not upgraded from 2.5.0 onwards with the fear that
> > downstream
> > > >> > > builds,
> > > >> > > >> which depends on transitive dependency protobuf coming from
> > > >> hadoop's
> > > >> > > jars,
> > > >> > > >> may fail with the upgrade. Apparently protobuf does not
> > guarantee
> > > >> > source
> > > >> > > >> compatibility, though it guarantees wire compatibility
> between
> > > >> > versions.
> > > >> > > >> Because of this behavior, version upgrade may cause breakage
> in
> > > >> known
> > > >> > > and
> > > >> > > >> unknown (private?) downstreams.
> > > >> > > >>
> > > >> > > >>    So to tackle this, we came up the following proposal in
> > > >> > HADOOP-13363.
> > > >> > > >>
> > > > >> > > >>    Luckily, as far as I know, no APIs, either public to users or
> > > > >> > > >> between Hadoop processes, directly use protobuf classes in their
> > > > >> > > >> signatures. (If any exist, please let us know.)
> > > >> > > >>
> > > >> > > >>    Proposal:
> > > >> > > >>    ------------
> > > >> > > >>
> > > >> > > >>    1. Create a artifact(s) which contains shaded
> dependencies.
> > > All
> > > >> > such
> > > >> > > >> shading/relocation will be with known prefix
> > > >> > > >> **org.apache.hadoop.thirdparty.**.
> > > >> > > >>    2. Right now protobuf jar (ex:
> > > >> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> > > >> > > >> to start with, all **com.google.protobuf** classes will be
> > > >> relocated
> > > >> > as
> > > >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > > >> > > >>    3. Hadoop modules, which needs protobuf as dependency,
> will
> > > add
> > > >> > this
> > > >> > > >> shaded artifact as dependency (ex:
> > > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > > >> > > >>    4. All previous usages of "com.google.protobuf" will be
> > > >> relocated
> > > >> > to
> > > >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the
> code
> > > and
> > > >> > will
> > > >> > > be
> > > >> > > >> committed. Please note, this replacement is One-Time directly
> > in
> > > >> > source
> > > >> > > >> code, NOT during compile and package.
> > > >> > > >>    5. Once all usages of "com.google.protobuf" is relocated,
> > then
> > > >> > hadoop
> > > >> > > >> dont care about which version of original  "protobuf-java" is
> > in
> > > >> > > >> dependency.
> > > >> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not
> to
> > > >> break
> > > >> > > the
> > > >> > > >> downstreams. But hadoop will be originally using the latest
> > > >> protobuf
> > > >> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > > >> > > >>
> > > >> > > >>    7. Coming back to separate repo, Following are most
> > > appropriate
> > > >> > > reasons
> > > >> > > >> of keeping shaded dependency artifact in separate repo
> instead
> > of
> > > >> > > >> submodule.
> > > >> > > >>
> > > >> > > >>      7a. These artifacts need not be built all the time. It
> > needs
> > > >> to
> > > >> > be
> > > >> > > >> built only when there is a change in the dependency version
> or
> > > the
> > > >> > build
> > > >> > > >> process.
> > > > >> > > >>      7b. If added as "submodule in Hadoop repo",
> > > > >> > > >> maven-shade-plugin:shade will execute only in package phase. That
> > > > >> > > >> means "mvn compile" or "mvn test-compile" will fail, because at
> > > > >> > > >> that point this artifact will not have relocated classes; it will
> > > > >> > > >> have the original classes instead, resulting in compilation
> > > > >> > > >> failure. Workaround: build the thirdparty submodule first and
> > > > >> > > >> exclude the "thirdparty" submodule in other executions. This will
> > > > >> > > >> be a complex process compared to keeping it in a separate repo.
> > > >> > > >>
> > > >> > > >>      7c. Separate repo, will be a subproject of Hadoop, using
> > the
> > > >> > same
> > > >> > > >> HADOOP jira project, with different versioning prefixed with
> > > >> > > "thirdparty-"
> > > >> > > >> (ex: thirdparty-1.0.0).
> > > >> > > >>      7d. Separate will have same release process as Hadoop.
> > > >> > > >>
> > > >> > > >>    HADOOP-13363 (
> > > >> https://issues.apache.org/jira/browse/HADOOP-13363)
> > > >> > > is
> > > >> > > >> an
> > > >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> > > >> > > >>
> > > >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1)
> has
> > > >> been
> > > >> > > >> raised
> > > >> > > >> for separate repo creation in (HADOOP-16595 (
> > > >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > > >> > > >>
> > > >> > > >>    Please provide your inputs for the proposal and review the
> > PR
> > > >> to
> > > >> > > >> proceed with the proposal.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >    -Thanks,
> > > >> > > >>    Vinay
> > > >> > > >>
> > > >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > > >> > > >> vinodkv@apache.org>
> > > >> > > >> wrote:
> > > >> > > >>
> > > >> > > >> > Moving the thread to the dev lists.
> > > >> > > >> >
> > > >> > > >> > Thanks
> > > >> > > >> > +Vinod
> > > >> > > >> >
> > > >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> > > >> > > vinayakumarb@apache.org>
> > > >> > > >> > wrote:
> > > >> > > >> > >
> > > >> > > >> > > Thanks Marton,
> > > >> > > >> > >
> > > >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right
> > now.
> > > >> > > >> > > Whether to use that repo  for shaded artifact or not will
> > be
> > > >> > > >> monitored in
> > > >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> > > >> > discussion.
> > > >> > > >> > >
> > > >> > > >> > > There is no existing codebase is being moved out of
> hadoop
> > > >> repo.
> > > >> > So
> > > >> > > I
> > > >> > > >> > think
> > > >> > > >> > > right now we are good to go.
> > > >> > > >> > >
> > > >> > > >> > > -Vinay
> > > >> > > >> > >
> > > >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> > > elek@apache.org>
> > > >> > > wrote:
> > > >> > > >> > >
> > > >> > > >> > >>
> > > >> > > >> > >> I am not sure if it's defined when is a vote required.
> > > >> > > >> > >>
> > > >> > > >> > >> https://www.apache.org/foundation/voting.html
> > > >> > > >> > >>
> > > >> > > >> > >> Personally I think it's a big enough change to send a
> > > >> > notification
> > > >> > > to
> > > >> > > >> > the
> > > >> > > >> > >> dev lists with a 'lazy consensus'  closure
> > > >> > > >> > >>
> > > >> > > >> > >> Marton
> > > >> > > >> > >>
> > > >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> > > >> vinayakumarb@apache.org>
> > > >> > > >> wrote:
> > > >> > > >> > >>> Hi,
> > > >> > > >> > >>>
> > > >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may
> > be
> > > >> more
> > > >> > in
> > > >> > > >> > >> future)
> > > >> > > >> > >>> will be kept as a shaded artifact in a separate repo,
> > which
> > > >> will
> > > >> > > be
> > > >> > > >> > >>> referred as dependency in hadoop modules.  This
> approach
> > > >> avoids
> > > >> > > >> shading
> > > >> > > >> > >> of
> > > >> > > >> > >>> every submodule during build.
> > > >> > > >> > >>>
> > > >> > > >> > >>> So question is does any VOTE required before asking to
> > > >> create a
> > > >> > > git
> > > >> > > >> > repo?
> > > >> > > >> > >>>
> > > >> > > >> > >>> On selfserve platform
> > > >> > > https://gitbox.apache.org/setup/newrepo.html
> > > >> > > >> > >>> I can access see that, requester should be PMC.
> > > >> > > >> > >>>
> > > >> > > >> > >>> Wanted to confirm here first.
> > > >> > > >> > >>>
> > > >> > > >> > >>> -Vinay
> > > >> > > >> > >>>
-- 



--Brahma Reddy Battula

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Brahma Reddy Battula <br...@apache.org>.
Hi Sree Vaddi, Owen, stack, Duo Zhang,

We can move forward based on your comments; we are just waiting for your
replies. I hope all of your comments have been answered. (Unification can
be discussed in a parallel thread, as Vinay mentioned.)



On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vi...@apache.org>
wrote:

> Hi Sree,
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> > Or as a new project definition ?
> As already mentioned by Ayush, this will be a subproject of Hadoop.
> Releases will be voted by Hadoop PMC as per ASF process.
>
>
> > The effort to streamline and put in an accepted standard for the
> dependencies that require shading,
> > seems beyond the siloed efforts of hadoop, hbase, etc....
>
> >I propose, we bring all the decision makers from all these artifacts in
> one room and decide best course of action.
> > I am looking at, no projects should ever had to shade any artifacts
> except as an absolute necessary alternative.
>
> This is the ideal proposal for any project. But unfortunately some projects
> takes their own course based on need.
>
> In the current case of protobuf in Hadoop,
>     Protobuf upgrade from 2.5.0 (which is already EOL) was not taken up to
> avoid downstream failures. Since Hadoop is a platform, its dependencies
> will get added to downstream projects' classpath. So any change in Hadoop's
> dependencies will directly affect downstreams. Hadoop strictly follows
> backward compatibility as far as possible.
>     Though protobuf provides wire compatibility b/w versions, it doesnt
> provide compatibility for generated sources.
>     Now, to support ARM protobuf upgrade is mandatory. Using shading
> technique, In Hadoop internally can upgrade to shaded protobuf 3.x and
> still have 2.5.0 protobuf (deprecated) for downstreams.
>
> This shading is necessary to have both versions of protobuf supported.
> (2.5.0 (non-shaded) for downstream's classpath and 3.x (shaded) for
> hadoop's internal usage).
> And this entire work to be done before 3.3.0 release.
>
> So, though its ideal to make a common approach for all projects, I suggest
> for Hadoop we can go ahead as per current approach.
> We can also start the parallel effort to address these problems in a
> separate discussion/proposal. Once the solution is available we can revisit
> and adopt new solution accordingly in all such projects (ex: HBase, Hadoop,
> Ratis).
>
> -Vinay
>
> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:
>
> > Hey Sree
> >
> > > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > > Project ? Or as a TLP ?
> > > Or as a new project definition ?
> > >
> > A sub project of Apache Hadoop, having its own independent release
> cycles.
> > May be you can put this into the same column as ozone or as
> > submarine(couple of months ago).
> >
> > Unifying for all, seems interesting but each project is independent and
> has
> > its own limitations and way of thinking, I don't think it would be an
> easy
> > task to bring all on the same table and get them agree to a common stuff.
> >
> > I guess this has been into discussion since quite long, and there hasn't
> > been any other alternative suggested. Still we can hold up for a week, if
> > someone comes up with a better solution, else we can continue in the
> > present direction.
> >
> > -Ayush
> >
> >
> >
> > On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com
> .invalid>
> > wrote:
> >
> > > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > > Project ? Or as a TLP ?
> > > Or as a new project definition ?
> > >
> > > The effort to streamline and put in an accepted standard for the
> > > dependencies that require shading,seems beyond the siloed efforts of
> > > hadoop, hbase, etc....
> > >
> > > I propose, we bring all the decision makers from all these artifacts in
> > > one room and decide best course of action.I am looking at, no projects
> > > should ever had to shade any artifacts except as an absolute necessary
> > > alternative.
> > >
> > >
> > > Thank you./Sree
> > >
> > >
> > >
> > >     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> > > vinayakumarb@apache.org> wrote:
> > >
> > >  Hi,
> > > Sorry for the late reply,.
> > > >>> To be exact, how can we better use the thirdparty repo? Looking at
> > > HBase as an example, it looks like everything that are known to break a
> > lot
> > > after an update get shaded into the hbase-thirdparty artifact: guava,
> > > netty, ... etc.
> > > Is it the purpose to isolate these naughty dependencies?
> > > Yes, shading is to isolate these naughty dependencies from downstream
> > > classpath and have independent control on these upgrades without
> breaking
> > > downstreams.
> > >
> > > First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create
> > the
> > > protobuf shaded jar is ready to merge.
> > >
> > > Please take a look if anyone interested, will be merged may be after
> two
> > > days if no objections.
> > >
> > > -Vinay
> > >
> > >
> > > On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> > > wrote:
> > >
> > > > Hi I am late to this but I am keen to understand more.
> > > >
> > > > To be exact, how can we better use the thirdparty repo? Looking at
> > HBase
> > > > as an example, it looks like everything that are known to break a lot
> > > after
> > > > an update get shaded into the hbase-thirdparty artifact: guava,
> netty,
> > > ...
> > > > etc.
> > > > Is it the purpose to isolate these naughty dependencies?
> > > >
> > > > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <
> vinayakumarb@apache.org
> > >
> > > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com
> >
> > > >> 's suggestions.
> > > >>
> > > >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> > > >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> > > >>
> > > >> Please review!!
> > > >>
> > > >> Thanks,
> > > >> -Vinay
> > > >>
> > > >>
> > > >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <
> palomino219@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >> > For HBase we have a separated repo for hbase-thirdparty
> > > >> >
> > > >> > https://github.com/apache/hbase-thirdparty
> > > >> >
> > > >> > We will publish the artifacts to nexus so we do not need to
> include
> > > >> > binaries in our git repo, just add a dependency in the pom.
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> > > >> >
> > > >> >
> > > >> > And it has its own release cycles, only when there are special
> > > >> requirements
> > > >> > or we want to upgrade some of the dependencies. This is the vote
> > > thread
> > > >> for
> > > >> > the newest release, where we want to provide a shaded gson for
> jdk7.
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> > > >> >
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> > > >> >
> > > >> > > Please find replies inline.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> > > >> owen.omalley@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > I'm very unhappy with this direction. In particular, I don't
> > think
> > > >> git
> > > >> > is
> > > >> > > > a good place for distribution of binary artifacts.
> Furthermore,
> > > the
> > > >> PMC
> > > >> > > > shouldn't be releasing anything without a release vote.
> > > >> > > >
> > > >> > > >
> > > >> > > Proposed solution doesnt release any binaries in git. Its
> > actually a
> > > >> > > complete sub-project which follows entire release process,
> > including
> > > >> VOTE
> > > >> > > in public. I have mentioned already that release process is
> > similar
> > > to
> > > >> > > hadoop.
> > > >> > > To be specific, using the (almost) same script used in hadoop to
> > > >> generate
> > > >> > > artifacts, sign and deploy to staging repository. Please let me
> > know
> > > >> If I
> > > >> > > am conveying anything wrong.
> > > >> > >
> > > >> > >
> > > >> > > > I'd propose that we make a third party module that contains
> the
> > > >> > *source*
> > > >> > > > of the pom files to build the relocated jars. This should
> > > >> absolutely be
> > > >> > > > treated as a last resort for the mostly Google projects that
> > > >> regularly
> > > >> > > > break binary compatibility (eg. Protobuf & Guava).
> > > >> > > >
> > > >> > > >
> > > >> > > Same has been implemented in the PR
> > > >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please
> check
> > > and
> > > >> let
> > > >> > > me
> > > >> > > know If I misunderstood. Yes, this is the last option we have
> > AFAIK.
> > > >> > >
> > > >> > >
> > > >> > > > In terms of naming, I'd propose something like:
> > > >> > > >
> > > >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > > >> > > > org.apache.hadoop.thirdparty.guava28
> > > >> > > >
> > > >> > > > In particular, I think we absolutely need to include the
> version
> > > of
> > > >> the
> > > >> > > > underlying project. On the other hand, since we should not be
> > > >> shading
> > > >> > > > *everything* we can drop the leading com.google.
> > > >> > > >
> > > >> > > >
> > > >> > > IMO, This naming convention is easy for identifying the
> underlying
> > > >> > project,
> > > >> > > but  it will be difficult to maintain going forward if
> underlying
> > > >> project
> > > >> > > versions changes. Since thirdparty module have its own releases,
> > > each
> > > >> of
> > > >> > > those release can be mapped to specific version of underlying
> > > project.
> > > >> > Even
> > > >> > > the binary artifact can include a MANIFEST with underlying
> project
> > > >> > details
> > > >> > > as per Steve's suggestion on HADOOP-13363.
> > > >> > > That said, if you still prefer to have project number in
> artifact
> > > id,
> > > >> it
> > > >> > > can be done.
> > > >> > >
> > > >> > > The Hadoop project can make releases of  the thirdparty module:
> > > >> > > >
> > > >> > > > <dependency>
> > > >> > > >  <groupId>org.apache.hadoop</groupId>
> > > >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > > >> > > >  <version>1.0</version>
> > > >> > > > </dependency>
> > > >> > > >
> > > >> > > >
> > > >> > > Note that the version has to be the hadoop thirdparty release
> > > number,
> > > >> > which
> > > >> > > > is part of why you need to have the underlying version in the
> > > >> artifact
> > > >> > > > name. These we can push to maven central as new releases from
> > > >> Hadoop.
> > > >> > > >
> > > >> > > >
> > > >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> > > module
> > > >> > have
> > > >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> > > >> > > differentiated using prefix "thirdparty-".
> > > >> > >
> > > >> > > Same solution is being followed in HBase. May be people involved
> > in
> > > >> HBase
> > > >> > > can add some points here.
> > > >> > >
> > > >> > > Thoughts?
> > > >> > > >
> > > >> > > > .. Owen
> > > >> > > >
> > > >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> > > >> vinayakumarb@apache.org
> > > >> > >
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > >> Hi All,
> > > >> > > >>
> > > >> > > >>    I wanted to discuss about the separate repo for thirdparty
> > > >> > > dependencies
> > > >> > > >> which we need to shaded and include in Hadoop component's
> jars.
> > > >> > > >>
> > > >> > > >>    Apologies for the big text ahead, but this needs clear
> > > >> > explanation!!
> > > >> > > >>
> > > >> > > >>    Right now most needed such dependency is protobuf.
> Protobuf
> > > >> > > dependency
> > > >> > > >> was not upgraded from 2.5.0 onwards with the fear that
> > downstream
> > > >> > > builds,
> > > >> > > >> which depends on transitive dependency protobuf coming from
> > > >> hadoop's
> > > >> > > jars,
> > > >> > > >> may fail with the upgrade. Apparently protobuf does not
> > guarantee
> > > >> > source
> > > >> > > >> compatibility, though it guarantees wire compatibility
> between
> > > >> > versions.
> > > >> > > >> Because of this behavior, version upgrade may cause breakage
> in
> > > >> known
> > > >> > > and
> > > >> > > >> unknown (private?) downstreams.
> > > >> > > >>
> > > >> > > >>    So to tackle this, we came up the following proposal in
> > > >> > HADOOP-13363.
> > > >> > > >>
> > > >> > > >>    Luckily, As far as I know, no APIs, either public to user
> or
> > > >> > between
> > > >> > > >> Hadoop processes, is not directly using protobuf classes in
> > > >> > signatures.
> > > >> > > >> (If
> > > >> > > >> any exist, please let us know).
> > > >> > > >>
> > > >> > > >>    Proposal:
> > > >> > > >>    ------------
> > > >> > > >>
> > > >> > > >>    1. Create a artifact(s) which contains shaded
> dependencies.
> > > All
> > > >> > such
> > > >> > > >> shading/relocation will be with known prefix
> > > >> > > >> **org.apache.hadoop.thirdparty.**.
> > > >> > > >>    2. Right now protobuf jar (ex:
> > > >> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> > > >> > > >> to start with, all **com.google.protobuf** classes will be
> > > >> relocated
> > > >> > as
> > > >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > > >> > > >>    3. Hadoop modules, which needs protobuf as dependency,
> will
> > > add
> > > >> > this
> > > >> > > >> shaded artifact as dependency (ex:
> > > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > > >> > > >>    4. All previous usages of "com.google.protobuf" will be
> > > >> relocated
> > > >> > to
> > > >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the
> code
> > > and
> > > >> > will
> > > >> > > be
> > > >> > > >> committed. Please note, this replacement is One-Time directly
> > in
> > > >> > source
> > > >> > > >> code, NOT during compile and package.
> > > >> > > >>    5. Once all usages of "com.google.protobuf" is relocated,
> > then
> > > >> > hadoop
> > > >> > > >> dont care about which version of original  "protobuf-java" is
> > in
> > > >> > > >> dependency.
> > > >> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not
> to
> > > >> break
> > > >> > > the
> > > >> > > >> downstreams. But hadoop will be originally using the latest
> > > >> protobuf
> > > >> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > > >> > > >>
> > > >> > > >>    7. Coming back to separate repo, Following are most
> > > appropriate
> > > >> > > reasons
> > > >> > > >> of keeping shaded dependency artifact in separate repo
> instead
> > of
> > > >> > > >> submodule.
> > > >> > > >>
> > > >> > > >>      7a. These artifacts need not be built all the time. It
> > needs
> > > >> to
> > > >> > be
> > > >> > > >> built only when there is a change in the dependency version
> or
> > > the
> > > >> > build
> > > >> > > >> process.
> > > >> > > >>      7b. If added as "submodule in Hadoop repo",
> > > >> > > maven-shade-plugin:shade
> > > >> > > >> will execute only in package phase. That means, "mvn compile"
> > or
> > > >> "mvn
> > > >> > > >> test-compile" will not be failed as this artifact will not
> have
> > > >> > > relocated
> > > >> > > >> classes, instead it will have original classes, resulting in
> > > >> > compilation
> > > >> > > >> failure. Workaround, build thirdparty submodule first and
> > exclude
> > > >> > > >> "thirdparty" submodule in other executions. This will be a
> > > complex
> > > >> > > process
> > > >> > > >> compared to keeping in a separate repo.
> > > >> > > >>
> > > >> > > >>      7c. Separate repo, will be a subproject of Hadoop, using
> > the
> > > >> > same
> > > >> > > >> HADOOP jira project, with different versioning prefixed with
> > > >> > > "thirdparty-"
> > > >> > > >> (ex: thirdparty-1.0.0).
> > > >> > > >>      7d. Separate will have same release process as Hadoop.
> > > >> > > >>
> > > >> > > >>    HADOOP-13363 (
> > > >> https://issues.apache.org/jira/browse/HADOOP-13363)
> > > >> > > is
> > > >> > > >> an
> > > >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> > > >> > > >>
> > > >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1)
> has
> > > >> been
> > > >> > > >> raised
> > > >> > > >> for separate repo creation in (HADOOP-16595 (
> > > >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > > >> > > >>
> > > >> > > >>    Please provide your inputs for the proposal and review the
> > PR
> > > >> to
> > > >> > > >> proceed with the proposal.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >    -Thanks,
> > > >> > > >>    Vinay
> > > >> > > >>
> > > >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > > >> > > >> vinodkv@apache.org>
> > > >> > > >> wrote:
> > > >> > > >>
> > > >> > > >> > Moving the thread to the dev lists.
> > > >> > > >> >
> > > >> > > >> > Thanks
> > > >> > > >> > +Vinod
> > > >> > > >> >
> > > >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> > > >> > > vinayakumarb@apache.org>
> > > >> > > >> > wrote:
> > > >> > > >> > >
> > > >> > > >> > > Thanks Marton,
> > > >> > > >> > >
> > > >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right
> > now.
> > > >> > > >> > > Whether to use that repo  for shaded artifact or not will
> > be
> > > >> > > >> monitored in
> > > >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> > > >> > discussion.
> > > >> > > >> > >
> > > >> > > >> > > There is no existing codebase is being moved out of
> hadoop
> > > >> repo.
> > > >> > So
> > > >> > > I
> > > >> > > >> > think
> > > >> > > >> > > right now we are good to go.
> > > >> > > >> > >
> > > >> > > >> > > -Vinay
> > > >> > > >> > >
> > > >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> > > elek@apache.org>
> > > >> > > wrote:
> > > >> > > >> > >
> > > >> > > >> > >>
> > > >> > > >> > >> I am not sure if it's defined when is a vote required.
> > > >> > > >> > >>
> > > >> > > >> > >> https://www.apache.org/foundation/voting.html
> > > >> > > >> > >>
> > > >> > > >> > >> Personally I think it's a big enough change to send a
> > > >> > notification
> > > >> > > to
> > > >> > > >> > the
> > > >> > > >> > >> dev lists with a 'lazy consensus'  closure
> > > >> > > >> > >>
> > > >> > > >> > >> Marton
> > > >> > > >> > >>
> > > >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> > > >> vinayakumarb@apache.org>
> > > >> > > >> wrote:
> > > >> > > >> > >>> Hi,
> > > >> > > >> > >>>
> > > >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may
> > be
> > > >> more
> > > >> > in
> > > >> > > >> > >> future)
> > > >> > > >> > >>> will be kept as a shaded artifact in a separate repo,
> > which
> > > >> will
> > > >> > > be
> > > >> > > >> > >>> referred as dependency in hadoop modules.  This
> approach
> > > >> avoids
> > > >> > > >> shading
> > > >> > > >> > >> of
> > > >> > > >> > >>> every submodule during build.
> > > >> > > >> > >>>
> > > >> > > >> > >>> So question is does any VOTE required before asking to
> > > >> create a
> > > >> > > git
> > > >> > > >> > repo?
> > > >> > > >> > >>>
> > > >> > > >> > >>> On selfserve platform
> > > >> > > https://gitbox.apache.org/setup/newrepo.html
> > > >> > > >> > >>> I can access see that, requester should be PMC.
> > > >> > > >> > >>>
> > > >> > > >> > >>> Wanted to confirm here first.
> > > >> > > >> > >>>
> > > >> > > >> > >>> -Vinay
> > > >> > > >> > >>>
-- 



--Brahma Reddy Battula

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Hi Sree,

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
As already mentioned by Ayush, this will be a subproject of Hadoop.
Releases will be voted on by the Hadoop PMC, as per the ASF process.


> The effort to streamline and put in an accepted standard for the
> dependencies that require shading,
> seems beyond the siloed efforts of hadoop, hbase, etc....
>
> I propose, we bring all the decision makers from all these artifacts in
> one room and decide best course of action.
> I am looking at, no projects should ever had to shade any artifacts
> except as an absolute necessary alternative.

This is the ideal proposal for any project. But unfortunately, some projects
take their own course based on need.

In the current case of protobuf in Hadoop:
    The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up, to
avoid downstream failures. Since Hadoop is a platform, its dependencies
get added to downstream projects' classpaths, so any change in Hadoop's
dependencies directly affects downstreams. Hadoop strictly maintains
backward compatibility as far as possible.
    Though protobuf provides wire compatibility between versions, it doesn't
provide compatibility for generated sources.
    Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
still keep protobuf 2.5.0 (deprecated) for downstreams.

This shading is necessary to support both versions of protobuf:
2.5.0 (non-shaded) for the downstreams' classpath and 3.x (shaded) for
Hadoop's internal usage.
And this entire work is to be done before the 3.3.0 release.
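
To make the shading step concrete, here is a minimal sketch of a pom that
relocates protobuf with maven-shade-plugin. The coordinates, the protobuf
and plugin versions, and the relocated prefix are assumptions for
illustration only (the thread also discusses 'o.a.h.thirdparty.protobuf37'
as the package name); the PR may well differ.

```xml
<!-- Illustrative sketch only: coordinates and versions below are assumptions,
     not the actual hadoop-thirdparty pom. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop.thirdparty</groupId>
  <artifactId>hadoop-shaded-protobuf</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <dependencies>
    <!-- The newer protobuf that Hadoop would use internally (version assumed). -->
    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
      <version>3.7.1</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.1</version>
        <executions>
          <execution>
            <!-- Shading runs in the package phase. -->
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <!-- Bundle only protobuf-java into the shaded jar. -->
              <artifactSet>
                <includes>
                  <include>com.google.protobuf:protobuf-java</include>
                </includes>
              </artifactSet>
              <!-- Rewrite com.google.protobuf.* to the relocated package. -->
              <relocations>
                <relocation>
                  <pattern>com.google.protobuf</pattern>
                  <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

With something like this, Hadoop modules would depend on the shaded
artifact and import the relocated package, while the unshaded
protobuf-java 2.5.0 stays in the dependency tree for downstreams.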

So, though it's ideal to have a common approach for all projects, I suggest
that for Hadoop we go ahead with the current approach.
We can also start a parallel effort to address these problems in a
separate discussion/proposal. Once a solution is available, we can revisit
and adopt it in all such projects (ex: HBase, Hadoop,
Ratis).

-Vinay

On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:

> Hey Sree
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > Project ? Or as a TLP ?
> > Or as a new project definition ?
> >
> A sub project of Apache Hadoop, having its own independent release cycles.
> May be you can put this into the same column as ozone or as
> submarine(couple of months ago).
>
> Unifying for all, seems interesting but each project is independent and has
> its own limitations and way of thinking, I don't think it would be an easy
> task to bring all on the same table and get them agree to a common stuff.
>
> I guess this has been into discussion since quite long, and there hasn't
> been any other alternative suggested. Still we can hold up for a week, if
> someone comes up with a better solution, else we can continue in the
> present direction.
>
> -Ayush
>
>
>
> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sr...@yahoo.com.invalid>
> wrote:
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > Project ? Or as a TLP ?
> > Or as a new project definition ?
> >
> > The effort to streamline and put in an accepted standard for the
> > dependencies that require shading,seems beyond the siloed efforts of
> > hadoop, hbase, etc....
> >
> > I propose, we bring all the decision makers from all these artifacts in
> > one room and decide best course of action.I am looking at, no projects
> > should ever had to shade any artifacts except as an absolute necessary
> > alternative.
> >
> >
> > Thank you./Sree
> >
> >
> >
> >     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> > vinayakumarb@apache.org> wrote:
> >
> >  Hi,
> > Sorry for the late reply,.
> > >>> To be exact, how can we better use the thirdparty repo? Looking at
> > HBase as an example, it looks like everything that are known to break a
> lot
> > after an update get shaded into the hbase-thirdparty artifact: guava,
> > netty, ... etc.
> > Is it the purpose to isolate these naughty dependencies?
> > Yes, shading is to isolate these naughty dependencies from downstream
> > classpath and have independent control on these upgrades without breaking
> > downstreams.
> >
> > First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create
> the
> > protobuf shaded jar is ready to merge.
> >
> > Please take a look if anyone interested, will be merged may be after two
> > days if no objections.
> >
> > -Vinay
> >
> >
> > On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> > wrote:
> >
> > > Hi I am late to this but I am keen to understand more.
> > >
> > > To be exact, how can we better use the thirdparty repo? Looking at
> HBase
> > > as an example, it looks like everything that are known to break a lot
> > after
> > > an update get shaded into the hbase-thirdparty artifact: guava, netty,
> > ...
> > > etc.
> > > Is it the purpose to isolate these naughty dependencies?
> > >
> > > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakumarb@apache.org
> >
> > > wrote:
> > >
> > >> Hi All,
> > >>
> > >> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
> > >> 's suggestions.
> > >>
> > >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> > >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> > >>
> > >> Please review!!
> > >>
> > >> Thanks,
> > >> -Vinay
> > >>
> > >>
> > >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino219@gmail.com
> >
> > >> wrote:
> > >>
> > >> > For HBase we have a separated repo for hbase-thirdparty
> > >> >
> > >> > https://github.com/apache/hbase-thirdparty
> > >> >
> > >> > We will publish the artifacts to nexus so we do not need to include
> > >> > binaries in our git repo, just add a dependency in the pom.
> > >> >
> > >> >
> > >> >
> > >>
> >
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> > >> >
> > >> >
> > >> > And it has its own release cycles, only when there are special
> > >> requirements
> > >> > or we want to upgrade some of the dependencies. This is the vote
> > thread
> > >> for
> > >> > the newest release, where we want to provide a shaded gson for jdk7.
> > >> >
> > >> >
> > >> >
> > >>
> >
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> > >> >
> > >> >
> > >> > Thanks.
> > >> >
> > >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> > >> >
> > >> > > Please find replies inline.
> > >> > >
> > >> > > -Vinay
> > >> > >
> > >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> > >> owen.omalley@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > I'm very unhappy with this direction. In particular, I don't
> think
> > >> git
> > >> > is
> > >> > > > a good place for distribution of binary artifacts. Furthermore,
> > the
> > >> PMC
> > >> > > > shouldn't be releasing anything without a release vote.
> > >> > > >
> > >> > > >
> > >> > > Proposed solution doesnt release any binaries in git. Its
> actually a
> > >> > > complete sub-project which follows entire release process,
> including
> > >> VOTE
> > >> > > in public. I have mentioned already that release process is
> similar
> > to
> > >> > > hadoop.
> > >> > > To be specific, using the (almost) same script used in hadoop to
> > >> generate
> > >> > > artifacts, sign and deploy to staging repository. Please let me
> know
> > >> If I
> > >> > > am conveying anything wrong.
> > >> > >
> > >> > >
> > >> > > > I'd propose that we make a third party module that contains the
> > >> > *source*
> > >> > > > of the pom files to build the relocated jars. This should
> > >> absolutely be
> > >> > > > treated as a last resort for the mostly Google projects that
> > >> regularly
> > >> > > > break binary compatibility (eg. Protobuf & Guava).
> > >> > > >
> > >> > > >
> > >> > > Same has been implemented in the PR
> > >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check
> > and
> > >> let
> > >> > > me
> > >> > > know If I misunderstood. Yes, this is the last option we have
> AFAIK.
> > >> > >
> > >> > >
> > >> > > > In terms of naming, I'd propose something like:
> > >> > > >
> > >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > >> > > > org.apache.hadoop.thirdparty.guava28
> > >> > > >
> > >> > > > In particular, I think we absolutely need to include the version
> > of
> > >> the
> > >> > > > underlying project. On the other hand, since we should not be
> > >> shading
> > >> > > > *everything* we can drop the leading com.google.
> > >> > > >
> > >> > > >
> > >> > > IMO, This naming convention is easy for identifying the underlying
> > >> > project,
> > >> > > but  it will be difficult to maintain going forward if underlying
> > >> project
> > >> > > versions changes. Since thirdparty module have its own releases,
> > each
> > >> of
> > >> > > those release can be mapped to specific version of underlying
> > project.
> > >> > Even
> > >> > > the binary artifact can include a MANIFEST with underlying project
> > >> > details
> > >> > > as per Steve's suggestion on HADOOP-13363.
> > >> > > That said, if you still prefer to have project number in artifact
> > id,
> > >> it
> > >> > > can be done.
> > >> > >
> > >> > > The Hadoop project can make releases of  the thirdparty module:
> > >> > > >
> > >> > > > <dependency>
> > >> > > >  <groupId>org.apache.hadoop</groupId>
> > >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > >> > > >  <version>1.0</version>
> > >> > > > </dependency>
> > >> > > >
> > >> > > >
> > >> > > Note that the version has to be the hadoop thirdparty release
> > number,
> > >> > which
> > >> > > > is part of why you need to have the underlying version in the
> > >> artifact
> > >> > > > name. These we can push to maven central as new releases from
> > >> Hadoop.
> > >> > > >
> > >> > > >
> > >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> > module
> > >> > have
> > >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> > >> > > differentiated using prefix "thirdparty-".
> > >> > >
> > >> > > Same solution is being followed in HBase. May be people involved
> in
> > >> HBase
> > >> > > can add some points here.
> > >> > >
> > >> > > Thoughts?
> > >> > > >
> > >> > > > .. Owen
> > >> > > >
> > >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> > >> vinayakumarb@apache.org
> > >> > >
> > >> > > > wrote:
> > >> > > >
> > >> > > >> Hi All,
> > >> > > >>
> > >> > > >>    I wanted to discuss about the separate repo for thirdparty
> > >> > > dependencies
> > >> > > >> which we need to shaded and include in Hadoop component's jars.
> > >> > > >>
> > >> > > >>    Apologies for the big text ahead, but this needs clear
> > >> > explanation!!
> > >> > > >>
> > >> > > >>    Right now most needed such dependency is protobuf. Protobuf
> > >> > > dependency
> > >> > > >> was not upgraded from 2.5.0 onwards with the fear that
> downstream
> > >> > > builds,
> > >> > > >> which depends on transitive dependency protobuf coming from
> > >> hadoop's
> > >> > > jars,
> > >> > > >> may fail with the upgrade. Apparently protobuf does not
> guarantee
> > >> > source
> > >> > > >> compatibility, though it guarantees wire compatibility between
> > >> > versions.
> > >> > > >> Because of this behavior, version upgrade may cause breakage in
> > >> known
> > >> > > and
> > >> > > >> unknown (private?) downstreams.
> > >> > > >>
> > >> > > >>    So to tackle this, we came up the following proposal in
> > >> > HADOOP-13363.
> > >> > > >>
> > >> > > >>    Luckily, As far as I know, no APIs, either public to user or
> > >> > between
> > >> > > >> Hadoop processes, is not directly using protobuf classes in
> > >> > signatures.
> > >> > > >> (If
> > >> > > >> any exist, please let us know).
> > >> > > >>
> > >> > > >>    Proposal:
> > >> > > >>    ------------
> > >> > > >>
> > >> > > >>    1. Create a artifact(s) which contains shaded dependencies.
> > All
> > >> > such
> > >> > > >> shading/relocation will be with known prefix
> > >> > > >> **org.apache.hadoop.thirdparty.**.
> > >> > > >>    2. Right now protobuf jar (ex:
> > >> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> > >> > > >> to start with, all **com.google.protobuf** classes will be
> > >> relocated
> > >> > as
> > >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > >> > > >>    3. Hadoop modules, which needs protobuf as dependency, will
> > add
> > >> > this
> > >> > > >> shaded artifact as dependency (ex:
> > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > >> > > >>    4. All previous usages of "com.google.protobuf" will be
> > >> relocated
> > >> > to
> > >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code
> > and
> > >> > will
> > >> > > be
> > >> > > >> committed. Please note, this replacement is One-Time directly
> in
> > >> > source
> > >> > > >> code, NOT during compile and package.
> > >> > > >>    5. Once all usages of "com.google.protobuf" is relocated,
> then
> > >> > hadoop
> > >> > > >> dont care about which version of original  "protobuf-java" is
> in
> > >> > > >> dependency.
> > >> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to
> > >> break
> > >> > > the
> > >> > > >> downstreams. But hadoop will be originally using the latest
> > >> protobuf
> > >> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > >> > > >>
> > >> > > >>    7. Coming back to separate repo, Following are most
> > appropriate
> > >> > > reasons
> > >> > > >> of keeping shaded dependency artifact in separate repo instead
> of
> > >> > > >> submodule.
> > >> > > >>
> > >> > > >>      7a. These artifacts need not be built all the time. It
> needs
> > >> to
> > >> > be
> > >> > > >> built only when there is a change in the dependency version or
> > the
> > >> > build
> > >> > > >> process.
> > >> > > >>      7b. If added as "submodule in Hadoop repo",
> > >> > > maven-shade-plugin:shade
> > >> > > >> will execute only in package phase. That means, "mvn compile"
> or
> > >> "mvn
> > >> > > >> test-compile" will not be failed as this artifact will not have
> > >> > > relocated
> > >> > > >> classes, instead it will have original classes, resulting in
> > >> > compilation
> > >> > > >> failure. Workaround, build thirdparty submodule first and
> exclude
> > >> > > >> "thirdparty" submodule in other executions. This will be a
> > complex
> > >> > > process
> > >> > > >> compared to keeping in a separate repo.
> > >> > > >>
> > >> > > >>      7c. Separate repo, will be a subproject of Hadoop, using
> the
> > >> > same
> > >> > > >> HADOOP jira project, with different versioning prefixed with
> > >> > > "thirdparty-"
> > >> > > >> (ex: thirdparty-1.0.0).
> > >> > > >>      7d. Separate will have same release process as Hadoop.
> > >> > > >>
> > >> > > >>    HADOOP-13363 (
> > >> https://issues.apache.org/jira/browse/HADOOP-13363)
> > >> > > is
> > >> > > >> an
> > >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> > >> > > >>
> > >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> > >> been
> > >> > > >> raised
> > >> > > >> for separate repo creation in (HADOOP-16595 (
> > >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > >> > > >>
> > >> > > >>    Please provide your inputs for the proposal and review the
> PR
> > >> to
> > >> > > >> proceed with the proposal.
> > >> > > >>
> > >> > > >>
> > >> > > >    -Thanks,
> > >> > > >>    Vinay
> > >> > > >>
> > >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > >> > > >> vinodkv@apache.org>
> > >> > > >> wrote:
> > >> > > >>
> > >> > > >> > Moving the thread to the dev lists.
> > >> > > >> >
> > >> > > >> > Thanks
> > >> > > >> > +Vinod
> > >> > > >> >
> > >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> > >> > > vinayakumarb@apache.org>
> > >> > > >> > wrote:
> > >> > > >> > >
> > >> > > >> > > Thanks Marton,
> > >> > > >> > >
> > >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right
> now.
> > >> > > >> > > Whether to use that repo  for shaded artifact or not will
> be
> > >> > > >> monitored in
> > >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> > >> > discussion.
> > >> > > >> > >
> > >> > > >> > > There is no existing codebase is being moved out of hadoop
> > >> repo.
> > >> > So
> > >> > > I
> > >> > > >> > think
> > >> > > >> > > right now we are good to go.
> > >> > > >> > >
> > >> > > >> > > -Vinay
> > >> > > >> > >
> > >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> > elek@apache.org>
> > >> > > wrote:
> > >> > > >> > >
> > >> > > >> > >>
> > >> > > >> > >> I am not sure if it's defined when is a vote required.
> > >> > > >> > >>
> > >> > > >> > >> https://www.apache.org/foundation/voting.html
> > >> > > >> > >>
> > >> > > >> > >> Personally I think it's a big enough change to send a
> > >> > notification
> > >> > > to
> > >> > > >> > the
> > >> > > >> > >> dev lists with a 'lazy consensus'  closure
> > >> > > >> > >>
> > >> > > >> > >> Marton
> > >> > > >> > >>
> > >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> > >> vinayakumarb@apache.org>
> > >> > > >> wrote:
> > >> > > >> > >>> Hi,
> > >> > > >> > >>>
> > >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may
> be
> > >> more
> > >> > in
> > >> > > >> > >> future)
> > >> > > >> > >>> will be kept as a shaded artifact in a separate repo,
> which
> > >> will
> > >> > > be
> > >> > > >> > >>> referred as dependency in hadoop modules.  This approach
> > >> avoids
> > >> > > >> shading
> > >> > > >> > >> of
> > >> > > >> > >>> every submodule during build.
> > >> > > >> > >>>
> > >> > > >> > >>> So question is does any VOTE required before asking to
> > >> create a
> > >> > > git
> > >> > > >> > repo?
> > >> > > >> > >>>
> > >> > > >> > >>> On selfserve platform
> > >> > > https://gitbox.apache.org/setup/newrepo.html
> > >> > > >> > >>> I can access see that, requester should be PMC.
> > >> > > >> > >>>
> > >> > > >> > >>> Wanted to confirm here first.
> > >> > > >> > >>>
> > >> > > >> > >>> -Vinay
> > >> > > >> > >>>
> > >> > > >> > >>
> > >> > > >> > >>
> > >> > >
> > ---------------------------------------------------------------------
> > >> > > >> > >> To unsubscribe, e-mail:
> > private-unsubscribe@hadoop.apache.org
> > >> > > >> > >> For additional commands, e-mail:
> > >> private-help@hadoop.apache.org
> > >> > > >> > >>
> > >> > > >> > >>
> > >> > > >> >
> > >> > > >> >
> > >> > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Hi Sree,

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
As Ayush already mentioned, this will be a subproject of Hadoop.
Releases will be voted on by the Hadoop PMC as per the ASF process.


> The effort to streamline and put in an accepted standard for the
> dependencies that require shading,
> seems beyond the siloed efforts of hadoop, hbase, etc....

> I propose, we bring all the decision makers from all these artifacts in
> one room and decide best course of action.
> I am looking at, no projects should ever had to shade any artifacts
> except as an absolute necessary alternative.

This would be the ideal for any project. Unfortunately, some projects
take their own course based on need.

In the current case of protobuf in Hadoop:
    The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up,
to avoid downstream failures. Since Hadoop is a platform, its dependencies
get added to downstream projects' classpaths, so any change in Hadoop's
dependencies directly affects downstreams. Hadoop maintains backward
compatibility as far as possible.
    Though protobuf provides wire compatibility between versions, it doesn't
provide compatibility for generated sources.
    Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
still keep protobuf 2.5.0 (deprecated) for downstreams.
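
For illustration, the relocation described above might be configured roughly
as follows in the thirdparty module's pom. This is only a sketch, not copied
from the PR: the shaded package name org.apache.hadoop.thirdparty.protobuf is
an assumption (the names discussed in this thread vary), while the plugin
elements are standard maven-shade-plugin configuration.

```xml
<!-- Sketch only: relocates protobuf classes bundled from this module's
     protobuf-java 3.x dependency. The shadedPattern is an assumed name. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase, not during compile -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With such a relocation, the protobuf classes inside the thirdparty jar live
under the shaded package, so they cannot collide with a com.google.protobuf
2.5.0 jar on a downstream classpath.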

This shading is necessary to support both versions of protobuf:
2.5.0 (non-shaded) on the downstream classpath and 3.x (shaded) for
Hadoop's internal usage.
This entire work is to be done before the 3.3.0 release.
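
A Hadoop module's dependencies could then look roughly like this. The
coordinates and versions below are illustrative assumptions based on the
names used in this thread, not final values:

```xml
<dependencies>
  <!-- Shaded protobuf 3.x, used by Hadoop's own code
       (hypothetical coordinates and version) -->
  <dependency>
    <groupId>org.apache.hadoop.thirdparty</groupId>
    <artifactId>hadoop-shaded-protobuf</artifactId>
    <version>1.0.0</version>
  </dependency>
  <!-- Unshaded protobuf 2.5.0, kept only so that downstream
       classpaths do not change -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>
</dependencies>
```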

So, though it's ideal to have a common approach for all projects, I suggest
that Hadoop go ahead with the current approach.
We can also start a parallel effort to address these problems in a
separate discussion/proposal. Once a solution is available, we can revisit
and adopt it in all such projects (ex: HBase, Hadoop, Ratis).

-Vinay

On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:

> Hey Sree
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > Project ? Or as a TLP ?
> > Or as a new project definition ?
> >
> A sub project of Apache Hadoop, having its own independent release cycles.
> May be you can put this into the same column as ozone or as
> submarine(couple of months ago).
>
> Unifying for all, seems interesting but each project is independent and has
> its own limitations and way of thinking, I don't think it would be an easy
> task to bring all on the same table and get them agree to a common stuff.
>
> I guess this has been into discussion since quite long, and there hasn't
> been any other alternative suggested. Still we can hold up for a week, if
> someone comes up with a better solution, else we can continue in the
> present direction.
>
> -Ayush
>
>
>
> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sr...@yahoo.com.invalid>
> wrote:
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > Project ? Or as a TLP ?
> > Or as a new project definition ?
> >
> > The effort to streamline and put in an accepted standard for the
> > dependencies that require shading,seems beyond the siloed efforts of
> > hadoop, hbase, etc....
> >
> > I propose, we bring all the decision makers from all these artifacts in
> > one room and decide best course of action.I am looking at, no projects
> > should ever had to shade any artifacts except as an absolute necessary
> > alternative.
> >
> >
> > Thank you./Sree
> >
> >
> >
> >     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> > vinayakumarb@apache.org> wrote:
> >
> >  Hi,
> > Sorry for the late reply,.
> > >>> To be exact, how can we better use the thirdparty repo? Looking at
> > HBase as an example, it looks like everything that are known to break a
> lot
> > after an update get shaded into the hbase-thirdparty artifact: guava,
> > netty, ... etc.
> > Is it the purpose to isolate these naughty dependencies?
> > Yes, shading is to isolate these naughty dependencies from downstream
> > classpath and have independent control on these upgrades without breaking
> > downstreams.
> >
> > First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create
> the
> > protobuf shaded jar is ready to merge.
> >
> > Please take a look if anyone interested, will be merged may be after two
> > days if no objections.
> >
> > -Vinay
> >
> >
> > On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> > wrote:
> >
> > > Hi I am late to this but I am keen to understand more.
> > >
> > > To be exact, how can we better use the thirdparty repo? Looking at
> HBase
> > > as an example, it looks like everything that are known to break a lot
> > after
> > > an update get shaded into the hbase-thirdparty artifact: guava, netty,
> > ...
> > > etc.
> > > Is it the purpose to isolate these naughty dependencies?
> > >
> > > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakumarb@apache.org
> >
> > > wrote:
> > >
> > >> Hi All,
> > >>
> > >> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
> > >> 's suggestions.
> > >>
> > >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> > >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> > >>
> > >> Please review!!
> > >>
> > >> Thanks,
> > >> -Vinay
> > >>
> > >>
> > >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino219@gmail.com
> >
> > >> wrote:
> > >>
> > >> > For HBase we have a separated repo for hbase-thirdparty
> > >> >
> > >> > https://github.com/apache/hbase-thirdparty
> > >> >
> > >> > We will publish the artifacts to nexus so we do not need to include
> > >> > binaries in our git repo, just add a dependency in the pom.
> > >> >
> > >> >
> > >> >
> > >>
> >
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> > >> >
> > >> >
> > >> > And it has its own release cycles, only when there are special
> > >> requirements
> > >> > or we want to upgrade some of the dependencies. This is the vote
> > thread
> > >> for
> > >> > the newest release, where we want to provide a shaded gson for jdk7.
> > >> >
> > >> >
> > >> >
> > >>
> >
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> > >> >
> > >> >
> > >> > Thanks.
> > >> >
> > >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> > >> >
> > >> > > Please find replies inline.
> > >> > >
> > >> > > -Vinay
> > >> > >
> > >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> > >> owen.omalley@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > I'm very unhappy with this direction. In particular, I don't
> think
> > >> git
> > >> > is
> > >> > > > a good place for distribution of binary artifacts. Furthermore,
> > the
> > >> PMC
> > >> > > > shouldn't be releasing anything without a release vote.
> > >> > > >
> > >> > > >
> > >> > > Proposed solution doesnt release any binaries in git. Its
> actually a
> > >> > > complete sub-project which follows entire release process,
> including
> > >> VOTE
> > >> > > in public. I have mentioned already that release process is
> similar
> > to
> > >> > > hadoop.
> > >> > > To be specific, using the (almost) same script used in hadoop to
> > >> generate
> > >> > > artifacts, sign and deploy to staging repository. Please let me
> know
> > >> If I
> > >> > > am conveying anything wrong.
> > >> > >
> > >> > >
> > >> > > > I'd propose that we make a third party module that contains the
> > >> > *source*
> > >> > > > of the pom files to build the relocated jars. This should
> > >> absolutely be
> > >> > > > treated as a last resort for the mostly Google projects that
> > >> regularly
> > >> > > > break binary compatibility (eg. Protobuf & Guava).
> > >> > > >
> > >> > > >
> > >> > > Same has been implemented in the PR
> > >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check
> > and
> > >> let
> > >> > > me
> > >> > > know If I misunderstood. Yes, this is the last option we have
> AFAIK.
> > >> > >
> > >> > >
> > >> > > > In terms of naming, I'd propose something like:
> > >> > > >
> > >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > >> > > > org.apache.hadoop.thirdparty.guava28
> > >> > > >
> > >> > > > In particular, I think we absolutely need to include the version
> > of
> > >> the
> > >> > > > underlying project. On the other hand, since we should not be
> > >> shading
> > >> > > > *everything* we can drop the leading com.google.
> > >> > > >
> > >> > > >
> > >> > > IMO, This naming convention is easy for identifying the underlying
> > >> > project,
> > >> > > but  it will be difficult to maintain going forward if underlying
> > >> project
> > >> > > versions changes. Since thirdparty module have its own releases,
> > each
> > >> of
> > >> > > those release can be mapped to specific version of underlying
> > project.
> > >> > Even
> > >> > > the binary artifact can include a MANIFEST with underlying project
> > >> > details
> > >> > > as per Steve's suggestion on HADOOP-13363.
> > >> > > That said, if you still prefer to have project number in artifact
> > id,
> > >> it
> > >> > > can be done.
> > >> > >
> > >> > > The Hadoop project can make releases of  the thirdparty module:
> > >> > > >
> > >> > > > <dependency>
> > >> > > >  <groupId>org.apache.hadoop</groupId>
> > >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > >> > > >  <version>1.0</version>
> > >> > > > </dependency>
> > >> > > >
> > >> > > >
> > >> > > Note that the version has to be the hadoop thirdparty release
> > number,
> > >> > which
> > >> > > > is part of why you need to have the underlying version in the
> > >> artifact
> > >> > > > name. These we can push to maven central as new releases from
> > >> Hadoop.
> > >> > > >
> > >> > > >
> > >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> > module
> > >> > have
> > >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> > >> > > differentiated using prefix "thirdparty-".
> > >> > >
> > >> > > Same solution is being followed in HBase. May be people involved
> in
> > >> HBase
> > >> > > can add some points here.
> > >> > >
> > >> > > Thoughts?
> > >> > > >
> > >> > > > .. Owen
> > >> > > >
> > >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> > >> vinayakumarb@apache.org
> > >> > >
> > >> > > > wrote:
> > >> > > >
> > >> > > >> Hi All,
> > >> > > >>
> > >> > > >>    I wanted to discuss about the separate repo for thirdparty
> > >> > > dependencies
> > >> > > >> which we need to shaded and include in Hadoop component's jars.
> > >> > > >>
> > >> > > >>    Apologies for the big text ahead, but this needs clear
> > >> > explanation!!
> > >> > > >>
> > >> > > >>    Right now most needed such dependency is protobuf. Protobuf
> > >> > > dependency
> > >> > > >> was not upgraded from 2.5.0 onwards with the fear that
> downstream
> > >> > > builds,
> > >> > > >> which depends on transitive dependency protobuf coming from
> > >> hadoop's
> > >> > > jars,
> > >> > > >> may fail with the upgrade. Apparently protobuf does not
> guarantee
> > >> > source
> > >> > > >> compatibility, though it guarantees wire compatibility between
> > >> > versions.
> > >> > > >> Because of this behavior, version upgrade may cause breakage in
> > >> known
> > >> > > and
> > >> > > >> unknown (private?) downstreams.
> > >> > > >>
> > >> > > >>    So to tackle this, we came up the following proposal in
> > >> > HADOOP-13363.
> > >> > > >>
> > >> > > >>    Luckily, As far as I know, no APIs, either public to user or
> > >> > between
> > >> > > >> Hadoop processes, is not directly using protobuf classes in
> > >> > signatures.
> > >> > > >> (If
> > >> > > >> any exist, please let us know).
> > >> > > >>
> > >> > > >>    Proposal:
> > >> > > >>    ------------
> > >> > > >>
> > >> > > >>    1. Create a artifact(s) which contains shaded dependencies.
> > All
> > >> > such
> > >> > > >> shading/relocation will be with known prefix
> > >> > > >> **org.apache.hadoop.thirdparty.**.
> > >> > > >>    2. Right now protobuf jar (ex:
> > >> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> > >> > > >> to start with, all **com.google.protobuf** classes will be
> > >> relocated
> > >> > as
> > >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > >> > > >>    3. Hadoop modules, which needs protobuf as dependency, will
> > add
> > >> > this
> > >> > > >> shaded artifact as dependency (ex:
> > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > >> > > >>    4. All previous usages of "com.google.protobuf" will be
> > >> relocated
> > >> > to
> > >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code
> > and
> > >> > will
> > >> > > be
> > >> > > >> committed. Please note, this replacement is One-Time directly
> in
> > >> > source
> > >> > > >> code, NOT during compile and package.
> > >> > > >>    5. Once all usages of "com.google.protobuf" is relocated,
> then
> > >> > hadoop
> > >> > > >> dont care about which version of original  "protobuf-java" is
> in
> > >> > > >> dependency.
> > >> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to
> > >> break
> > >> > > the
> > >> > > >> downstreams. But hadoop will be originally using the latest
> > >> protobuf
> > >> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > >> > > >>
> > >> > > >>    7. Coming back to separate repo, Following are most
> > appropriate
> > >> > > reasons
> > >> > > >> of keeping shaded dependency artifact in separate repo instead
> of
> > >> > > >> submodule.
> > >> > > >>
> > >> > > >>      7a. These artifacts need not be built all the time. It
> needs
> > >> to
> > >> > be
> > >> > > >> built only when there is a change in the dependency version or
> > the
> > >> > build
> > >> > > >> process.
> > >> > > >>      7b. If added as "submodule in Hadoop repo",
> > >> > > maven-shade-plugin:shade
> > >> > > >> will execute only in package phase. That means, "mvn compile"
> or
> > >> "mvn
> > >> > > >> test-compile" will not be failed as this artifact will not have
> > >> > > relocated
> > >> > > >> classes, instead it will have original classes, resulting in
> > >> > compilation
> > >> > > >> failure. Workaround, build thirdparty submodule first and
> exclude
> > >> > > >> "thirdparty" submodule in other executions. This will be a
> > complex
> > >> > > process
> > >> > > >> compared to keeping in a separate repo.
> > >> > > >>
> > >> > > >>      7c. Separate repo, will be a subproject of Hadoop, using
> the
> > >> > same
> > >> > > >> HADOOP jira project, with different versioning prefixed with
> > >> > > "thirdparty-"
> > >> > > >> (ex: thirdparty-1.0.0).
> > >> > > >>      7d. Separate will have same release process as Hadoop.
> > >> > > >>
> > >> > > >>    HADOOP-13363 (
> > >> https://issues.apache.org/jira/browse/HADOOP-13363)
> > >> > > is
> > >> > > >> an
> > >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> > >> > > >>
> > >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> > >> been
> > >> > > >> raised
> > >> > > >> for separate repo creation in (HADOOP-16595 (
> > >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > >> > > >>
> > >> > > >>    Please provide your inputs for the proposal and review the
> PR
> > >> to
> > >> > > >> proceed with the proposal.
> > >> > > >>
> > >> > > >>
> > >> > > >    -Thanks,
> > >> > > >>    Vinay
> > >> > > >>
> > >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > >> > > >> vinodkv@apache.org>
> > >> > > >> wrote:
> > >> > > >>
> > >> > > >> > Moving the thread to the dev lists.
> > >> > > >> >
> > >> > > >> > Thanks
> > >> > > >> > +Vinod
> > >> > > >> >
> > >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> > >> > > vinayakumarb@apache.org>
> > >> > > >> > wrote:
> > >> > > >> > >
> > >> > > >> > > Thanks Marton,
> > >> > > >> > >
> > >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right
> now.
> > >> > > >> > > Whether to use that repo  for shaded artifact or not will
> be
> > >> > > >> monitored in
> > >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> > >> > discussion.
> > >> > > >> > >
> > >> > > >> > > There is no existing codebase is being moved out of hadoop
> > >> repo.
> > >> > So
> > >> > > I
> > >> > > >> > think
> > >> > > >> > > right now we are good to go.
> > >> > > >> > >
> > >> > > >> > > -Vinay
> > >> > > >> > >
> > >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> > elek@apache.org>
> > >> > > wrote:
> > >> > > >> > >
> > >> > > >> > >>
> > >> > > >> > >> I am not sure if it's defined when is a vote required.
> > >> > > >> > >>
> > >> > > >> > >> https://www.apache.org/foundation/voting.html
> > >> > > >> > >>
> > >> > > >> > >> Personally I think it's a big enough change to send a
> > >> > notification
> > >> > > to
> > >> > > >> > the
> > >> > > >> > >> dev lists with a 'lazy consensus'  closure
> > >> > > >> > >>
> > >> > > >> > >> Marton
> > >> > > >> > >>
> > >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> > >> vinayakumarb@apache.org>
> > >> > > >> wrote:
> > >> > > >> > >>> Hi,
> > >> > > >> > >>>
> > >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may
> be
> > >> more
> > >> > in
> > >> > > >> > >> future)
> > >> > > >> > >>> will be kept as a shaded artifact in a separate repo,
> which
> > >> will
> > >> > > be
> > >> > > >> > >>> referred as dependency in hadoop modules.  This approach
> > >> avoids
> > >> > > >> shading
> > >> > > >> > >> of
> > >> > > >> > >>> every submodule during build.
> > >> > > >> > >>>
> > >> > > >> > >>> So question is does any VOTE required before asking to
> > >> create a
> > >> > > git
> > >> > > >> > repo?
> > >> > > >> > >>>
> > >> > > >> > >>> On selfserve platform
> > >> > > https://gitbox.apache.org/setup/newrepo.html
> > >> > > >> > >>> I can access see that, requester should be PMC.
> > >> > > >> > >>>
> > >> > > >> > >>> Wanted to confirm here first.
> > >> > > >> > >>>
> > >> > > >> > >>> -Vinay
> > >> > > >> > >>>
> > >> > > >> > >>
> > >> > > >> > >>
> > >> > >
> > ---------------------------------------------------------------------
> > >> > > >> > >> To unsubscribe, e-mail:
> > private-unsubscribe@hadoop.apache.org
> > >> > > >> > >> For additional commands, e-mail:
> > >> private-help@hadoop.apache.org
> > >> > > >> > >>
> > >> > > >> > >>
> > >> > > >> >
> > >> > > >> >
> > >> > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Hi Sree,

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
As Ayush already mentioned, this will be a subproject of Hadoop.
Releases will be voted on by the Hadoop PMC as per the ASF process.


> The effort to streamline and put in an accepted standard for the
> dependencies that require shading,
> seems beyond the siloed efforts of hadoop, hbase, etc....

> I propose, we bring all the decision makers from all these artifacts in
> one room and decide best course of action.
> I am looking at, no projects should ever had to shade any artifacts
> except as an absolute necessary alternative.

This would be the ideal for any project. Unfortunately, some projects
take their own course based on need.

In the current case of protobuf in Hadoop:
    The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up,
to avoid downstream failures. Since Hadoop is a platform, its dependencies
get added to downstream projects' classpaths, so any change in Hadoop's
dependencies directly affects downstreams. Hadoop maintains backward
compatibility as far as possible.
    Though protobuf provides wire compatibility between versions, it doesn't
provide compatibility for generated sources.
    Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
still keep protobuf 2.5.0 (deprecated) for downstreams.

This shading is necessary to support both versions of protobuf:
2.5.0 (non-shaded) on the downstream classpath and 3.x (shaded) for
Hadoop's internal usage.
This entire work is to be done before the 3.3.0 release.

So, though it's ideal to have a common approach for all projects, I suggest
that Hadoop go ahead with the current approach.
We can also start a parallel effort to address these problems in a
separate discussion/proposal. Once a solution is available, we can revisit
and adopt it in all such projects (ex: HBase, Hadoop, Ratis).

-Vinay

On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:

> Hey Sree
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > Project ? Or as a TLP ?
> > Or as a new project definition ?
> >
> A sub project of Apache Hadoop, having its own independent release cycles.
> May be you can put this into the same column as ozone or as
> submarine(couple of months ago).
>
> Unifying for all, seems interesting but each project is independent and has
> its own limitations and way of thinking, I don't think it would be an easy
> task to bring all on the same table and get them agree to a common stuff.
>
> I guess this has been into discussion since quite long, and there hasn't
> been any other alternative suggested. Still we can hold up for a week, if
> someone comes up with a better solution, else we can continue in the
> present direction.
>
> -Ayush
>
>
>
> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sr...@yahoo.com.invalid>
> wrote:
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > Project ? Or as a TLP ?
> > Or as a new project definition ?
> >
> > The effort to streamline and put in an accepted standard for the
> > dependencies that require shading,seems beyond the siloed efforts of
> > hadoop, hbase, etc....
> >
> > I propose, we bring all the decision makers from all these artifacts in
> > one room and decide best course of action.I am looking at, no projects
> > should ever had to shade any artifacts except as an absolute necessary
> > alternative.
> >
> >
> > Thank you./Sree
> >
> >
> >
> >     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> > vinayakumarb@apache.org> wrote:
> >
> >  Hi,
> > Sorry for the late reply,.
> > >>> To be exact, how can we better use the thirdparty repo? Looking at
> > HBase as an example, it looks like everything that are known to break a
> lot
> > after an update get shaded into the hbase-thirdparty artifact: guava,
> > netty, ... etc.
> > Is it the purpose to isolate these naughty dependencies?
> > Yes, shading is to isolate these naughty dependencies from downstream
> > classpath and have independent control on these upgrades without breaking
> > downstreams.
> >
> > First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create
> the
> > protobuf shaded jar is ready to merge.
> >
> > Please take a look if anyone interested, will be merged may be after two
> > days if no objections.
> >
> > -Vinay
> >
> >
> > On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> > wrote:
> >
> > > Hi, I am late to this but I am keen to understand more.
> > >
> > > To be exact, how can we better use the thirdparty repo? Looking at HBase
> > > as an example, it looks like everything that is known to break a lot
> > > after an update gets shaded into the hbase-thirdparty artifact: guava,
> > > netty, etc.
> > > Is the purpose to isolate these naughty dependencies?
> > >
> > > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakumarb@apache.org>
> > > wrote:
> > >
> > >> Hi All,
> > >>
> > >> I have updated the PR as per the suggestions from @Owen O'Malley
> > >> <ow...@gmail.com>.
> > >>
> > >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> > >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> > >>
> > >> Please review!!
> > >>
> > >> Thanks,
> > >> -Vinay
> > >>
> > >>
> > >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino219@gmail.com>
> > >> wrote:
> > >>
> > >> > For HBase we have a separate repo for hbase-thirdparty:
> > >> >
> > >> > https://github.com/apache/hbase-thirdparty
> > >> >
> > >> > We publish the artifacts to Nexus, so we do not need to include
> > >> > binaries in our git repo; we just add a dependency in the pom.
> > >> >
> > >> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> > >> >
> > >> > It has its own release cycle; we release only when there are special
> > >> > requirements or we want to upgrade some of the dependencies. This is
> > >> > the vote thread for the newest release, where we want to provide a
> > >> > shaded gson for jdk7:
> > >> >
> > >> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> > >> >
> > >> >
> > >> > Thanks.
> > >> >
> > >> > Vinayakumar B <vi...@apache.org> wrote on Sat, Sep 28, 2019 at 1:28 AM:
> > >> >
> > >> > > Please find replies inline.
> > >> > >
> > >> > > -Vinay
> > >> > >
> > >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > I'm very unhappy with this direction. In particular, I don't think
> > >> > > > git is a good place for distribution of binary artifacts.
> > >> > > > Furthermore, the PMC shouldn't be releasing anything without a
> > >> > > > release vote.
> > >> > > >
> > >> > > >
> > >> > > The proposed solution doesn't release any binaries in git. It is
> > >> > > actually a complete sub-project that follows the entire release
> > >> > > process, including a public VOTE. I have already mentioned that the
> > >> > > release process is similar to Hadoop's.
> > >> > > To be specific, it uses (almost) the same script used in Hadoop to
> > >> > > generate artifacts, sign them, and deploy them to the staging
> > >> > > repository. Please let me know if I am conveying anything wrong.
> > >> > >
> > >> > >
> > >> > > > I'd propose that we make a third party module that contains the
> > >> > > > *source* of the pom files to build the relocated jars. This should
> > >> > > > absolutely be treated as a last resort for the mostly Google
> > >> > > > projects that regularly break binary compatibility (eg. Protobuf &
> > >> > > > Guava).
> > >> > > >
> > >> > > >
> > >> > > The same has been implemented in the PR
> > >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and
> > >> > > let me know if I misunderstood. Yes, this is the last option we have,
> > >> > > AFAIK.
> > >> > >
> > >> > >
> > >> > > > In terms of naming, I'd propose something like:
> > >> > > >
> > >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > >> > > > org.apache.hadoop.thirdparty.guava28
> > >> > > >
> > >> > > > In particular, I think we absolutely need to include the version of
> > >> > > > the underlying project. On the other hand, since we should not be
> > >> > > > shading *everything* we can drop the leading com.google.
> > >> > > >
> > >> > > >
> > >> > > IMO, this naming convention makes it easy to identify the underlying
> > >> > > project, but it will be difficult to maintain going forward when the
> > >> > > underlying project's version changes. Since the thirdparty module has
> > >> > > its own releases, each of those releases can be mapped to a specific
> > >> > > version of the underlying project. The binary artifact can even
> > >> > > include a MANIFEST with the underlying project's details, as per
> > >> > > Steve's suggestion on HADOOP-13363.
> > >> > > That said, if you still prefer to have the project version number in
> > >> > > the artifact id, it can be done.
> > >> > >
> > >> > > > The Hadoop project can make releases of the thirdparty module:
> > >> > > >
> > >> > > > <dependency>
> > >> > > >  <groupId>org.apache.hadoop</groupId>
> > >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > >> > > >  <version>1.0</version>
> > >> > > > </dependency>
> > >> > > >
> > >> > > > Note that the version has to be the hadoop thirdparty release
> > >> > > > number, which is part of why you need to have the underlying
> > >> > > > version in the artifact name. These we can push to maven central as
> > >> > > > new releases from Hadoop.
> > >> > > >
> > >> > > >
> > >> > > Exactly, the same has been implemented in the PR. The
> > >> > > hadoop-thirdparty module has its own releases. But in the HADOOP
> > >> > > Jira, thirdparty versions can be differentiated using the prefix
> > >> > > "thirdparty-".
> > >> > >
> > >> > > The same solution is being followed in HBase. Maybe people involved
> > >> > > in HBase can add some points here.
> > >> > >
> > >> > > > Thoughts?
> > >> > > >
> > >> > > > .. Owen
> > >> > > >
> > >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B
> > >> > > > <vinayakumarb@apache.org> wrote:
> > >> > > >
> > >> > > >> Hi All,
> > >> > > >>
> > >> > > >>    I wanted to discuss the separate repo for thirdparty
> > >> > > >> dependencies which we need to shade and include in Hadoop
> > >> > > >> component jars.
> > >> > > >>
> > >> > > >>    Apologies for the big text ahead, but this needs clear
> > >> > > >> explanation!!
> > >> > > >>
> > >> > > >>    Right now the most needed such dependency is protobuf. The
> > >> > > >> protobuf dependency has not been upgraded from 2.5.0 for fear
> > >> > > >> that downstream builds, which depend on the transitive protobuf
> > >> > > >> dependency coming from Hadoop's jars, may fail with the upgrade.
> > >> > > >> Apparently protobuf does not guarantee source compatibility,
> > >> > > >> though it guarantees wire compatibility between versions.
> > >> > > >> Because of this behavior, a version upgrade may cause breakage in
> > >> > > >> known and unknown (private?) downstreams.
> > >> > > >>
> > >> > > >>    So to tackle this, we came up with the following proposal in
> > >> > > >> HADOOP-13363.
> > >> > > >>
> > >> > > >>    Luckily, as far as I know, no APIs, either public to users or
> > >> > > >> between Hadoop processes, directly use protobuf classes in
> > >> > > >> signatures. (If any exist, please let us know).
> > >> > > >>
> > >> > > >>    Proposal:
> > >> > > >>    ------------
> > >> > > >>
> > >> > > >>    1. Create artifact(s) which contain shaded dependencies. All
> > >> > > >> such shading/relocation will use the known prefix
> > >> > > >> **org.apache.hadoop.thirdparty.**.
> > >> > > >>    2. Start with the protobuf jar (ex:
> > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf): all
> > >> > > >> **com.google.protobuf** classes will be relocated as
> > >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > >> > > >>    3. Hadoop modules which need protobuf as a dependency will add
> > >> > > >> this shaded artifact as a dependency (ex:
> > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > >> > > >>    4. All previous usages of "com.google.protobuf" will be changed
> > >> > > >> to "org.apache.hadoop.thirdparty.com.google.protobuf" in the code
> > >> > > >> and committed. Please note, this replacement is done one time,
> > >> > > >> directly in the source code, NOT during compile and package.
> > >> > > >>    5. Once all usages of "com.google.protobuf" are relocated,
> > >> > > >> Hadoop no longer cares which version of the original
> > >> > > >> "protobuf-java" is in the dependency tree.
> > >> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as
> > >> > > >> not to break the downstreams. Hadoop itself will actually be
> > >> > > >> using the latest protobuf present in
> > >> > > >> "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > >> > > >>
> > >> > > >>    7. Coming back to the separate repo, the following are the main
> > >> > > >> reasons for keeping the shaded dependency artifact in a separate
> > >> > > >> repo instead of a submodule.
> > >> > > >>
> > >> > > >>      7a. These artifacts need not be built all the time. They need
> > >> > > >> to be built only when there is a change in the dependency version
> > >> > > >> or the build process.
> > >> > > >>      7b. If added as a "submodule in Hadoop repo",
> > >> > > >> maven-shade-plugin:shade will execute only in the package phase.
> > >> > > >> That means "mvn compile" or "mvn test-compile" will fail, because
> > >> > > >> at that point this artifact will not have the relocated classes;
> > >> > > >> it will have the original classes instead, resulting in a
> > >> > > >> compilation failure. Workaround: build the thirdparty submodule
> > >> > > >> first and exclude the "thirdparty" submodule in other executions.
> > >> > > >> This will be a complex process compared to keeping it in a
> > >> > > >> separate repo.
> > >> > > >>
> > >> > > >>      7c. The separate repo will be a subproject of Hadoop, using
> > >> > > >> the same HADOOP jira project, with different versioning prefixed
> > >> > > >> with "thirdparty-" (ex: thirdparty-1.0.0).
> > >> > > >>      7d. The separate repo will have the same release process as
> > >> > > >> Hadoop.
> > >> > > >>    HADOOP-13363
> > >> > > >> (https://issues.apache.org/jira/browse/HADOOP-13363) is an
> > >> > > >> umbrella jira tracking the changes for the protobuf upgrade.
> > >> > > >>
> > >> > > >>    A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> > >> > > >> been raised for the separate repo creation in HADOOP-16595
> > >> > > >> (https://issues.apache.org/jira/browse/HADOOP-16595).
> > >> > > >>
> > >> > > >>    Please provide your inputs for the proposal and review the PR
> > >> > > >> to proceed with the proposal.
> > >> > > >>
> > >> > > >>
> > >> > > >>    -Thanks,
> > >> > > >>    Vinay
> > >> > > >>
> > >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
> > >> > > >> <vinodkv@apache.org> wrote:
> > >> > > >>
> > >> > > >> > Moving the thread to the dev lists.
> > >> > > >> >
> > >> > > >> > Thanks
> > >> > > >> > +Vinod
> > >> > > >> >
> > >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B
> > >> > > >> > > <vinayakumarb@apache.org> wrote:
> > >> > > >> > >
> > >> > > >> > > Thanks Marton,
> > >> > > >> > >
> > >> > > >> > > The currently created 'hadoop-thirdparty' repo is empty right
> > >> > > >> > > now. Whether to use that repo for the shaded artifact or not
> > >> > > >> > > will be tracked in the HADOOP-13363 umbrella jira. Please feel
> > >> > > >> > > free to join the discussion.
> > >> > > >> > >
> > >> > > >> > > No existing codebase is being moved out of the hadoop repo, so
> > >> > > >> > > I think right now we are good to go.
> > >> > > >> > >
> > >> > > >> > > -Vinay
> > >> > > >> > >
> > >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <elek@apache.org>
> > >> > > >> > > wrote:
> > >> > > >> > >
> > >> > > >> > >>
> > >> > > >> > >> I am not sure whether it is defined when a vote is required.
> > >> > > >> > >>
> > >> > > >> > >> https://www.apache.org/foundation/voting.html
> > >> > > >> > >>
> > >> > > >> > >> Personally I think it's a big enough change to send a
> > >> > > >> > >> notification to the dev lists with a 'lazy consensus' closure.
> > >> > > >> > >>
> > >> > > >> > >> Marton
> > >> > > >> > >>
> > >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org>
> > >> > > >> > >> wrote:
> > >> > > >> > >>> Hi,
> > >> > > >> > >>>
> > >> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe
> > >> > > >> > >>> more in future) will be kept as a shaded artifact in a
> > >> > > >> > >>> separate repo, which will be referenced as a dependency in
> > >> > > >> > >>> hadoop modules. This approach avoids shading every submodule
> > >> > > >> > >>> during the build.
> > >> > > >> > >>>
> > >> > > >> > >>> So the question is: is any VOTE required before asking to
> > >> > > >> > >>> create a git repo?
> > >> > > >> > >>>
> > >> > > >> > >>> On the self-serve platform
> > >> > > >> > >>> https://gitbox.apache.org/setup/newrepo.html
> > >> > > >> > >>> I can see that the requester should be a PMC member.
> > >> > > >> > >>>
> > >> > > >> > >>> Wanted to confirm here first.
> > >> > > >> > >>>
> > >> > > >> > >>> -Vinay
> > >> > > >> > >>>
> > >> > > >> > >>
> > >> > > >> > >>
> > >> > > >> > >>
> > >> > > >> > >>
> > >> > > >> >
> > >> > > >> >
> > >> > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Hi Sree,

> apache/hadoop-thirdparty, How would it fit into ASF? As an Incubating
> Project? Or as a TLP?
> Or as a new project definition?
As already mentioned by Ayush, this will be a subproject of Hadoop.
Releases will be voted on by the Hadoop PMC as per the ASF process.


> The effort to streamline and put in place an accepted standard for the
> dependencies that require shading
> seems beyond the siloed efforts of hadoop, hbase, etc.
>
> I propose we bring all the decision makers from all these artifacts into
> one room and decide the best course of action.
> My view is that no project should ever have to shade any artifacts
> except as an absolutely necessary alternative.

That would be ideal for any project. But unfortunately some projects
take their own course based on need.

In the current case of protobuf in Hadoop:
    The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up,
to avoid downstream failures. Since Hadoop is a platform, its dependencies
get added to downstream projects' classpaths, so any change in Hadoop's
dependencies directly affects downstreams. Hadoop strictly follows
backward compatibility as far as possible.
    Though protobuf provides wire compatibility between versions, it doesn't
provide compatibility for generated sources.
    Now, to support ARM, the protobuf upgrade is mandatory. Using the shading
technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
still keep protobuf 2.5.0 (deprecated) for downstreams.
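
For readers less familiar with the mechanism, the relocation discussed in
this thread could be expressed with the maven-shade-plugin roughly as
follows. This is only a sketch under assumptions: the shaded package name is
the one from the original proposal (a later update on this thread mentions
'o.a.h.thirdparty.protobuf37' instead), and the actual pom in the
hadoop-thirdparty PR may well differ.

```xml
<!-- Sketch only: relocates protobuf classes into a Hadoop-owned package.
     The shadedPattern value is an assumption taken from the proposal text. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- The shade goal is bound to the package phase -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Original package of the classes being shaded -->
            <pattern>com.google.protobuf</pattern>
            <!-- Package the classes are moved to inside the shaded jar -->
            <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Because the relocated classes exist only after the package phase, building
this once in a separately released artifact lets Hadoop modules compile
directly against the relocated package names.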

This shading is necessary to support both versions of protobuf:
2.5.0 (non-shaded) for the downstream classpath and 3.x (shaded) for
Hadoop's internal usage.
This entire work is to be done before the 3.3.0 release.
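
As an illustration of how the two versions could sit side by side, a Hadoop
module's pom might declare something like the following. This is a sketch
under assumptions, not the actual Hadoop pom: the shaded artifactId is the
module name mentioned earlier on this thread, the groupId expands the
'o.a.h.thirdparty' abbreviation from the proposal, and the
hadoop-thirdparty.version property is hypothetical.

```xml
<!-- Sketch only: the shaded artifactId and the version property are assumptions. -->
<dependencies>
  <!-- Shaded protobuf 3.x, used by Hadoop's own code via relocated packages -->
  <dependency>
    <groupId>org.apache.hadoop.thirdparty</groupId>
    <artifactId>hadoop-shaded-protobuf37</artifactId>
    <version>${hadoop-thirdparty.version}</version>
  </dependency>
  <!-- Unshaded protobuf 2.5.0, kept only so downstream classpaths do not change -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>
</dependencies>
```

The two jars do not clash on the classpath because their classes live in
different packages.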

So, though it is ideal to have a common approach for all projects, I suggest
that for Hadoop we go ahead with the current approach.
We can also start a parallel effort to address these problems in a
separate discussion/proposal. Once a solution is available, we can revisit
and adopt it in all such projects (ex: HBase, Hadoop,
Ratis).

-Vinay

On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ay...@gmail.com> wrote:

> Hey Sree
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > Project ? Or as a TLP ?
> > Or as a new project definition ?
> >
> A subproject of Apache Hadoop, with its own independent release cycles.
> Maybe you can put this in the same category as Ozone or Submarine (as of
> a couple of months ago).
>
> Unifying this for all projects seems interesting, but each project is
> independent and has its own limitations and way of thinking. I don't think
> it would be an easy task to bring everyone to the same table and get them
> to agree on a common approach.
>
> I guess this has been under discussion for quite a long time, and no other
> alternative has been suggested. Still, we can hold off for a week in case
> someone comes up with a better solution; otherwise we can continue in the
> present direction.
>
> -Ayush
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Ayush Saxena <ay...@gmail.com>.
Hey Sree

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
A subproject of Apache Hadoop, with its own independent release cycle.
You could put it in the same category as Ozone, or as Submarine (as of a
couple of months ago).

Unifying this across all projects sounds interesting, but each project is
independent and has its own constraints and way of thinking. I don't think
it would be easy to bring everyone to the same table and get them to agree
on a common approach.

I guess this has been under discussion for quite a long time, and no other
alternative has been suggested. Still, we can hold off for a week in case
someone comes up with a better solution; otherwise we can continue in the
present direction.

-Ayush



On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sr...@yahoo.com.invalid>
wrote:

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
> The effort to streamline and put in an accepted standard for the
> dependencies that require shading,seems beyond the siloed efforts of
> hadoop, hbase, etc....
>
> I propose, we bring all the decision makers from all these artifacts in
> one room and decide best course of action.I am looking at, no projects
> should ever had to shade any artifacts except as an absolute necessary
> alternative.
>
>
> Thank you./Sree
>
>
>
>     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> vinayakumarb@apache.org> wrote:
>
>  Hi,
> Sorry for the late reply,.
> >>> To be exact, how can we better use the thirdparty repo? Looking at
> HBase as an example, it looks like everything that are known to break a lot
> after an update get shaded into the hbase-thirdparty artifact: guava,
> netty, ... etc.
> Is it the purpose to isolate these naughty dependencies?
> Yes, shading is to isolate these naughty dependencies from downstream
> classpath and have independent control on these upgrades without breaking
> downstreams.
>
> First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create the
> protobuf shaded jar is ready to merge.
>
> Please take a look if anyone interested, will be merged may be after two
> days if no objections.
>
> -Vinay
>
>
> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
> > Hi I am late to this but I am keen to understand more.
> >
> > To be exact, how can we better use the thirdparty repo? Looking at HBase
> > as an example, it looks like everything that are known to break a lot
> after
> > an update get shaded into the hbase-thirdparty artifact: guava, netty,
> ...
> > etc.
> > Is it the purpose to isolate these naughty dependencies?
> >
> > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
> >> 's suggestions.
> >>
> >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> >>
> >> Please review!!
> >>
> >> Thanks,
> >> -Vinay
> >>
> >>
> >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
> >> wrote:
> >>
> >> > For HBase we have a separated repo for hbase-thirdparty
> >> >
> >> > https://github.com/apache/hbase-thirdparty
> >> >
> >> > We will publish the artifacts to nexus so we do not need to include
> >> > binaries in our git repo, just add a dependency in the pom.
> >> >
> >> >
> >> >
> >>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >> >
> >> >
> >> > And it has its own release cycles, only when there are special
> >> requirements
> >> > or we want to upgrade some of the dependencies. This is the vote
> thread
> >> for
> >> > the newest release, where we want to provide a shaded gson for jdk7.
> >> >
> >> >
> >> >
> >>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >> >
> >> >
> >> > Thanks.
> >> >
> >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> >> >
> >> > > Please find replies inline.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> >> owen.omalley@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I'm very unhappy with this direction. In particular, I don't think
> >> git
> >> > is
> >> > > > a good place for distribution of binary artifacts. Furthermore,
> the
> >> PMC
> >> > > > shouldn't be releasing anything without a release vote.
> >> > > >
> >> > > >
> >> > > Proposed solution doesnt release any binaries in git. Its actually a
> >> > > complete sub-project which follows entire release process, including
> >> VOTE
> >> > > in public. I have mentioned already that release process is similar
> to
> >> > > hadoop.
> >> > > To be specific, using the (almost) same script used in hadoop to
> >> generate
> >> > > artifacts, sign and deploy to staging repository. Please let me know
> >> If I
> >> > > am conveying anything wrong.
> >> > >
> >> > >
> >> > > > I'd propose that we make a third party module that contains the
> >> > *source*
> >> > > > of the pom files to build the relocated jars. This should
> >> absolutely be
> >> > > > treated as a last resort for the mostly Google projects that
> >> regularly
> >> > > > break binary compatibility (eg. Protobuf & Guava).
> >> > > >
> >> > > >
> >> > > Same has been implemented in the PR
> >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check
> and
> >> let
> >> > > me
> >> > > know If I misunderstood. Yes, this is the last option we have AFAIK.
> >> > >
> >> > >
> >> > > > In terms of naming, I'd propose something like:
> >> > > >
> >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> >> > > > org.apache.hadoop.thirdparty.guava28
> >> > > >
> >> > > > In particular, I think we absolutely need to include the version
> of
> >> the
> >> > > > underlying project. On the other hand, since we should not be
> >> shading
> >> > > > *everything* we can drop the leading com.google.
> >> > > >
> >> > > >
> >> > > IMO, This naming convention is easy for identifying the underlying
> >> > project,
> >> > > but  it will be difficult to maintain going forward if underlying
> >> project
> >> > > versions changes. Since thirdparty module have its own releases,
> each
> >> of
> >> > > those release can be mapped to specific version of underlying
> project.
> >> > Even
> >> > > the binary artifact can include a MANIFEST with underlying project
> >> > details
> >> > > as per Steve's suggestion on HADOOP-13363.
> >> > > That said, if you still prefer to have project number in artifact
> id,
> >> it
> >> > > can be done.
> >> > >
> >> > > The Hadoop project can make releases of  the thirdparty module:
> >> > > >
> >> > > > <dependency>
> >> > > >  <groupId>org.apache.hadoop</groupId>
> >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >> > > >  <version>1.0</version>
> >> > > > </dependency>
> >> > > >
> >> > > >
> >> > > Note that the version has to be the hadoop thirdparty release
> number,
> >> > which
> >> > > > is part of why you need to have the underlying version in the
> >> artifact
> >> > > > name. These we can push to maven central as new releases from
> >> Hadoop.
> >> > > >
> >> > > >
> >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> module
> >> > have
> >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> >> > > differentiated using prefix "thirdparty-".
> >> > >
> >> > > Same solution is being followed in HBase. May be people involved in
> >> HBase
> >> > > can add some points here.
> >> > >
> >> > > Thoughts?
> >> > > >
> >> > > > .. Owen
> >> > > >
> >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> >> vinayakumarb@apache.org
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > >> Hi All,
> >> > > >>
> >> > > >>    I wanted to discuss about the separate repo for thirdparty
> >> > > dependencies
> >> > > >> which we need to shaded and include in Hadoop component's jars.
> >> > > >>
> >> > > >>    Apologies for the big text ahead, but this needs clear
> >> > explanation!!
> >> > > >>
> >> > > >>    Right now most needed such dependency is protobuf. Protobuf
> >> > > dependency
> >> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream
> >> > > builds,
> >> > > >> which depends on transitive dependency protobuf coming from
> >> hadoop's
> >> > > jars,
> >> > > >> may fail with the upgrade. Apparently protobuf does not guarantee
> >> > source
> >> > > >> compatibility, though it guarantees wire compatibility between
> >> > versions.
> >> > > >> Because of this behavior, version upgrade may cause breakage in
> >> known
> >> > > and
> >> > > >> unknown (private?) downstreams.
> >> > > >>
> >> > > >>    So to tackle this, we came up the following proposal in
> >> > HADOOP-13363.
> >> > > >>
> >> > > >>    Luckily, As far as I know, no APIs, either public to user or
> >> > between
> >> > > >> Hadoop processes, is not directly using protobuf classes in
> >> > signatures.
> >> > > >> (If
> >> > > >> any exist, please let us know).
> >> > > >>
> >> > > >>    Proposal:
> >> > > >>    ------------
> >> > > >>
> >> > > >>    1. Create a artifact(s) which contains shaded dependencies.
> All
> >> > such
> >> > > >> shading/relocation will be with known prefix
> >> > > >> **org.apache.hadoop.thirdparty.**.
> >> > > >>    2. Right now protobuf jar (ex:
> >> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> >> > > >> to start with, all **com.google.protobuf** classes will be
> >> relocated
> >> > as
> >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >> > > >>    3. Hadoop modules, which needs protobuf as dependency, will
> add
> >> > this
> >> > > >> shaded artifact as dependency (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >> > > >>    4. All previous usages of "com.google.protobuf" will be
> >> relocated
> >> > to
> >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code
> and
> >> > will
> >> > > be
> >> > > >> committed. Please note, this replacement is One-Time directly in
> >> > source
> >> > > >> code, NOT during compile and package.
> >> > > >>    5. Once all usages of "com.google.protobuf" is relocated, then
> >> > hadoop
> >> > > >> dont care about which version of original  "protobuf-java" is in
> >> > > >> dependency.
> >> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to
> >> break
> >> > > the
> >> > > >> downstreams. But hadoop will be originally using the latest
> >> protobuf
> >> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >> > > >>
> >> > > >>    7. Coming back to separate repo, Following are most
> appropriate
> >> > > reasons
> >> > > >> of keeping shaded dependency artifact in separate repo instead of
> >> > > >> submodule.
> >> > > >>
> >> > > >>      7a. These artifacts need not be built all the time. It needs
> >> to
> >> > be
> >> > > >> built only when there is a change in the dependency version or
> the
> >> > build
> >> > > >> process.
> >> > > >>      7b. If added as "submodule in Hadoop repo",
> >> > > maven-shade-plugin:shade
> >> > > >> will execute only in package phase. That means, "mvn compile" or
> >> "mvn
> >> > > >> test-compile" will not be failed as this artifact will not have
> >> > > relocated
> >> > > >> classes, instead it will have original classes, resulting in
> >> > compilation
> >> > > >> failure. Workaround, build thirdparty submodule first and exclude
> >> > > >> "thirdparty" submodule in other executions. This will be a
> complex
> >> > > process
> >> > > >> compared to keeping in a separate repo.
> >> > > >>
> >> > > >>      7c. Separate repo, will be a subproject of Hadoop, using the
> >> > same
> >> > > >> HADOOP jira project, with different versioning prefixed with
> >> > > "thirdparty-"
> >> > > >> (ex: thirdparty-1.0.0).
> >> > > >>      7d. Separate will have same release process as Hadoop.
> >> > > >>
> >> > > >>    HADOOP-13363 (
> >> https://issues.apache.org/jira/browse/HADOOP-13363)
> >> > > is
> >> > > >> an
> >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> >> > > >>
> >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> >> been
> >> > > >> raised
> >> > > >> for separate repo creation in (HADOOP-16595 (
> >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> >> > > >>
> >> > > >>    Please provide your inputs for the proposal and review the PR
> >> to
> >> > > >> proceed with the proposal.
> >> > > >>
> >> > > >>
> >> > > >    -Thanks,
> >> > > >>    Vinay
> >> > > >>
> >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >> > > >> vinodkv@apache.org>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Moving the thread to the dev lists.
> >> > > >> >
> >> > > >> > Thanks
> >> > > >> > +Vinod
> >> > > >> >
> >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> >> > > vinayakumarb@apache.org>
> >> > > >> > wrote:
> >> > > >> > >
> >> > > >> > > Thanks Marton,
> >> > > >> > >
> >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> >> > > >> > > Whether to use that repo  for shaded artifact or not will be
> >> > > >> monitored in
> >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> >> > discussion.
> >> > > >> > >
> >> > > >> > > There is no existing codebase is being moved out of hadoop
> >> repo.
> >> > So
> >> > > I
> >> > > >> > think
> >> > > >> > > right now we are good to go.
> >> > > >> > >
> >> > > >> > > -Vinay
> >> > > >> > >
> >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> elek@apache.org>
> >> > > wrote:
> >> > > >> > >
> >> > > >> > >>
> >> > > >> > >> I am not sure if it's defined when is a vote required.
> >> > > >> > >>
> >> > > >> > >> https://www.apache.org/foundation/voting.html
> >> > > >> > >>
> >> > > >> > >> Personally I think it's a big enough change to send a
> >> > notification
> >> > > to
> >> > > >> > the
> >> > > >> > >> dev lists with a 'lazy consensus'  closure
> >> > > >> > >>
> >> > > >> > >> Marton
> >> > > >> > >>
> >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> >> vinayakumarb@apache.org>
> >> > > >> wrote:
> >> > > >> > >>> Hi,
> >> > > >> > >>>
> >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be
> >> more
> >> > in
> >> > > >> > >> future)
> >> > > >> > >>> will be kept as a shaded artifact in a separate repo, which
> >> will
> >> > > be
> >> > > >> > >>> referred as dependency in hadoop modules.  This approach
> >> avoids
> >> > > >> shading
> >> > > >> > >> of
> >> > > >> > >>> every submodule during build.
> >> > > >> > >>>
> >> > > >> > >>> So question is does any VOTE required before asking to
> >> create a
> >> > > git
> >> > > >> > repo?
> >> > > >> > >>>
> >> > > >> > >>> On selfserve platform
> >> > > https://gitbox.apache.org/setup/newrepo.html
> >> > > >> > >>> I can access see that, requester should be PMC.
> >> > > >> > >>>
> >> > > >> > >>> Wanted to confirm here first.
> >> > > >> > >>>
> >> > > >> > >>> -Vinay
> >> > > >> > >>>
> >> > > >> > >>
> >> > > >> > >>
> >> > >
> ---------------------------------------------------------------------
> >> > > >> > >> To unsubscribe, e-mail:
> private-unsubscribe@hadoop.apache.org
> >> > > >> > >> For additional commands, e-mail:
> >> private-help@hadoop.apache.org
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
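
For readers following the quoted proposal, the relocation described in
steps 1 to 4 might look roughly like the maven-shade-plugin fragment below.
This is only a sketch under stated assumptions, not the pom from the PR:
the included artifact and the shaded prefix are taken from the original
proposal text (the thread later discusses a versioned package such as
'o.a.h.thirdparty.protobuf37' instead), and everything else is a generic
plugin setup.

```xml
<!-- Sketch only: assumes the module declares com.google.protobuf:protobuf-java
     as a normal dependency and shades it into its own jar. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase, as noted in item 7b -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- bundle only protobuf into the shaded jar -->
            <include>com.google.protobuf:protobuf-java</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- rewrite the package in the bundled classes -->
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Because the relocation only happens at package time, modules compiled in
the same reactor with "mvn compile" would still see the unrelocated
classes, which is the reason item 7b gives for releasing the shaded jar
from a separate repo.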

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Brahma Reddy Battula <br...@apache.org>.
I just went through the previous discussions in the jira (HADOOP-13363) and
this thread. As stack and Duo Zhang mentioned, this artifact (could we name
it "shaded" instead of "thirdparty"?) will be voted on by the PMC, as in the
thread below. Wouldn't that be fair?

https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E

One thought here:
Maybe we can unify this (perhaps as an incubation project?) so that all
projects can use the same git repo for shaded artifacts?


I wanted to join the discussion, so please let me know.


On Sun, 5 Jan 2020 at 7:33 AM, Sree Vaddi <sr...@yahoo.com.invalid>
wrote:

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
> The effort to streamline and put in an accepted standard for the
> dependencies that require shading,seems beyond the siloed efforts of
> hadoop, hbase, etc....
>
> I propose, we bring all the decision makers from all these artifacts in
> one room and decide best course of action.I am looking at, no projects
> should ever had to shade any artifacts except as an absolute necessary
> alternative.
>
>
> Thank you./Sree
>
>
>
>     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> vinayakumarb@apache.org> wrote:
>
>  Hi,
> Sorry for the late reply,.
> >>> To be exact, how can we better use the thirdparty repo? Looking at
> HBase as an example, it looks like everything that are known to break a lot
> after an update get shaded into the hbase-thirdparty artifact: guava,
> netty, ... etc.
> Is it the purpose to isolate these naughty dependencies?
> Yes, shading is to isolate these naughty dependencies from downstream
> classpath and have independent control on these upgrades without breaking
> downstreams.
>
> First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create the
> protobuf shaded jar is ready to merge.
>
> Please take a look if anyone interested, will be merged may be after two
> days if no objections.
>
> -Vinay
>
>
> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
> > Hi I am late to this but I am keen to understand more.
> >
> > To be exact, how can we better use the thirdparty repo? Looking at HBase
> > as an example, it looks like everything that are known to break a lot
> after
> > an update get shaded into the hbase-thirdparty artifact: guava, netty,
> ...
> > etc.
> > Is it the purpose to isolate these naughty dependencies?
> >
> > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
> >> 's suggestions.
> >>
> >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> >>
> >> Please review!!
> >>
> >> Thanks,
> >> -Vinay
> >>
> >>
> >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
> >> wrote:
> >>
> >> > For HBase we have a separated repo for hbase-thirdparty
> >> >
> >> > https://github.com/apache/hbase-thirdparty
> >> >
> >> > We will publish the artifacts to nexus so we do not need to include
> >> > binaries in our git repo, just add a dependency in the pom.
> >> >
> >> >
> >> >
> >>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >> >
> >> >
> >> > And it has its own release cycles, only when there are special
> >> requirements
> >> > or we want to upgrade some of the dependencies. This is the vote
> thread
> >> for
> >> > the newest release, where we want to provide a shaded gson for jdk7.
> >> >
> >> >
> >> >
> >>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >> >
> >> >
> >> > Thanks.
> >> >
> >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> >> >
> >> > > Please find replies inline.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> >> owen.omalley@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I'm very unhappy with this direction. In particular, I don't think
> >> git
> >> > is
> >> > > > a good place for distribution of binary artifacts. Furthermore,
> the
> >> PMC
> >> > > > shouldn't be releasing anything without a release vote.
> >> > > >
> >> > > >
> >> > > Proposed solution doesnt release any binaries in git. Its actually a
> >> > > complete sub-project which follows entire release process, including
> >> VOTE
> >> > > in public. I have mentioned already that release process is similar
> to
> >> > > hadoop.
> >> > > To be specific, using the (almost) same script used in hadoop to
> >> generate
> >> > > artifacts, sign and deploy to staging repository. Please let me know
> >> If I
> >> > > am conveying anything wrong.
> >> > >
> >> > >
> >> > > > I'd propose that we make a third party module that contains the
> >> > *source*
> >> > > > of the pom files to build the relocated jars. This should
> >> absolutely be
> >> > > > treated as a last resort for the mostly Google projects that
> >> regularly
> >> > > > break binary compatibility (eg. Protobuf & Guava).
> >> > > >
> >> > > >
> >> > > Same has been implemented in the PR
> >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check
> and
> >> let
> >> > > me
> >> > > know If I misunderstood. Yes, this is the last option we have AFAIK.
> >> > >
> >> > >
> >> > > > In terms of naming, I'd propose something like:
> >> > > >
> >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> >> > > > org.apache.hadoop.thirdparty.guava28
> >> > > >
> >> > > > In particular, I think we absolutely need to include the version
> of
> >> the
> >> > > > underlying project. On the other hand, since we should not be
> >> shading
> >> > > > *everything* we can drop the leading com.google.
> >> > > >
> >> > > >
> >> > > IMO, This naming convention is easy for identifying the underlying
> >> > project,
> >> > > but  it will be difficult to maintain going forward if underlying
> >> project
> >> > > versions changes. Since thirdparty module have its own releases,
> each
> >> of
> >> > > those release can be mapped to specific version of underlying
> project.
> >> > Even
> >> > > the binary artifact can include a MANIFEST with underlying project
> >> > details
> >> > > as per Steve's suggestion on HADOOP-13363.
> >> > > That said, if you still prefer to have project number in artifact
> id,
> >> it
> >> > > can be done.
> >> > >
> >> > > The Hadoop project can make releases of  the thirdparty module:
> >> > > >
> >> > > > <dependency>
> >> > > >  <groupId>org.apache.hadoop</groupId>
> >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >> > > >  <version>1.0</version>
> >> > > > </dependency>
> >> > > >
> >> > > >
> >> > > Note that the version has to be the hadoop thirdparty release
> number,
> >> > which
> >> > > > is part of why you need to have the underlying version in the
> >> artifact
> >> > > > name. These we can push to maven central as new releases from
> >> Hadoop.
> >> > > >
> >> > > >
> >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> module
> >> > have
> >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> >> > > differentiated using prefix "thirdparty-".
> >> > >
> >> > > Same solution is being followed in HBase. May be people involved in
> >> HBase
> >> > > can add some points here.
> >> > >
> >> > > Thoughts?
> >> > > >
> >> > > > .. Owen
> >> > > >
> >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> >> vinayakumarb@apache.org
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > >> Hi All,
> >> > > >>
> >> > > >>    I wanted to discuss about the separate repo for thirdparty
> >> > > dependencies
> >> > > >> which we need to shaded and include in Hadoop component's jars.
> >> > > >>
> >> > > >>    Apologies for the big text ahead, but this needs clear
> >> > explanation!!
> >> > > >>
> >> > > >>    Right now most needed such dependency is protobuf. Protobuf
> >> > > dependency
> >> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream
> >> > > builds,
> >> > > >> which depends on transitive dependency protobuf coming from
> >> hadoop's
> >> > > jars,
> >> > > >> may fail with the upgrade. Apparently protobuf does not guarantee
> >> > source
> >> > > >> compatibility, though it guarantees wire compatibility between
> >> > versions.
> >> > > >> Because of this behavior, version upgrade may cause breakage in
> >> known
> >> > > and
> >> > > >> unknown (private?) downstreams.
> >> > > >>
> >> > > >>    So to tackle this, we came up the following proposal in
> >> > HADOOP-13363.
> >> > > >>
> >> > > >>    Luckily, As far as I know, no APIs, either public to user or
> >> > between
> >> > > >> Hadoop processes, is not directly using protobuf classes in
> >> > signatures.
> >> > > >> (If
> >> > > >> any exist, please let us know).
> >> > > >>
> >> > > >>    Proposal:
> >> > > >>    ------------
> >> > > >>
> >> > > >>    1. Create a artifact(s) which contains shaded dependencies.
> All
> >> > such
> >> > > >> shading/relocation will be with known prefix
> >> > > >> **org.apache.hadoop.thirdparty.**.
> >> > > >>    2. Right now protobuf jar (ex:
> >> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> >> > > >> to start with, all **com.google.protobuf** classes will be
> >> relocated
> >> > as
> >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >> > > >>    3. Hadoop modules, which needs protobuf as dependency, will
> add
> >> > this
> >> > > >> shaded artifact as dependency (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >> > > >>    4. All previous usages of "com.google.protobuf" will be
> >> relocated
> >> > to
> >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code
> and
> >> > will
> >> > > be
> >> > > >> committed. Please note, this replacement is One-Time directly in
> >> > source
> >> > > >> code, NOT during compile and package.
> >> > > >>    5. Once all usages of "com.google.protobuf" is relocated, then
> >> > hadoop
> >> > > >> dont care about which version of original  "protobuf-java" is in
> >> > > >> dependency.
> >> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to
> >> break
> >> > > the
> >> > > >> downstreams. But hadoop will be originally using the latest
> >> protobuf
> >> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >> > > >>
> >> > > >>    7. Coming back to separate repo, Following are most
> appropriate
> >> > > reasons
> >> > > >> of keeping shaded dependency artifact in separate repo instead of
> >> > > >> submodule.
> >> > > >>
> >> > > >>      7a. These artifacts need not be built all the time. It needs
> >> to
> >> > be
> >> > > >> built only when there is a change in the dependency version or
> the
> >> > build
> >> > > >> process.
> >> > > >>      7b. If added as "submodule in Hadoop repo",
> >> > > maven-shade-plugin:shade
> >> > > >> will execute only in package phase. That means, "mvn compile" or
> >> "mvn
> >> > > >> test-compile" will not be failed as this artifact will not have
> >> > > relocated
> >> > > >> classes, instead it will have original classes, resulting in
> >> > compilation
> >> > > >> failure. Workaround, build thirdparty submodule first and exclude
> >> > > >> "thirdparty" submodule in other executions. This will be a
> complex
> >> > > process
> >> > > >> compared to keeping in a separate repo.
> >> > > >>
> >> > > >>      7c. Separate repo, will be a subproject of Hadoop, using the
> >> > same
> >> > > >> HADOOP jira project, with different versioning prefixed with
> >> > > "thirdparty-"
> >> > > >> (ex: thirdparty-1.0.0).
> >> > > >>      7d. Separate will have same release process as Hadoop.
> >> > > >>
> >> > > >>    HADOOP-13363 (
> >> https://issues.apache.org/jira/browse/HADOOP-13363)
> >> > > is
> >> > > >> an
> >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> >> > > >>
> >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> >> been
> >> > > >> raised
> >> > > >> for separate repo creation in (HADOOP-16595 (
> >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> >> > > >>
> >> > > >>    Please provide your inputs for the proposal and review the PR
> >> to
> >> > > >> proceed with the proposal.
> >> > > >>
> >> > > >>
> >> > > >    -Thanks,
> >> > > >>    Vinay
> >> > > >>
> >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >> > > >> vinodkv@apache.org>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Moving the thread to the dev lists.
> >> > > >> >
> >> > > >> > Thanks
> >> > > >> > +Vinod
> >> > > >> >
> >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> >> > > vinayakumarb@apache.org>
> >> > > >> > wrote:
> >> > > >> > >
> >> > > >> > > Thanks Marton,
> >> > > >> > >
> >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> >> > > >> > > Whether to use that repo  for shaded artifact or not will be
> >> > > >> monitored in
> >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> >> > discussion.
> >> > > >> > >
> >> > > >> > > There is no existing codebase is being moved out of hadoop
> >> repo.
> >> > So
> >> > > I
> >> > > >> > think
> >> > > >> > > right now we are good to go.
> >> > > >> > >
> >> > > >> > > -Vinay
> >> > > >> > >
> >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> elek@apache.org>
> >> > > wrote:
> >> > > >> > >
> >> > > >> > >>
> >> > > >> > >> I am not sure if it's defined when is a vote required.
> >> > > >> > >>
> >> > > >> > >> https://www.apache.org/foundation/voting.html
> >> > > >> > >>
> >> > > >> > >> Personally I think it's a big enough change to send a
> >> > notification
> >> > > to
> >> > > >> > the
> >> > > >> > >> dev lists with a 'lazy consensus'  closure
> >> > > >> > >>
> >> > > >> > >> Marton
> >> > > >> > >>
> >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> >> vinayakumarb@apache.org>
> >> > > >> wrote:
> >> > > >> > >>> Hi,
> >> > > >> > >>>
> >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be
> >> more
> >> > in
> >> > > >> > >> future)
> >> > > >> > >>> will be kept as a shaded artifact in a separate repo, which
> >> will
> >> > > be
> >> > > >> > >>> referred as dependency in hadoop modules.  This approach
> >> avoids
> >> > > >> shading
> >> > > >> > >> of
> >> > > >> > >>> every submodule during build.
> >> > > >> > >>>
> >> > > >> > >>> So question is does any VOTE required before asking to
> >> create a
> >> > > git
> >> > > >> > repo?
> >> > > >> > >>>
> >> > > >> > >>> On selfserve platform
> >> > > https://gitbox.apache.org/setup/newrepo.html
> >> > > >> > >>> I can access see that, requester should be PMC.
> >> > > >> > >>>
> >> > > >> > >>> Wanted to confirm here first.
> >> > > >> > >>>
> >> > > >> > >>> -Vinay
> >> > > >> > >>>
> >> > > >> > >>
> >> > > >> > >>
> >> > >
> ---------------------------------------------------------------------
> >> > > >> > >> To unsubscribe, e-mail:
> private-unsubscribe@hadoop.apache.org
> >> > > >> > >> For additional commands, e-mail:
> >> private-help@hadoop.apache.org
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >

-- 



--Brahma Reddy Battula

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Ayush Saxena <ay...@gmail.com>.
Hey Sree

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
A subproject of Apache Hadoop with its own independent release cycle.
You can put it in the same category as Ozone, or as Submarine (a couple
of months ago).

Unifying this across all projects sounds interesting, but each project is
independent and has its own constraints and way of thinking. I don't think
it would be easy to bring everyone to the same table and get them to agree
on a common approach.

I guess this has been under discussion for quite a long time, and no
alternative has been suggested. Still, we can hold off for a week in case
someone comes up with a better solution; otherwise we can continue in the
present direction.

-Ayush



On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sr...@yahoo.com.invalid>
wrote:

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
> The effort to streamline and standardize how dependencies that require
> shading are handled seems to go beyond the siloed efforts of Hadoop,
> HBase, etc.
>
> I propose we bring the decision makers from all these projects into one
> room and decide on the best course of action. My view is that no project
> should ever have to shade an artifact except as an absolutely necessary
> last resort.
>
>
> Thank you./Sree
>
>
>
>     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> vinayakumarb@apache.org> wrote:
>
>  Hi,
> Sorry for the late reply.
>
> >>> To be exact, how can we better use the thirdparty repo? Looking at
> >>> HBase as an example, it looks like everything that is known to break a
> >>> lot after an update gets shaded into the hbase-thirdparty artifact:
> >>> guava, netty, etc.
> >>> Is the purpose to isolate these naughty dependencies?
>
> Yes, shading isolates these naughty dependencies from the downstream
> classpath and gives us independent control over their upgrades without
> breaking downstreams.
>
> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, which
> creates the shaded protobuf jar, is ready to merge.
>
> Please take a look if you are interested. It will be merged in about two
> days if there are no objections.
>
> -Vinay
>
>
> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
> > Hi I am late to this but I am keen to understand more.
> >
> > To be exact, how can we better use the thirdparty repo? Looking at HBase
> > as an example, it looks like everything that are known to break a lot
> after
> > an update get shaded into the hbase-thirdparty artifact: guava, netty,
> ...
> > etc.
> > Is it the purpose to isolate these naughty dependencies?
> >
> > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
> >> 's suggestions.
> >>
> >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> >>
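
As an illustration of the renaming above, a relocation of this kind is usually expressed with maven-shade-plugin roughly as follows. This is only a sketch under assumptions, not the configuration from the PR: the full package name org.apache.hadoop.thirdparty.protobuf37 is an assumed expansion of 'o.a.h.thirdparty.protobuf37', and the plugin version is assumed to be managed elsewhere.

```xml
<!-- Sketch only: build section of a hypothetical hadoop-shaded-protobuf37
     module that already declares protobuf-java as a dependency. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <!-- The shade goal is bound to the package phase. -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <relocation>
                <!-- Original protobuf package ... -->
                <pattern>com.google.protobuf</pattern>
                <!-- ... rewritten to the versioned Hadoop prefix (assumed spelling). -->
                <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With a relocation like this, classes in the shaded jar are rewritten under the new package, so a different protobuf-java on a downstream classpath would not collide with them.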
> >> Please review!!
> >>
> >> Thanks,
> >> -Vinay
> >>
> >>
> >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
> >> wrote:
> >>
> >> > For HBase we have a separated repo for hbase-thirdparty
> >> >
> >> > https://github.com/apache/hbase-thirdparty
> >> >
> >> > We will publish the artifacts to nexus so we do not need to include
> >> > binaries in our git repo, just add a dependency in the pom.
> >> >
> >> >
> >> >
> >>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >> >
> >> >
> >> > And it has its own release cycles, only when there are special
> >> requirements
> >> > or we want to upgrade some of the dependencies. This is the vote
> thread
> >> for
> >> > the newest release, where we want to provide a shaded gson for jdk7.
> >> >
> >> >
> >> >
> >>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >> >
> >> >
> >> > Thanks.
> >> >
> >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> >> >
> >> > > Please find replies inline.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> >> owen.omalley@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I'm very unhappy with this direction. In particular, I don't think
> >> git
> >> > is
> >> > > > a good place for distribution of binary artifacts. Furthermore,
> the
> >> PMC
> >> > > > shouldn't be releasing anything without a release vote.
> >> > > >
> >> > > >
> >> > > Proposed solution doesnt release any binaries in git. Its actually a
> >> > > complete sub-project which follows entire release process, including
> >> VOTE
> >> > > in public. I have mentioned already that release process is similar
> to
> >> > > hadoop.
> >> > > To be specific, using the (almost) same script used in hadoop to
> >> generate
> >> > > artifacts, sign and deploy to staging repository. Please let me know
> >> If I
> >> > > am conveying anything wrong.
> >> > >
> >> > >
> >> > > > I'd propose that we make a third party module that contains the
> >> > *source*
> >> > > > of the pom files to build the relocated jars. This should
> >> absolutely be
> >> > > > treated as a last resort for the mostly Google projects that
> >> regularly
> >> > > > break binary compatibility (eg. Protobuf & Guava).
> >> > > >
> >> > > >
> >> > > Same has been implemented in the PR
> >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check
> and
> >> let
> >> > > me
> >> > > know If I misunderstood. Yes, this is the last option we have AFAIK.
> >> > >
> >> > >
> >> > > > In terms of naming, I'd propose something like:
> >> > > >
> >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> >> > > > org.apache.hadoop.thirdparty.guava28
> >> > > >
> >> > > > In particular, I think we absolutely need to include the version
> of
> >> the
> >> > > > underlying project. On the other hand, since we should not be
> >> shading
> >> > > > *everything* we can drop the leading com.google.
> >> > > >
> >> > > >
> >> > > IMO, This naming convention is easy for identifying the underlying
> >> > project,
> >> > > but  it will be difficult to maintain going forward if underlying
> >> project
> >> > > versions changes. Since thirdparty module have its own releases,
> each
> >> of
> >> > > those release can be mapped to specific version of underlying
> project.
> >> > Even
> >> > > the binary artifact can include a MANIFEST with underlying project
> >> > details
> >> > > as per Steve's suggestion on HADOOP-13363.
> >> > > That said, if you still prefer to have project number in artifact
> id,
> >> it
> >> > > can be done.
> >> > >
> >> > > The Hadoop project can make releases of  the thirdparty module:
> >> > > >
> >> > > > <dependency>
> >> > > >  <groupId>org.apache.hadoop</groupId>
> >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >> > > >  <version>1.0</version>
> >> > > > </dependency>
> >> > > >
> >> > > >
> >> > > Note that the version has to be the hadoop thirdparty release
> number,
> >> > which
> >> > > > is part of why you need to have the underlying version in the
> >> artifact
> >> > > > name. These we can push to maven central as new releases from
> >> Hadoop.
> >> > > >
> >> > > >
> >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> module
> >> > have
> >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> >> > > differentiated using prefix "thirdparty-".
> >> > >
> >> > > Same solution is being followed in HBase. May be people involved in
> >> HBase
> >> > > can add some points here.
> >> > >
> >> > > Thoughts?
> >> > > >
> >> > > > .. Owen
> >> > > >
> >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> >> vinayakumarb@apache.org
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > >> Hi All,
> >> > > >>
> >> > > >>    I wanted to discuss about the separate repo for thirdparty
> >> > > dependencies
> >> > > >> which we need to shaded and include in Hadoop component's jars.
> >> > > >>
> >> > > >>    Apologies for the big text ahead, but this needs clear
> >> > explanation!!
> >> > > >>
> >> > > >>    Right now most needed such dependency is protobuf. Protobuf
> >> > > dependency
> >> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream
> >> > > builds,
> >> > > >> which depends on transitive dependency protobuf coming from
> >> hadoop's
> >> > > jars,
> >> > > >> may fail with the upgrade. Apparently protobuf does not guarantee
> >> > source
> >> > > >> compatibility, though it guarantees wire compatibility between
> >> > versions.
> >> > > >> Because of this behavior, version upgrade may cause breakage in
> >> known
> >> > > and
> >> > > >> unknown (private?) downstreams.
> >> > > >>
> >> > > >>    So to tackle this, we came up the following proposal in
> >> > HADOOP-13363.
> >> > > >>
> >> > > >>    Luckily, as far as I know, no APIs, either public to users or
> >> > > >> between Hadoop processes, directly use protobuf classes in their
> >> > > >> signatures. (If any exist, please let us know.)
> >> > > >>
> >> > > >>    Proposal:
> >> > > >>    ------------
> >> > > >>
> >> > > >>    1. Create a artifact(s) which contains shaded dependencies.
> All
> >> > such
> >> > > >> shading/relocation will be with known prefix
> >> > > >> **org.apache.hadoop.thirdparty.**.
> >> > > >>    2. Right now protobuf jar (ex:
> >> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> >> > > >> to start with, all **com.google.protobuf** classes will be
> >> relocated
> >> > as
> >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >> > > >>    3. Hadoop modules, which needs protobuf as dependency, will
> add
> >> > this
> >> > > >> shaded artifact as dependency (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >> > > >>    4. All previous usages of "com.google.protobuf" will be
> >> relocated
> >> > to
> >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code
> and
> >> > will
> >> > > be
> >> > > >> committed. Please note, this replacement is One-Time directly in
> >> > source
> >> > > >> code, NOT during compile and package.
> >> > > >>    5. Once all usages of "com.google.protobuf" is relocated, then
> >> > hadoop
> >> > > >> dont care about which version of original  "protobuf-java" is in
> >> > > >> dependency.
> >> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to
> >> break
> >> > > the
> >> > > >> downstreams. But hadoop will be originally using the latest
> >> protobuf
> >> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >> > > >>
> >> > > >>    7. Coming back to separate repo, Following are most
> appropriate
> >> > > reasons
> >> > > >> of keeping shaded dependency artifact in separate repo instead of
> >> > > >> submodule.
> >> > > >>
> >> > > >>      7a. These artifacts need not be built all the time. It needs
> >> to
> >> > be
> >> > > >> built only when there is a change in the dependency version or
> the
> >> > build
> >> > > >> process.
> >> > > >>      7b. If added as a submodule in the Hadoop repo,
> >> > > >> maven-shade-plugin:shade will execute only in the package phase.
> >> > > >> That means "mvn compile" or "mvn test-compile" will fail: at that
> >> > > >> point the artifact contains the original classes rather than the
> >> > > >> relocated ones, resulting in compilation failure. The workaround
> >> > > >> is to build the thirdparty submodule first and exclude it from
> >> > > >> other executions, which is more complex than keeping it in a
> >> > > >> separate repo.
> >> > > >>
> >> > > >>      7c. Separate repo, will be a subproject of Hadoop, using the
> >> > same
> >> > > >> HADOOP jira project, with different versioning prefixed with
> >> > > "thirdparty-"
> >> > > >> (ex: thirdparty-1.0.0).
> >> > > >>      7d. Separate will have same release process as Hadoop.
> >> > > >>
> >> > > >>    HADOOP-13363 (
> >> https://issues.apache.org/jira/browse/HADOOP-13363)
> >> > > is
> >> > > >> an
> >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> >> > > >>
> >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> >> been
> >> > > >> raised
> >> > > >> for separate repo creation in (HADOOP-16595 (
> >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> >> > > >>
> >> > > >>    Please provide your inputs for the proposal and review the PR
> >> to
> >> > > >> proceed with the proposal.
> >> > > >>
> >> > > >>
> >> > > >    -Thanks,
> >> > > >>    Vinay
> >> > > >>
> >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >> > > >> vinodkv@apache.org>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Moving the thread to the dev lists.
> >> > > >> >
> >> > > >> > Thanks
> >> > > >> > +Vinod
> >> > > >> >
> >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> >> > > vinayakumarb@apache.org>
> >> > > >> > wrote:
> >> > > >> > >
> >> > > >> > > Thanks Marton,
> >> > > >> > >
> >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> >> > > >> > > Whether to use that repo  for shaded artifact or not will be
> >> > > >> monitored in
> >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> >> > discussion.
> >> > > >> > >
> >> > > >> > > There is no existing codebase is being moved out of hadoop
> >> repo.
> >> > So
> >> > > I
> >> > > >> > think
> >> > > >> > > right now we are good to go.
> >> > > >> > >
> >> > > >> > > -Vinay
> >> > > >> > >
> >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> elek@apache.org>
> >> > > wrote:
> >> > > >> > >
> >> > > >> > >>
> >> > > >> > >> I am not sure if it's defined when is a vote required.
> >> > > >> > >>
> >> > > >> > >> https://www.apache.org/foundation/voting.html
> >> > > >> > >>
> >> > > >> > >> Personally I think it's a big enough change to send a
> >> > notification
> >> > > to
> >> > > >> > the
> >> > > >> > >> dev lists with a 'lazy consensus'  closure
> >> > > >> > >>
> >> > > >> > >> Marton
> >> > > >> > >>
> >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> >> vinayakumarb@apache.org>
> >> > > >> wrote:
> >> > > >> > >>> Hi,
> >> > > >> > >>>
> >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be
> >> more
> >> > in
> >> > > >> > >> future)
> >> > > >> > >>> will be kept as a shaded artifact in a separate repo, which
> >> will
> >> > > be
> >> > > >> > >>> referred as dependency in hadoop modules.  This approach
> >> avoids
> >> > > >> shading
> >> > > >> > >> of
> >> > > >> > >>> every submodule during build.
> >> > > >> > >>>
> >> > > >> > >>> So question is does any VOTE required before asking to
> >> create a
> >> > > git
> >> > > >> > repo?
> >> > > >> > >>>
> >> > > >> > >>> On selfserve platform
> >> > > https://gitbox.apache.org/setup/newrepo.html
> >> > > >> > >>> I can access see that, requester should be PMC.
> >> > > >> > >>>
> >> > > >> > >>> Wanted to confirm here first.
> >> > > >> > >>>
> >> > > >> > >>> -Vinay
> >> > > >> > >>>
> >> > > >> > >>
> >> > > >> > >>
> >> > >
> ---------------------------------------------------------------------
> >> > > >> > >> To unsubscribe, e-mail:
> private-unsubscribe@hadoop.apache.org
> >> > > >> > >> For additional commands, e-mail:
> >> private-help@hadoop.apache.org
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Brahma Reddy Battula <br...@apache.org>.
I just went through the previous discussions in Jira (HADOOP-13363) and
this thread. As Stack and Duo Zhang mentioned, this artifact (could we call
it "shaded" instead of "thirdparty"?) will be voted on by the PMC, as in the
thread below. Wouldn't that be fair?

https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E

One thought here:
Maybe we can unify (perhaps with an incubator project for this?), so that
all projects can use the same git repo for shaded artifacts?


I wanted to join the discussion, so please let me know.


On Sun, 5 Jan 2020 at 7:33 AM, Sree Vaddi <sr...@yahoo.com.invalid>
wrote:

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
> The effort to streamline and standardize how dependencies that require
> shading are handled seems to go beyond the siloed efforts of Hadoop,
> HBase, etc.
>
> I propose we bring the decision makers from all these projects into one
> room and decide on the best course of action. My view is that no project
> should ever have to shade an artifact except as an absolutely necessary
> last resort.
>
>
> Thank you./Sree
>
>
>
>     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> vinayakumarb@apache.org> wrote:
>
>  Hi,
> Sorry for the late reply.
>
> >>> To be exact, how can we better use the thirdparty repo? Looking at
> >>> HBase as an example, it looks like everything that is known to break a
> >>> lot after an update gets shaded into the hbase-thirdparty artifact:
> >>> guava, netty, etc.
> >>> Is the purpose to isolate these naughty dependencies?
>
> Yes, shading isolates these naughty dependencies from the downstream
> classpath and gives us independent control over their upgrades without
> breaking downstreams.
>
> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, which
> creates the shaded protobuf jar, is ready to merge.
>
> Please take a look if you are interested. It will be merged in about two
> days if there are no objections.
>
> -Vinay
>
>
> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
> > Hi I am late to this but I am keen to understand more.
> >
> > To be exact, how can we better use the thirdparty repo? Looking at HBase
> > as an example, it looks like everything that are known to break a lot
> after
> > an update get shaded into the hbase-thirdparty artifact: guava, netty,
> ...
> > etc.
> > Is it the purpose to isolate these naughty dependencies?
> >
> > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
> >> 's suggestions.
> >>
> >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> >>
> >> Please review!!
> >>
> >> Thanks,
> >> -Vinay
> >>
> >>
> >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
> >> wrote:
> >>
> >> > For HBase we have a separated repo for hbase-thirdparty
> >> >
> >> > https://github.com/apache/hbase-thirdparty
> >> >
> >> > We will publish the artifacts to nexus so we do not need to include
> >> > binaries in our git repo, just add a dependency in the pom.
> >> >
> >> >
> >> >
> >>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >> >
> >> >
> >> > And it has its own release cycles, only when there are special
> >> requirements
> >> > or we want to upgrade some of the dependencies. This is the vote
> thread
> >> for
> >> > the newest release, where we want to provide a shaded gson for jdk7.
> >> >
> >> >
> >> >
> >>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >> >
> >> >
> >> > Thanks.
> >> >
> >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> >> >
> >> > > Please find replies inline.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> >> owen.omalley@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I'm very unhappy with this direction. In particular, I don't think
> >> git
> >> > is
> >> > > > a good place for distribution of binary artifacts. Furthermore,
> the
> >> PMC
> >> > > > shouldn't be releasing anything without a release vote.
> >> > > >
> >> > > >
> >> > > Proposed solution doesnt release any binaries in git. Its actually a
> >> > > complete sub-project which follows entire release process, including
> >> VOTE
> >> > > in public. I have mentioned already that release process is similar
> to
> >> > > hadoop.
> >> > > To be specific, using the (almost) same script used in hadoop to
> >> generate
> >> > > artifacts, sign and deploy to staging repository. Please let me know
> >> If I
> >> > > am conveying anything wrong.
> >> > >
> >> > >
> >> > > > I'd propose that we make a third party module that contains the
> >> > *source*
> >> > > > of the pom files to build the relocated jars. This should
> >> absolutely be
> >> > > > treated as a last resort for the mostly Google projects that
> >> regularly
> >> > > > break binary compatibility (eg. Protobuf & Guava).
> >> > > >
> >> > > >
> >> > > Same has been implemented in the PR
> >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check
> and
> >> let
> >> > > me
> >> > > know If I misunderstood. Yes, this is the last option we have AFAIK.
> >> > >
> >> > >
> >> > > > In terms of naming, I'd propose something like:
> >> > > >
> >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> >> > > > org.apache.hadoop.thirdparty.guava28
> >> > > >
> >> > > > In particular, I think we absolutely need to include the version
> of
> >> the
> >> > > > underlying project. On the other hand, since we should not be
> >> shading
> >> > > > *everything* we can drop the leading com.google.
> >> > > >
> >> > > >
> >> > > IMO, This naming convention is easy for identifying the underlying
> >> > project,
> >> > > but  it will be difficult to maintain going forward if underlying
> >> project
> >> > > versions changes. Since thirdparty module have its own releases,
> each
> >> of
> >> > > those release can be mapped to specific version of underlying
> project.
> >> > Even
> >> > > the binary artifact can include a MANIFEST with underlying project
> >> > details
> >> > > as per Steve's suggestion on HADOOP-13363.
> >> > > That said, if you still prefer to have project number in artifact
> id,
> >> it
> >> > > can be done.
> >> > >
> >> > > The Hadoop project can make releases of  the thirdparty module:
> >> > > >
> >> > > > <dependency>
> >> > > >  <groupId>org.apache.hadoop</groupId>
> >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >> > > >  <version>1.0</version>
> >> > > > </dependency>
> >> > > >
> >> > > >
> >> > > Note that the version has to be the hadoop thirdparty release
> number,
> >> > which
> >> > > > is part of why you need to have the underlying version in the
> >> artifact
> >> > > > name. These we can push to maven central as new releases from
> >> Hadoop.
> >> > > >
> >> > > >
> >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> module
> >> > have
> >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> >> > > differentiated using prefix "thirdparty-".
> >> > >
> >> > > Same solution is being followed in HBase. May be people involved in
> >> HBase
> >> > > can add some points here.
> >> > >
> >> > > Thoughts?
> >> > > >
> >> > > > .. Owen
> >> > > >
> >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> >> vinayakumarb@apache.org
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > >> Hi All,
> >> > > >>
> >> > > >>    I wanted to discuss about the separate repo for thirdparty
> >> > > dependencies
> >> > > >> which we need to shaded and include in Hadoop component's jars.
> >> > > >>
> >> > > >>    Apologies for the big text ahead, but this needs clear
> >> > explanation!!
> >> > > >>
> >> > > >>    Right now most needed such dependency is protobuf. Protobuf
> >> > > dependency
> >> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream
> >> > > builds,
> >> > > >> which depends on transitive dependency protobuf coming from
> >> hadoop's
> >> > > jars,
> >> > > >> may fail with the upgrade. Apparently protobuf does not guarantee
> >> > source
> >> > > >> compatibility, though it guarantees wire compatibility between
> >> > versions.
> >> > > >> Because of this behavior, version upgrade may cause breakage in
> >> known
> >> > > and
> >> > > >> unknown (private?) downstreams.
> >> > > >>
> >> > > >>    So to tackle this, we came up the following proposal in
> >> > HADOOP-13363.
> >> > > >>
> >> > > >>    Luckily, as far as I know, no APIs, either public to users or
> >> > > >> between Hadoop processes, directly use protobuf classes in their
> >> > > >> signatures. (If any exist, please let us know.)
> >> > > >>
> >> > > >>    Proposal:
> >> > > >>    ------------
> >> > > >>
> >> > > >>    1. Create a artifact(s) which contains shaded dependencies.
> All
> >> > such
> >> > > >> shading/relocation will be with known prefix
> >> > > >> **org.apache.hadoop.thirdparty.**.
> >> > > >>    2. Right now protobuf jar (ex:
> >> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> >> > > >> to start with, all **com.google.protobuf** classes will be
> >> relocated
> >> > as
> >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >> > > >>    3. Hadoop modules, which needs protobuf as dependency, will
> add
> >> > this
> >> > > >> shaded artifact as dependency (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >> > > >>    4. All previous usages of "com.google.protobuf" will be
> >> relocated
> >> > to
> >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code
> and
> >> > will
> >> > > be
> >> > > >> committed. Please note, this replacement is One-Time directly in
> >> > source
> >> > > >> code, NOT during compile and package.
> >> > > >>    5. Once all usages of "com.google.protobuf" is relocated, then
> >> > hadoop
> >> > > >> dont care about which version of original  "protobuf-java" is in
> >> > > >> dependency.
> >> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to
> >> break
> >> > > the
> >> > > >> downstreams. But hadoop will be originally using the latest
> >> protobuf
> >> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >> > > >>
> >> > > >>    7. Coming back to separate repo, Following are most
> appropriate
> >> > > reasons
> >> > > >> of keeping shaded dependency artifact in separate repo instead of
> >> > > >> submodule.
> >> > > >>
> >> > > >>      7a. These artifacts need not be built all the time. It needs
> >> to
> >> > be
> >> > > >> built only when there is a change in the dependency version or
> the
> >> > build
> >> > > >> process.
> >> > > >>      7b. If added as a submodule in the Hadoop repo,
> >> > > >> maven-shade-plugin:shade will execute only in the package phase.
> >> > > >> That means "mvn compile" or "mvn test-compile" will fail: at that
> >> > > >> point the artifact contains the original classes rather than the
> >> > > >> relocated ones, resulting in compilation failure. The workaround
> >> > > >> is to build the thirdparty submodule first and exclude it from
> >> > > >> other executions, which is more complex than keeping it in a
> >> > > >> separate repo.
> >> > > >>
> >> > > >>      7c. Separate repo, will be a subproject of Hadoop, using the
> >> > same
> >> > > >> HADOOP jira project, with different versioning prefixed with
> >> > > "thirdparty-"
> >> > > >> (ex: thirdparty-1.0.0).
> >> > > >>      7d. Separate will have same release process as Hadoop.
> >> > > >>
> >> > > >>    HADOOP-13363 (
> >> https://issues.apache.org/jira/browse/HADOOP-13363)
> >> > > is
> >> > > >> an
> >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> >> > > >>
> >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> >> been
> >> > > >> raised
> >> > > >> for separate repo creation in (HADOOP-16595 (
> >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> >> > > >>
> >> > > >>    Please provide your inputs for the proposal and review the PR
> >> to
> >> > > >> proceed with the proposal.
> >> > > >>
> >> > > >>
> >> > > >    -Thanks,
> >> > > >>    Vinay
> >> > > >>
> >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >> > > >> vinodkv@apache.org>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Moving the thread to the dev lists.
> >> > > >> >
> >> > > >> > Thanks
> >> > > >> > +Vinod
> >> > > >> >
> >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> >> > > vinayakumarb@apache.org>
> >> > > >> > wrote:
> >> > > >> > >
> >> > > >> > > Thanks Marton,
> >> > > >> > >
> >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> >> > > >> > > Whether to use that repo  for shaded artifact or not will be
> >> > > >> monitored in
> >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> >> > discussion.
> >> > > >> > >
> >> > > >> > > There is no existing codebase is being moved out of hadoop
> >> repo.
> >> > So
> >> > > I
> >> > > >> > think
> >> > > >> > > right now we are good to go.
> >> > > >> > >
> >> > > >> > > -Vinay
> >> > > >> > >
> >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> elek@apache.org>
> >> > > wrote:
> >> > > >> > >
> >> > > >> > >>
> >> > > >> > >> I am not sure if it's defined when is a vote required.
> >> > > >> > >>
> >> > > >> > >> https://www.apache.org/foundation/voting.html
> >> > > >> > >>
> >> > > >> > >> Personally I think it's a big enough change to send a
> >> > notification
> >> > > to
> >> > > >> > the
> >> > > >> > >> dev lists with a 'lazy consensus'  closure
> >> > > >> > >>
> >> > > >> > >> Marton
> >> > > >> > >>
> >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> >> vinayakumarb@apache.org>
> >> > > >> wrote:
> >> > > >> > >>> Hi,
> >> > > >> > >>>
> >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be
> >> more
> >> > in
> >> > > >> > >> future)
> >> > > >> > >>> will be kept as a shaded artifact in a separate repo, which
> >> will
> >> > > be
> >> > > >> > >>> referred as dependency in hadoop modules.  This approach
> >> avoids
> >> > > >> shading
> >> > > >> > >> of
> >> > > >> > >>> every submodule during build.
> >> > > >> > >>>
> >> > > >> > >>> So question is does any VOTE required before asking to
> >> create a
> >> > > git
> >> > > >> > repo?
> >> > > >> > >>>
> >> > > >> > >>> On selfserve platform
> >> > > https://gitbox.apache.org/setup/newrepo.html
> >> > > >> > >>> I can access see that, requester should be PMC.
> >> > > >> > >>>
> >> > > >> > >>> Wanted to confirm here first.
> >> > > >> > >>>
> >> > > >> > >>> -Vinay
> >> > > >> > >>>
> >> > > >> > >>
> >> > > >> > >>
> >> > >
> ---------------------------------------------------------------------
> >> > > >> > >> To unsubscribe, e-mail:
> private-unsubscribe@hadoop.apache.org
> >> > > >> > >> For additional commands, e-mail:
> >> private-help@hadoop.apache.org
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >

-- 



--Brahma Reddy Battula

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Ayush Saxena <ay...@gmail.com>.
Hey Sree

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
A subproject of Apache Hadoop with its own independent release cycle.
You can put it in the same category as Ozone, or as Submarine (a couple
of months ago).

Unifying this across all projects sounds interesting, but each project is
independent and has its own constraints and way of thinking. I don't think
it would be easy to bring everyone to the same table and get them to agree
on a common approach.

I guess this has been under discussion for quite a long time, and no
alternative has been suggested. Still, we can hold off for a week in case
someone comes up with a better solution; otherwise we can continue in the
present direction.

-Ayush



On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sr...@yahoo.com.invalid>
wrote:

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
> The effort to streamline and standardize how dependencies that require
> shading are handled seems to go beyond the siloed efforts of Hadoop,
> HBase, etc.
>
> I propose we bring the decision makers from all these projects into one
> room and decide on the best course of action. My view is that no project
> should ever have to shade an artifact except as an absolutely necessary
> last resort.
>
>
> Thank you./Sree
>
>
>
>     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> vinayakumarb@apache.org> wrote:
>
>  Hi,
> Sorry for the late reply.
> >>> To be exact, how can we better use the thirdparty repo? Looking at
> >>> HBase as an example, it looks like everything that is known to break a
> >>> lot after an update gets shaded into the hbase-thirdparty artifact:
> >>> guava, netty, ... etc.
> >>> Is it the purpose to isolate these naughty dependencies?
> Yes, shading is meant to isolate these naughty dependencies from the
> downstream classpath and to give us independent control over their
> upgrades without breaking downstreams.
>
> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, which
> creates the protobuf shaded jar, is ready to merge.
>
> Please take a look if you are interested. It will be merged in about two
> days if there are no objections.
>
> -Vinay
>
>
> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
> > Hi, I am late to this but I am keen to understand more.
> >
> > To be exact, how can we better use the thirdparty repo? Looking at HBase
> > as an example, it looks like everything that is known to break a lot
> > after an update gets shaded into the hbase-thirdparty artifact: guava,
> > netty, ... etc.
> > Is it the purpose to isolate these naughty dependencies?
> >
> > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
> >> 's suggestions.
> >>
> >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> >>
> >> Please review!!
> >>
> >> Thanks,
> >> -Vinay
> >>
> >>
> >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
> >> wrote:
> >>
> >> > For HBase we have a separated repo for hbase-thirdparty
> >> >
> >> > https://github.com/apache/hbase-thirdparty
> >> >
> >> > We will publish the artifacts to nexus so we do not need to include
> >> > binaries in our git repo, just add a dependency in the pom.
> >> >
> >> >
> >> >
> >>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >> >
> >> >
> >> > And it has its own release cycles, only when there are special
> >> requirements
> >> > or we want to upgrade some of the dependencies. This is the vote
> thread
> >> for
> >> > the newest release, where we want to provide a shaded gson for jdk7.
> >> >
> >> >
> >> >
> >>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >> >
> >> >
> >> > Thanks.
> >> >
> >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> >> >
> >> > > Please find replies inline.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> >> owen.omalley@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I'm very unhappy with this direction. In particular, I don't think
> >> git
> >> > is
> >> > > > a good place for distribution of binary artifacts. Furthermore,
> the
> >> PMC
> >> > > > shouldn't be releasing anything without a release vote.
> >> > > >
> >> > > >
> >> > > Proposed solution doesnt release any binaries in git. Its actually a
> >> > > complete sub-project which follows entire release process, including
> >> VOTE
> >> > > in public. I have mentioned already that release process is similar
> to
> >> > > hadoop.
> >> > > To be specific, using the (almost) same script used in hadoop to
> >> generate
> >> > > artifacts, sign and deploy to staging repository. Please let me know
> >> If I
> >> > > am conveying anything wrong.
> >> > >
> >> > >
> >> > > > I'd propose that we make a third party module that contains the
> >> > *source*
> >> > > > of the pom files to build the relocated jars. This should
> >> absolutely be
> >> > > > treated as a last resort for the mostly Google projects that
> >> regularly
> >> > > > break binary compatibility (eg. Protobuf & Guava).
> >> > > >
> >> > > >
> >> > > Same has been implemented in the PR
> >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check
> and
> >> let
> >> > > me
> >> > > know If I misunderstood. Yes, this is the last option we have AFAIK.
> >> > >
> >> > >
> >> > > > In terms of naming, I'd propose something like:
> >> > > >
> >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> >> > > > org.apache.hadoop.thirdparty.guava28
> >> > > >
> >> > > > In particular, I think we absolutely need to include the version
> of
> >> the
> >> > > > underlying project. On the other hand, since we should not be
> >> shading
> >> > > > *everything* we can drop the leading com.google.
> >> > > >
> >> > > >
> >> > > IMO, This naming convention is easy for identifying the underlying
> >> > project,
> >> > > but  it will be difficult to maintain going forward if underlying
> >> project
> >> > > versions changes. Since thirdparty module have its own releases,
> each
> >> of
> >> > > those release can be mapped to specific version of underlying
> project.
> >> > Even
> >> > > the binary artifact can include a MANIFEST with underlying project
> >> > details
> >> > > as per Steve's suggestion on HADOOP-13363.
> >> > > That said, if you still prefer to have project number in artifact
> id,
> >> it
> >> > > can be done.
> >> > >
> >> > > The Hadoop project can make releases of  the thirdparty module:
> >> > > >
> >> > > > <dependency>
> >> > > >  <groupId>org.apache.hadoop</groupId>
> >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >> > > >  <version>1.0</version>
> >> > > > </dependency>
> >> > > >
> >> > > >
> >> > > Note that the version has to be the hadoop thirdparty release
> number,
> >> > which
> >> > > > is part of why you need to have the underlying version in the
> >> artifact
> >> > > > name. These we can push to maven central as new releases from
> >> Hadoop.
> >> > > >
> >> > > >
> >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> module
> >> > have
> >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> >> > > differentiated using prefix "thirdparty-".
> >> > >
> >> > > Same solution is being followed in HBase. May be people involved in
> >> HBase
> >> > > can add some points here.
> >> > >
> >> > > Thoughts?
> >> > > >
> >> > > > .. Owen
> >> > > >
> >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> >> vinayakumarb@apache.org
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > >> Hi All,
> >> > > >>
> >> > > >>    I wanted to discuss about the separate repo for thirdparty
> >> > > dependencies
> >> > > >> which we need to shaded and include in Hadoop component's jars.
> >> > > >>
> >> > > >>    Apologies for the big text ahead, but this needs clear
> >> > explanation!!
> >> > > >>
> >> > > >>    Right now most needed such dependency is protobuf. Protobuf
> >> > > dependency
> >> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream
> >> > > builds,
> >> > > >> which depends on transitive dependency protobuf coming from
> >> hadoop's
> >> > > jars,
> >> > > >> may fail with the upgrade. Apparently protobuf does not guarantee
> >> > source
> >> > > >> compatibility, though it guarantees wire compatibility between
> >> > versions.
> >> > > >> Because of this behavior, version upgrade may cause breakage in
> >> known
> >> > > and
> >> > > >> unknown (private?) downstreams.
> >> > > >>
> >> > > >>    So to tackle this, we came up the following proposal in
> >> > HADOOP-13363.
> >> > > >>
> >> > > >>    Luckily, As far as I know, no APIs, either public to user or
> >> > between
> >> > > >> Hadoop processes, is not directly using protobuf classes in
> >> > signatures.
> >> > > >> (If
> >> > > >> any exist, please let us know).
> >> > > >>
> >> > > >>    Proposal:
> >> > > >>    ------------
> >> > > >>
> >> > > >>    1. Create artifact(s) which contain the shaded dependencies. All
> >> > > >> such shading/relocation will use the known prefix
> >> > > >> **org.apache.hadoop.thirdparty.**.
> >> > > >>    2. To start with, there will be a protobuf jar (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf) in which all
> >> > > >> **com.google.protobuf** classes are relocated to
> >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >> > > >>    3. Hadoop modules which need protobuf as a dependency will add
> >> > > >> this shaded artifact as a dependency (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >> > > >>    4. All previous usages of "com.google.protobuf" will be changed to
> >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> >> > > >> committed. Please note, this replacement is done one time, directly
> >> > > >> in the source code, NOT during compile and package.
> >> > > >>    5. Once all usages of "com.google.protobuf" are relocated, Hadoop
> >> > > >> no longer cares which version of the original "protobuf-java" is in
> >> > > >> the dependency tree.
> >> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as
> >> > > >> not to break the downstreams. Hadoop itself will actually be using
> >> > > >> the latest protobuf present in
> >> > > >> "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >> > > >>
> >> > > >>    7. Coming back to the separate repo, the following are the main
> >> > > >> reasons for keeping the shaded dependency artifact in a separate repo
> >> > > >> instead of a submodule.
> >> > > >>
> >> > > >>      7a. These artifacts need not be built all the time. They need to
> >> > > >> be built only when there is a change in the dependency version or the
> >> > > >> build process.
> >> > > >>      7b. If added as a "submodule in Hadoop repo",
> >> > > >> maven-shade-plugin:shade will execute only in the package phase. That
> >> > > >> means "mvn compile" or "mvn test-compile" will fail: at that point
> >> > > >> the artifact will not have the relocated classes, only the original
> >> > > >> classes, resulting in compilation failure. The workaround is to build
> >> > > >> the thirdparty submodule first and exclude the "thirdparty" submodule
> >> > > >> in other executions. This will be a complex process compared to
> >> > > >> keeping it in a separate repo.
> >> > > >>
> >> > > >>      7c. The separate repo will be a subproject of Hadoop, using the
> >> > > >> same HADOOP jira project, with different versioning prefixed with
> >> > > >> "thirdparty-" (ex: thirdparty-1.0.0).
> >> > > >>      7d. The separate repo will have the same release process as
> >> > > >> Hadoop.
> >> > > >>
> >> > > >>    HADOOP-13363 (
> >> https://issues.apache.org/jira/browse/HADOOP-13363)
> >> > > is
> >> > > >> an
> >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> >> > > >>
> >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> >> been
> >> > > >> raised
> >> > > >> for separate repo creation in (HADOOP-16595 (
> >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> >> > > >>
> >> > > >>    Please provide your inputs for the proposal and review the PR
> >> to
> >> > > >> proceed with the proposal.
> >> > > >>
> >> > > >>
> >> > > >    -Thanks,
> >> > > >>    Vinay
> >> > > >>
> >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >> > > >> vinodkv@apache.org>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Moving the thread to the dev lists.
> >> > > >> >
> >> > > >> > Thanks
> >> > > >> > +Vinod
> >> > > >> >
> >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> >> > > vinayakumarb@apache.org>
> >> > > >> > wrote:
> >> > > >> > >
> >> > > >> > > Thanks Marton,
> >> > > >> > >
> >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> >> > > >> > > Whether to use that repo  for shaded artifact or not will be
> >> > > >> monitored in
> >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> >> > discussion.
> >> > > >> > >
> >> > > >> > > There is no existing codebase is being moved out of hadoop
> >> repo.
> >> > So
> >> > > I
> >> > > >> > think
> >> > > >> > > right now we are good to go.
> >> > > >> > >
> >> > > >> > > -Vinay
> >> > > >> > >
> >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> elek@apache.org>
> >> > > wrote:
> >> > > >> > >
> >> > > >> > >>
> >> > > >> > >> I am not sure if it's defined when is a vote required.
> >> > > >> > >>
> >> > > >> > >> https://www.apache.org/foundation/voting.html
> >> > > >> > >>
> >> > > >> > >> Personally I think it's a big enough change to send a
> >> > notification
> >> > > to
> >> > > >> > the
> >> > > >> > >> dev lists with a 'lazy consensus'  closure
> >> > > >> > >>
> >> > > >> > >> Marton
> >> > > >> > >>
> >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> >> vinayakumarb@apache.org>
> >> > > >> wrote:
> >> > > >> > >>> Hi,
> >> > > >> > >>>
> >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be
> >> more
> >> > in
> >> > > >> > >> future)
> >> > > >> > >>> will be kept as a shaded artifact in a separate repo, which
> >> will
> >> > > be
> >> > > >> > >>> referred as dependency in hadoop modules.  This approach
> >> avoids
> >> > > >> shading
> >> > > >> > >> of
> >> > > >> > >>> every submodule during build.
> >> > > >> > >>>
> >> > > >> > >>> So question is does any VOTE required before asking to
> >> create a
> >> > > git
> >> > > >> > repo?
> >> > > >> > >>>
> >> > > >> > >>> On selfserve platform
> >> > > https://gitbox.apache.org/setup/newrepo.html
> >> > > >> > >>> I can access see that, requester should be PMC.
> >> > > >> > >>>
> >> > > >> > >>> Wanted to confirm here first.
> >> > > >> > >>>
> >> > > >> > >>> -Vinay
> >> > > >> > >>>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
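
For readers following the quoted proposal: the relocation described in its
steps 1 and 2 could be expressed with maven-shade-plugin roughly as below.
This is only a minimal sketch and not the content of the PR. It assumes the
module lists com.google.protobuf:protobuf-java as a normal dependency, and it
uses the package prefix from the original proposal (a later message in the
thread mentions 'o.a.h.thirdparty.protobuf37' instead).

```xml
<!-- Hypothetical build section of a hadoop-shaded-protobuf module. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase, as noted in point 7b -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- bundle only protobuf into the shaded jar -->
            <include>com.google.protobuf:protobuf-java</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- rewrite the package name inside the bundled classes -->
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Because the relocated classes exist only after the package phase, building
this once in its own repo and publishing the jar avoids the compile-time
problem described in point 7b.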

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Brahma Reddy Battula <br...@apache.org>.
I have just gone through the previous discussions in the Jira (HADOOP-13363)
and in this thread. As Stack and Duo Zhang mentioned, this artifact (could we
call it "shaded" instead of "thirdparty"?) will be voted on by the PMC, as in
the HBase vote thread below. Wouldn't that be fair?

https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E

One thought here:
Maybe we can unify this (perhaps as an incubator project?) so that all
projects can use the same git repo for shaded artifacts?


I wanted to join the discussion, so please let me know.


On Sun, 5 Jan 2020 at 7:33 AM, Sree Vaddi <sr...@yahoo.com.invalid>
wrote:

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
> The effort to streamline and put in place an accepted standard for the
> dependencies that require shading seems beyond the siloed efforts of
> hadoop, hbase, etc.
>
> I propose we bring all the decision makers from all these artifacts into
> one room and decide the best course of action. As I see it, no project
> should ever have to shade any artifacts except as an absolutely necessary
> alternative.
>
>
> Thank you./Sree
>
>
>
>     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> vinayakumarb@apache.org> wrote:
>
>  Hi,
> Sorry for the late reply.
> >>> To be exact, how can we better use the thirdparty repo? Looking at
> >>> HBase as an example, it looks like everything that is known to break a
> >>> lot after an update gets shaded into the hbase-thirdparty artifact:
> >>> guava, netty, ... etc.
> >>> Is it the purpose to isolate these naughty dependencies?
> Yes, shading is meant to isolate these naughty dependencies from the
> downstream classpath and to give us independent control over their
> upgrades without breaking downstreams.
>
> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, which
> creates the protobuf shaded jar, is ready to merge.
>
> Please take a look if you are interested. It will be merged in about two
> days if there are no objections.
>
> -Vinay
>
>
> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
> > Hi, I am late to this but I am keen to understand more.
> >
> > To be exact, how can we better use the thirdparty repo? Looking at HBase
> > as an example, it looks like everything that is known to break a lot
> > after an update gets shaded into the hbase-thirdparty artifact: guava,
> > netty, ... etc.
> > Is it the purpose to isolate these naughty dependencies?
> >
> > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
> >> 's suggestions.
> >>
> >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
> >>
> >> Please review!!
> >>
> >> Thanks,
> >> -Vinay
> >>
> >>
> >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
> >> wrote:
> >>
> >> > For HBase we have a separated repo for hbase-thirdparty
> >> >
> >> > https://github.com/apache/hbase-thirdparty
> >> >
> >> > We will publish the artifacts to nexus so we do not need to include
> >> > binaries in our git repo, just add a dependency in the pom.
> >> >
> >> >
> >> >
> >>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >> >
> >> >
> >> > And it has its own release cycles, only when there are special
> >> requirements
> >> > or we want to upgrade some of the dependencies. This is the vote
> thread
> >> for
> >> > the newest release, where we want to provide a shaded gson for jdk7.
> >> >
> >> >
> >> >
> >>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >> >
> >> >
> >> > Thanks.
> >> >
> >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> >> >
> >> > > Please find replies inline.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> >> owen.omalley@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I'm very unhappy with this direction. In particular, I don't think
> >> git
> >> > is
> >> > > > a good place for distribution of binary artifacts. Furthermore,
> the
> >> PMC
> >> > > > shouldn't be releasing anything without a release vote.
> >> > > >
> >> > > >
> >> > > Proposed solution doesnt release any binaries in git. Its actually a
> >> > > complete sub-project which follows entire release process, including
> >> VOTE
> >> > > in public. I have mentioned already that release process is similar
> to
> >> > > hadoop.
> >> > > To be specific, using the (almost) same script used in hadoop to
> >> generate
> >> > > artifacts, sign and deploy to staging repository. Please let me know
> >> If I
> >> > > am conveying anything wrong.
> >> > >
> >> > >
> >> > > > I'd propose that we make a third party module that contains the
> >> > *source*
> >> > > > of the pom files to build the relocated jars. This should
> >> absolutely be
> >> > > > treated as a last resort for the mostly Google projects that
> >> regularly
> >> > > > break binary compatibility (eg. Protobuf & Guava).
> >> > > >
> >> > > >
> >> > > Same has been implemented in the PR
> >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check
> and
> >> let
> >> > > me
> >> > > know If I misunderstood. Yes, this is the last option we have AFAIK.
> >> > >
> >> > >
> >> > > > In terms of naming, I'd propose something like:
> >> > > >
> >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> >> > > > org.apache.hadoop.thirdparty.guava28
> >> > > >
> >> > > > In particular, I think we absolutely need to include the version
> of
> >> the
> >> > > > underlying project. On the other hand, since we should not be
> >> shading
> >> > > > *everything* we can drop the leading com.google.
> >> > > >
> >> > > >
> >> > > IMO, This naming convention is easy for identifying the underlying
> >> > project,
> >> > > but  it will be difficult to maintain going forward if underlying
> >> project
> >> > > versions changes. Since thirdparty module have its own releases,
> each
> >> of
> >> > > those release can be mapped to specific version of underlying
> project.
> >> > Even
> >> > > the binary artifact can include a MANIFEST with underlying project
> >> > details
> >> > > as per Steve's suggestion on HADOOP-13363.
> >> > > That said, if you still prefer to have project number in artifact
> id,
> >> it
> >> > > can be done.
> >> > >
> >> > > The Hadoop project can make releases of  the thirdparty module:
> >> > > >
> >> > > > <dependency>
> >> > > >  <groupId>org.apache.hadoop</groupId>
> >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >> > > >  <version>1.0</version>
> >> > > > </dependency>
> >> > > >
> >> > > >
> >> > > Note that the version has to be the hadoop thirdparty release
> number,
> >> > which
> >> > > > is part of why you need to have the underlying version in the
> >> artifact
> >> > > > name. These we can push to maven central as new releases from
> >> Hadoop.
> >> > > >
> >> > > >
> >> > > Exactly, same has been implemented in the PR. hadoop-thirdparty
> module
> >> > have
> >> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> >> > > differentiated using prefix "thirdparty-".
> >> > >
> >> > > Same solution is being followed in HBase. May be people involved in
> >> HBase
> >> > > can add some points here.
> >> > >
> >> > > Thoughts?
> >> > > >
> >> > > > .. Owen
> >> > > >
> >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> >> vinayakumarb@apache.org
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > >> Hi All,
> >> > > >>
> >> > > >>    I wanted to discuss about the separate repo for thirdparty
> >> > > dependencies
> >> > > >> which we need to shaded and include in Hadoop component's jars.
> >> > > >>
> >> > > >>    Apologies for the big text ahead, but this needs clear
> >> > explanation!!
> >> > > >>
> >> > > >>    Right now most needed such dependency is protobuf. Protobuf
> >> > > dependency
> >> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream
> >> > > builds,
> >> > > >> which depends on transitive dependency protobuf coming from
> >> hadoop's
> >> > > jars,
> >> > > >> may fail with the upgrade. Apparently protobuf does not guarantee
> >> > source
> >> > > >> compatibility, though it guarantees wire compatibility between
> >> > versions.
> >> > > >> Because of this behavior, version upgrade may cause breakage in
> >> known
> >> > > and
> >> > > >> unknown (private?) downstreams.
> >> > > >>
> >> > > >>    So to tackle this, we came up the following proposal in
> >> > HADOOP-13363.
> >> > > >>
> >> > > >>    Luckily, As far as I know, no APIs, either public to user or
> >> > between
> >> > > >> Hadoop processes, is not directly using protobuf classes in
> >> > signatures.
> >> > > >> (If
> >> > > >> any exist, please let us know).
> >> > > >>
> >> > > >>    Proposal:
> >> > > >>    ------------
> >> > > >>
> >> > > >>    1. Create artifact(s) which contain the shaded dependencies. All
> >> > > >> such shading/relocation will use the known prefix
> >> > > >> **org.apache.hadoop.thirdparty.**.
> >> > > >>    2. To start with, there will be a protobuf jar (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf) in which all
> >> > > >> **com.google.protobuf** classes are relocated to
> >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >> > > >>    3. Hadoop modules which need protobuf as a dependency will add
> >> > > >> this shaded artifact as a dependency (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >> > > >>    4. All previous usages of "com.google.protobuf" will be changed to
> >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> >> > > >> committed. Please note, this replacement is done one time, directly
> >> > > >> in the source code, NOT during compile and package.
> >> > > >>    5. Once all usages of "com.google.protobuf" are relocated, Hadoop
> >> > > >> no longer cares which version of the original "protobuf-java" is in
> >> > > >> the dependency tree.
> >> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as
> >> > > >> not to break the downstreams. Hadoop itself will actually be using
> >> > > >> the latest protobuf present in
> >> > > >> "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >> > > >>
> >> > > >>    7. Coming back to the separate repo, the following are the main
> >> > > >> reasons for keeping the shaded dependency artifact in a separate repo
> >> > > >> instead of a submodule.
> >> > > >>
> >> > > >>      7a. These artifacts need not be built all the time. They need to
> >> > > >> be built only when there is a change in the dependency version or the
> >> > > >> build process.
> >> > > >>      7b. If added as a "submodule in Hadoop repo",
> >> > > >> maven-shade-plugin:shade will execute only in the package phase. That
> >> > > >> means "mvn compile" or "mvn test-compile" will fail: at that point
> >> > > >> the artifact will not have the relocated classes, only the original
> >> > > >> classes, resulting in compilation failure. The workaround is to build
> >> > > >> the thirdparty submodule first and exclude the "thirdparty" submodule
> >> > > >> in other executions. This will be a complex process compared to
> >> > > >> keeping it in a separate repo.
> >> > > >>
> >> > > >>      7c. The separate repo will be a subproject of Hadoop, using the
> >> > > >> same HADOOP jira project, with different versioning prefixed with
> >> > > >> "thirdparty-" (ex: thirdparty-1.0.0).
> >> > > >>      7d. The separate repo will have the same release process as
> >> > > >> Hadoop.
> >> > > >>
> >> > > >>    HADOOP-13363 (
> >> https://issues.apache.org/jira/browse/HADOOP-13363)
> >> > > is
> >> > > >> an
> >> > > >> umbrella jira tracking the changes to protobuf upgrade.
> >> > > >>
> >> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> >> been
> >> > > >> raised
> >> > > >> for separate repo creation in (HADOOP-16595 (
> >> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> >> > > >>
> >> > > >>    Please provide your inputs for the proposal and review the PR
> >> to
> >> > > >> proceed with the proposal.
> >> > > >>
> >> > > >>
> >> > > >    -Thanks,
> >> > > >>    Vinay
> >> > > >>
> >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >> > > >> vinodkv@apache.org>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Moving the thread to the dev lists.
> >> > > >> >
> >> > > >> > Thanks
> >> > > >> > +Vinod
> >> > > >> >
> >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> >> > > vinayakumarb@apache.org>
> >> > > >> > wrote:
> >> > > >> > >
> >> > > >> > > Thanks Marton,
> >> > > >> > >
> >> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> >> > > >> > > Whether to use that repo  for shaded artifact or not will be
> >> > > >> monitored in
> >> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> >> > discussion.
> >> > > >> > >
> >> > > >> > > There is no existing codebase is being moved out of hadoop
> >> repo.
> >> > So
> >> > > I
> >> > > >> > think
> >> > > >> > > right now we are good to go.
> >> > > >> > >
> >> > > >> > > -Vinay
> >> > > >> > >
> >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> elek@apache.org>
> >> > > wrote:
> >> > > >> > >
> >> > > >> > >>
> >> > > >> > >> I am not sure if it's defined when is a vote required.
> >> > > >> > >>
> >> > > >> > >> https://www.apache.org/foundation/voting.html
> >> > > >> > >>
> >> > > >> > >> Personally I think it's a big enough change to send a
> >> > notification
> >> > > to
> >> > > >> > the
> >> > > >> > >> dev lists with a 'lazy consensus'  closure
> >> > > >> > >>
> >> > > >> > >> Marton
> >> > > >> > >>
> >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> >> vinayakumarb@apache.org>
> >> > > >> wrote:
> >> > > >> > >>> Hi,
> >> > > >> > >>>
> >> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be
> >> more
> >> > in
> >> > > >> > >> future)
> >> > > >> > >>> will be kept as a shaded artifact in a separate repo, which
> >> will
> >> > > be
> >> > > >> > >>> referred as dependency in hadoop modules.  This approach
> >> avoids
> >> > > >> shading
> >> > > >> > >> of
> >> > > >> > >>> every submodule during build.
> >> > > >> > >>>
> >> > > >> > >>> So question is does any VOTE required before asking to
> >> create a
> >> > > git
> >> > > >> > repo?
> >> > > >> > >>>
> >> > > >> > >>> On selfserve platform
> >> > > https://gitbox.apache.org/setup/newrepo.html
> >> > > >> > >>> I can access see that, requester should be PMC.
> >> > > >> > >>>
> >> > > >> > >>> Wanted to confirm here first.
> >> > > >> > >>>
> >> > > >> > >>> -Vinay
> >> > > >> > >>>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
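
As an illustration of steps 3 and 6 of the quoted proposal, a Hadoop module
that consumes the shaded jar might declare something like the following. This
is a sketch under assumptions: the groupId is expanded from the abbreviation
"o.a.h.thirdparty", the artifactId is the one named in the original proposal
(a later message renames the module to 'hadoop-shaded-protobuf37'), and the
version 1.0.0 is taken from the "thirdparty-1.0.0" example rather than from
any actual release.

```xml
<dependencies>
  <!-- Step 3: the relocated protobuf that Hadoop's own code compiles against. -->
  <dependency>
    <groupId>org.apache.hadoop.thirdparty</groupId>
    <artifactId>hadoop-shaded-protobuf</artifactId>
    <version>1.0.0</version> <!-- assumed version, for illustration only -->
  </dependency>

  <!-- Step 6: the original protobuf stays in the dependency tree only so
       that downstream builds relying on it transitively keep working. -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>
</dependencies>
```

With this layout, upgrading the protobuf that Hadoop actually uses means
bumping only the version of the shaded artifact; the protobuf-java 2.5.0
entry that downstreams see does not change.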

-- 



--Brahma Reddy Battula

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Ayush Saxena <ay...@gmail.com>.
Hey Sree

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
It would be a subproject of Apache Hadoop, with its own independent release
cycle. You could put it in the same category as Ozone, or as Submarine was a
couple of months ago.

Unifying this across all projects sounds interesting, but each project is
independent and has its own constraints and way of thinking. I don't think it
would be an easy task to bring everyone to the same table and get them to
agree on a common approach.

I think this has been under discussion for quite a long time, and no other
alternative has been suggested. Still, we can hold off for a week in case
someone comes up with a better solution; otherwise we can continue in the
present direction.

-Ayush



On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sr...@yahoo.com.invalid>
wrote:

> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> Project ? Or as a TLP ?
> Or as a new project definition ?
>
> The effort to streamline and put in place an accepted standard for the
> dependencies that require shading seems beyond the siloed efforts of
> hadoop, hbase, etc.
>
> I propose we bring all the decision makers from all these artifacts into
> one room and decide the best course of action. As I see it, no project
> should ever have to shade any artifacts except as an absolutely necessary
> alternative.
>
>
> Thank you./Sree
>
>
>
>     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> vinayakumarb@apache.org> wrote:
>
>  Hi,
> Sorry for the late reply.
> >>> To be exact, how can we better use the thirdparty repo? Looking at
> >>> HBase as an example, it looks like everything that is known to break a
> >>> lot after an update gets shaded into the hbase-thirdparty artifact:
> >>> guava, netty, ... etc.
> >>> Is it the purpose to isolate these naughty dependencies?
> Yes, shading is meant to isolate these naughty dependencies from the
> downstream classpath and to give us independent control over their
> upgrades without breaking downstreams.
>
> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, which
> creates the protobuf shaded jar, is ready to merge.
>
> Please take a look if you are interested. It will be merged in about two
> days if there are no objections.
>
> -Vinay
>
>
> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
> > Hi I am late to this but I am keen to understand more.
> >
> > To be exact, how can we better use the thirdparty repo? Looking at HBase
> > as an example, it looks like everything that is known to break a lot
> > after an update gets shaded into the hbase-thirdparty artifact: guava,
> > netty, etc.
> > Is the purpose to isolate these naughty dependencies?
> >
> > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >> I have updated the PR as per the suggestions from
> >> @Owen O'Malley <ow...@gmail.com>.
> >>
> >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>    ii. Set the shaded package to 'o.a.h.thirdparty.protobuf37'
> >>
> >> Please review!!
> >>
> >> Thanks,
> >> -Vinay
> >>
> >>
> >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
> >> wrote:
> >>
> >> > For HBase we have a separate repo for hbase-thirdparty:
> >> >
> >> > https://github.com/apache/hbase-thirdparty
> >> >
> >> > We publish the artifacts to nexus, so we do not need to include
> >> > binaries in our git repo; we just add a dependency in the pom.
> >> >
> >> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >> >
> >> > It has its own release cycle: we release only when there are special
> >> > requirements or we want to upgrade some of the dependencies. This is
> >> > the vote thread for the newest release, where we want to provide a
> >> > shaded gson for jdk7.
> >> >
> >> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >> >
> >> >
> >> > Thanks.
> >> >
> >> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> >> >
> >> > > Please find replies inline.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
> >> owen.omalley@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I'm very unhappy with this direction. In particular, I don't think
> >> > > > git is a good place for distribution of binary artifacts.
> >> > > > Furthermore, the PMC shouldn't be releasing anything without a
> >> > > > release vote.
> >> > > >
> >> > > >
> >> > > The proposed solution doesn't release any binaries in git. It is
> >> > > actually a complete sub-project that follows the entire release
> >> > > process, including a public VOTE. I have already mentioned that the
> >> > > release process is similar to Hadoop's.
> >> > > To be specific, it uses (almost) the same script used in Hadoop to
> >> > > generate artifacts, sign them, and deploy them to the staging
> >> > > repository. Please let me know if I am conveying anything wrong.
> >> > >
> >> > >
> >> > > > I'd propose that we make a third party module that contains the
> >> > > > *source* of the pom files to build the relocated jars. This should
> >> > > > absolutely be treated as a last resort for the mostly Google
> >> > > > projects that regularly break binary compatibility (e.g. Protobuf
> >> > > > & Guava).
> >> > > >
> >> > > >
> >> > > The same has been implemented in the PR
> >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and
> >> > > let me know if I misunderstood. Yes, this is the last option we have,
> >> > > AFAIK.
> >> > >
> >> > >
> >> > > > In terms of naming, I'd propose something like:
> >> > > >
> >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> >> > > > org.apache.hadoop.thirdparty.guava28
> >> > > >
> >> > > > In particular, I think we absolutely need to include the version
> >> > > > of the underlying project. On the other hand, since we should not
> >> > > > be shading *everything* we can drop the leading com.google.
> >> > > >
> >> > > >
> >> > > IMO, this naming convention makes it easy to identify the underlying
> >> > > project, but it will be difficult to maintain going forward when the
> >> > > underlying project's version changes. Since the thirdparty module
> >> > > has its own releases, each of those releases can be mapped to a
> >> > > specific version of the underlying project. The binary artifact can
> >> > > even include a MANIFEST with the underlying project's details, as
> >> > > per Steve's suggestion on HADOOP-13363.
> >> > > That said, if you still prefer to have the project's version number
> >> > > in the artifact id, it can be done.
> >> > >
> >> > > > The Hadoop project can make releases of the thirdparty module:
> >> > > >
> >> > > > <dependency>
> >> > > >  <groupId>org.apache.hadoop</groupId>
> >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >> > > >  <version>1.0</version>
> >> > > > </dependency>
> >> > > >
> >> > > > Note that the version has to be the hadoop thirdparty release
> >> > > > number, which is part of why you need to have the underlying
> >> > > > version in the artifact name. These we can push to maven central
> >> > > > as new releases from Hadoop.
> >> > > >
> >> > > >
> >> > > Exactly, the same has been implemented in the PR. The
> >> > > hadoop-thirdparty module has its own releases. In the HADOOP Jira,
> >> > > thirdparty versions can be differentiated using the prefix
> >> > > "thirdparty-".
> >> > >
> >> > > The same solution is followed in HBase. Maybe people involved in
> >> > > HBase can add some points here.
> >> > >
> >> > > > Thoughts?
> >> > > >
> >> > > > .. Owen
> >> > > >
> >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> >> vinayakumarb@apache.org
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > >> Hi All,
> >> > > >>
> >> > > >>    I wanted to discuss the separate repo for thirdparty
> >> > > >> dependencies which we need to shade and include in Hadoop
> >> > > >> components' jars.
> >> > > >>
> >> > > >>    Apologies for the long text ahead, but this needs a clear
> >> > > >> explanation!!
> >> > > >>
> >> > > >>    Right now the most needed such dependency is protobuf. The
> >> > > >> protobuf dependency has not been upgraded beyond 2.5.0 for fear
> >> > > >> that downstream builds, which depend on the transitive protobuf
> >> > > >> dependency coming from hadoop's jars, may fail with the upgrade.
> >> > > >> Apparently protobuf does not guarantee source compatibility,
> >> > > >> though it guarantees wire compatibility between versions.
> >> > > >> Because of this, a version upgrade may cause breakage in known
> >> > > >> and unknown (private?) downstreams.
> >> > > >>
> >> > > >>    So to tackle this, we came up with the following proposal in
> >> > > >> HADOOP-13363.
> >> > > >>
> >> > > >>    Luckily, as far as I know, no APIs, either public to users or
> >> > > >> between Hadoop processes, directly use protobuf classes in their
> >> > > >> signatures. (If any exist, please let us know.)
> >> > > >>
> >> > > >>    Proposal:
> >> > > >>    ------------
> >> > > >>
> >> > > >>    1. Create artifact(s) containing the shaded dependencies. All
> >> > > >> such shading/relocation will use the known prefix
> >> > > >> **org.apache.hadoop.thirdparty.**.
> >> > > >>    2. Start with the protobuf jar (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf): all
> >> > > >> **com.google.protobuf** classes will be relocated to
> >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >> > > >>    3. Hadoop modules that need protobuf as a dependency will add
> >> > > >> this shaded artifact as a dependency (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >> > > >>    4. All previous usages of "com.google.protobuf" will be
> >> > > >> changed to "org.apache.hadoop.thirdparty.com.google.protobuf" in
> >> > > >> the code and committed. Please note, this replacement is a
> >> > > >> one-time change made directly in the source code, NOT during
> >> > > >> compile and package.
> >> > > >>    5. Once all usages of "com.google.protobuf" are relocated,
> >> > > >> hadoop no longer cares which version of the original
> >> > > >> "protobuf-java" is in the dependency tree.
> >> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so
> >> > > >> as not to break the downstreams. Hadoop itself will actually be
> >> > > >> using the latest protobuf present in
> >> > > >> "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >> > > >>
> >> > > >>    7. Coming back to the separate repo, the following are the
> >> > > >> main reasons for keeping the shaded dependency artifact in a
> >> > > >> separate repo instead of a submodule.
> >> > > >>
> >> > > >>      7a. These artifacts need not be built all the time. They
> >> > > >> need to be built only when there is a change in the dependency
> >> > > >> version or the build process.
> >> > > >>      7b. If added as a "submodule in the Hadoop repo",
> >> > > >> maven-shade-plugin:shade will execute only in the package phase.
> >> > > >> That means "mvn compile" or "mvn test-compile" will fail,
> >> > > >> because the artifact will not yet have the relocated classes; it
> >> > > >> will have the original classes instead, resulting in a
> >> > > >> compilation failure. The workaround is to build the thirdparty
> >> > > >> submodule first and exclude the "thirdparty" submodule in other
> >> > > >> executions. This is a complex process compared to keeping it in
> >> > > >> a separate repo.
> >> > > >>
> >> > > >>      7c. The separate repo will be a subproject of Hadoop, using
> >> > > >> the same HADOOP jira project, with different versioning prefixed
> >> > > >> with "thirdparty-" (ex: thirdparty-1.0.0).
> >> > > >>      7d. The separate repo will have the same release process as
> >> > > >> Hadoop.
> >> > > >>
> >> > > >>    HADOOP-13363
> >> > > >> (https://issues.apache.org/jira/browse/HADOOP-13363) is an
> >> > > >> umbrella jira tracking the changes for the protobuf upgrade.
> >> > > >>
> >> > > >>    A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> >> > > >> been raised for the separate repo creation in HADOOP-16595
> >> > > >> (https://issues.apache.org/jira/browse/HADOOP-16595).
> >> > > >>
> >> > > >>    Please provide your inputs on the proposal and review the PR
> >> > > >> so we can proceed.
> >> > > >>
> >> > > >>
> >> > > >    -Thanks,
> >> > > >>    Vinay
> >> > > >>
> >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >> > > >> vinodkv@apache.org>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Moving the thread to the dev lists.
> >> > > >> >
> >> > > >> > Thanks
> >> > > >> > +Vinod
> >> > > >> >
> >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> >> > > vinayakumarb@apache.org>
> >> > > >> > wrote:
> >> > > >> > >
> >> > > >> > > Thanks Marton,
> >> > > >> > >
> >> > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
> >> > > >> > > Whether to use that repo for the shaded artifact or not will
> >> > > >> > > be tracked in the HADOOP-13363 umbrella jira. Please feel
> >> > > >> > > free to join the discussion.
> >> > > >> > >
> >> > > >> > > No existing codebase is being moved out of the hadoop repo,
> >> > > >> > > so I think we are good to go right now.
> >> > > >> > >
> >> > > >> > > -Vinay
> >> > > >> > >
> >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <
> elek@apache.org>
> >> > > wrote:
> >> > > >> > >
> >> > > >> > >>
> >> > > >> > >> I am not sure whether it is defined when a vote is required.
> >> > > >> > >>
> >> > > >> > >> https://www.apache.org/foundation/voting.html
> >> > > >> > >>
> >> > > >> > >> Personally I think it's a big enough change to send a
> >> > > >> > >> notification to the dev lists with a 'lazy consensus'
> >> > > >> > >> closure.
> >> > > >> > >>
> >> > > >> > >> Marton
> >> > > >> > >>
> >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
> >> vinayakumarb@apache.org>
> >> > > >> wrote:
> >> > > >> > >>> Hi,
> >> > > >> > >>>
> >> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and
> >> > > >> > >>> maybe more in the future) will be kept as a shaded artifact
> >> > > >> > >>> in a separate repo, which will be referenced as a dependency
> >> > > >> > >>> in hadoop modules. This approach avoids shading every
> >> > > >> > >>> submodule during the build.
> >> > > >> > >>>
> >> > > >> > >>> So the question is: is a VOTE required before asking to
> >> > > >> > >>> create a git repo?
> >> > > >> > >>>
> >> > > >> > >>> On the self-serve platform
> >> > > >> > >>> https://gitbox.apache.org/setup/newrepo.html I can see that
> >> > > >> > >>> the requester should be a PMC member.
> >> > > >> > >>>
> >> > > >> > >>> Wanted to confirm here first.
> >> > > >> > >>>
> >> > > >> > >>> -Vinay
> >> > > >> > >>>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Brahma Reddy Battula <br...@apache.org>.
I just went through the previous discussions on the jira (HADOOP-13363) and
this thread. As Stack and Duo Zhang mentioned, this artifact (could we call
it "shaded" instead of "thirdparty"?) will be voted on by the PMC, as in
the thread below. Wouldn't that be fair?

https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E

One thought here:
Maybe we can unify this (perhaps as an incubator project?), so that all
projects can use the same git repo for shaded artifacts?


I would like to join the discussion, so please let me know.



-- 



--Brahma Reddy Battula

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Sree Vaddi <sr...@yahoo.com.INVALID>.
apache/hadoop-thirdparty: how would it fit into the ASF? As an Incubating project? Or as a TLP?
Or as a new project definition?

The effort to streamline and standardize the handling of dependencies that require shading seems to go beyond the siloed efforts of Hadoop, HBase, etc.

I propose that we bring the decision makers from all these projects into one room and decide the best course of action. My goal is that no project should ever have to shade an artifact except as an absolutely necessary last resort.


Thank you./Sree

 

    On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <vi...@apache.org> wrote:  
 
 Hi,
Sorry for the late reply.
>>> To be exact, how can we better use the thirdparty repo? Looking at
HBase as an example, it looks like everything that is known to break a lot
after an update gets shaded into the hbase-thirdparty artifact: guava,
netty, etc.
Is the purpose to isolate these naughty dependencies?
Yes, shading isolates these naughty dependencies from the downstream
classpath and gives us independent control over their upgrades without
breaking downstreams.

First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create the
protobuf shaded jar is ready to merge.
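
For readers unfamiliar with shading, the relocation discussed in this thread
can be sketched as a maven-shade-plugin configuration. This is a minimal,
illustrative fragment and not the contents of the PR above: the relocation
pattern is the one from the original proposal (an October update in this
thread renamed the module to 'hadoop-shaded-protobuf37' and the package to
'o.a.h.thirdparty.protobuf37'), and the plugin version and the rest of the
pom are omitted.

```xml
<!-- Illustrative sketch only; not taken from apache/hadoop-thirdparty. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- The shade goal runs in the package phase. -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Original package of the dependency being shaded. -->
            <pattern>com.google.protobuf</pattern>
            <!-- Relocated package, using the prefix from the proposal. -->
            <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Code built against the resulting jar refers to the relocated package, so it
is unaffected by whichever com.google.protobuf version a downstream project
puts on its classpath.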

Please take a look if you are interested; it will be merged in about two
days if there are no objections.

-Vinay


On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org> wrote:

> Hi I am late to this but I am keen to understand more.
>
> To be exact, how can we better use the thirdparty repo? Looking at HBase
> as an example, it looks like everything that is known to break a lot after
> an update gets shaded into the hbase-thirdparty artifact: guava, netty,
> etc.
> Is the purpose to isolate these naughty dependencies?
>
> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> wrote:
>
>> Hi All,
>>
>> I have updated the PR as per the suggestions from
>> @Owen O'Malley <ow...@gmail.com>.
>>
>>    i. Renamed the module to 'hadoop-shaded-protobuf37'
>>    ii. Set the shaded package to 'o.a.h.thirdparty.protobuf37'
>>
>> Please review!!
>>
>> Thanks,
>> -Vinay
>>
>>
>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
>> wrote:
>>
>> > For HBase we have a separate repo for hbase-thirdparty:
>> >
>> > https://github.com/apache/hbase-thirdparty
>> >
>> > We publish the artifacts to nexus, so we do not need to include
>> > binaries in our git repo; we just add a dependency in the pom.
>> >
>> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>> >
>> > It has its own release cycle: we release only when there are special
>> > requirements or we want to upgrade some of the dependencies. This is the
>> > vote thread for the newest release, where we want to provide a shaded
>> > gson for jdk7.
>> >
>> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>> >
>> >
>> > Thanks.
>> >
>> > On Sat, Sep 28, 2019 at 1:28 AM Vinayakumar B <vi...@apache.org> wrote:
>> >
>> > > Please find replies inline.
>> > >
>> > > -Vinay
>> > >
>> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
>> > > wrote:
>> > >
>> > > > I'm very unhappy with this direction. In particular, I don't think git
>> > > > is a good place for distribution of binary artifacts. Furthermore, the
>> > > > PMC shouldn't be releasing anything without a release vote.
>> > > >
>> > > >
>> > > The proposed solution doesn't release any binaries in git. It's actually a
>> > > complete sub-project which follows the entire release process, including
>> > > a public VOTE. As I mentioned already, the release process is similar to
>> > > Hadoop's. To be specific, it uses (almost) the same script used in Hadoop
>> > > to generate artifacts, sign them, and deploy them to the staging
>> > > repository. Please let me know if I am conveying anything wrong.
>> > >
>> > >
>> > > > I'd propose that we make a third party module that contains the
>> > > > *source* of the pom files to build the relocated jars. This should
>> > > > absolutely be treated as a last resort for the mostly Google projects
>> > > > that regularly break binary compatibility (eg. Protobuf & Guava).
>> > > >
>> > > >
>> > > The same has been implemented in the PR
>> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
>> > > me know if I misunderstood. Yes, this is the last option we have, AFAIK.
>> > >
>> > >
>> > > > In terms of naming, I'd propose something like:
>> > > >
>> > > > org.apache.hadoop.thirdparty.protobuf2_5
>> > > > org.apache.hadoop.thirdparty.guava28
>> > > >
>> > > > In particular, I think we absolutely need to include the version of
>> > > > the underlying project. On the other hand, since we should not be
>> > > > shading *everything* we can drop the leading com.google.
>> > > >
>> > > >
>> > > IMO, this naming convention makes it easy to identify the underlying
>> > > project, but it will be difficult to maintain going forward when the
>> > > underlying project's version changes. Since the thirdparty module has its
>> > > own releases, each of those releases can be mapped to a specific version
>> > > of the underlying project. The binary artifact can even include a
>> > > MANIFEST with the underlying project details, as per Steve's suggestion
>> > > on HADOOP-13363.
>> > > That said, if you still prefer to have the version number in the
>> > > artifact id, it can be done.
>> > >
>> > > > The Hadoop project can make releases of the thirdparty module:
>> > > >
>> > > > <dependency>
>> > > >  <groupId>org.apache.hadoop</groupId>
>> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>> > > >  <version>1.0</version>
>> > > > </dependency>
>> > > >
>> > > > Note that the version has to be the hadoop thirdparty release number,
>> > > > which is part of why you need to have the underlying version in the
>> > > > artifact name. These we can push to maven central as new releases from
>> > > > Hadoop.
>> > > >
>> > > >
>> > > Exactly, the same has been implemented in the PR. The hadoop-thirdparty
>> > > module has its own releases. In the HADOOP Jira, thirdparty versions can
>> > > be differentiated using the prefix "thirdparty-".
>> > >
>> > > The same solution is being followed in HBase. Maybe people involved in
>> > > HBase can add some points here.
>> > >
>> > > > Thoughts?
>> > > >
>> > > > .. Owen
>> > > >
>> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
>> > > > wrote:
>> > > >
>> > > >> Hi All,
>> > > >>
>> > > >>    I wanted to discuss the separate repo for thirdparty dependencies
>> > > >> which we need to shade and include in Hadoop components' jars.
>> > > >>
>> > > >>    Apologies for the big text ahead, but this needs clear explanation!!
>> > > >>
>> > > >>    Right now the most needed such dependency is protobuf. The protobuf
>> > > >> dependency has not been upgraded beyond 2.5.0, for fear that downstream
>> > > >> builds, which depend on the transitive protobuf dependency coming from
>> > > >> Hadoop's jars, may fail with the upgrade. Apparently protobuf does not
>> > > >> guarantee source compatibility, though it guarantees wire compatibility
>> > > >> between versions. Because of this behavior, a version upgrade may cause
>> > > >> breakage in known and unknown (private?) downstreams.
>> > > >>
>> > > >>    So to tackle this, we came up with the following proposal in
>> > > >> HADOOP-13363.
>> > > >>
>> > > >>    Luckily, as far as I know, no APIs, either public to users or between
>> > > >> Hadoop processes, directly use protobuf classes in signatures. (If any
>> > > >> exist, please let us know.)
>> > > >>
>> > > >>    Proposal:
>> > > >>    ------------
>> > > >>
>> > > >>    1. Create artifact(s) which contain shaded dependencies. All such
>> > > >> shading/relocation will use the known prefix
>> > > >> **org.apache.hadoop.thirdparty.**.
>> > > >>    2. To start with the protobuf jar (ex:
>> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf), all **com.google.protobuf**
>> > > >> classes will be relocated to
>> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>> > > >>    3. Hadoop modules which need protobuf as a dependency will add this
>> > > >> shaded artifact as a dependency (ex:
>> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
>> > > >>    4. All previous usages of "com.google.protobuf" will be changed to
>> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
>> > > >> committed. Please note, this replacement is a one-time change made
>> > > >> directly in the source code, NOT during compile and package.
>> > > >>    5. Once all usages of "com.google.protobuf" are relocated, Hadoop
>> > > >> no longer cares which version of the original "protobuf-java" is in the
>> > > >> dependency tree.
>> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
>> > > >> break the downstreams. But Hadoop itself will actually be using the
>> > > >> latest protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>> > > >>
>> > > >>    7. Coming back to the separate repo, the following are the main
>> > > >> reasons for keeping the shaded dependency artifact in a separate repo
>> > > >> instead of a submodule.
>> > > >>
>> > > >>      7a. These artifacts need not be built all the time. They need to
>> > > >> be built only when there is a change in the dependency version or the
>> > > >> build process.
>> > > >>      7b. If added as a "submodule in Hadoop repo",
>> > > >> maven-shade-plugin:shade will execute only in the package phase. That
>> > > >> means "mvn compile" or "mvn test-compile" will fail, because this
>> > > >> artifact will not have the relocated classes; it will have the original
>> > > >> classes instead, resulting in compilation failure. The workaround is to
>> > > >> build the thirdparty submodule first and exclude the "thirdparty"
>> > > >> submodule in other executions. This will be a complex process compared
>> > > >> to keeping it in a separate repo.
>> > > >>
>> > > >>      7c. The separate repo will be a subproject of Hadoop, using the
>> > > >> same HADOOP jira project, with different versioning prefixed with
>> > > >> "thirdparty-" (ex: thirdparty-1.0.0).
>> > > >>      7d. The separate repo will have the same release process as Hadoop.
>> > > >>
>> > > >>    HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
>> > > >> is an umbrella jira tracking the changes for the protobuf upgrade.
>> > > >>
>> > > >>    A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
>> > > >> raised for the separate repo creation in HADOOP-16595
>> > > >> (https://issues.apache.org/jira/browse/HADOOP-16595).
>> > > >>
>> > > >>    Please provide your inputs on the proposal and review the PR to
>> > > >> proceed with the proposal.
>> > > >>
>> > > >>
>> > > >>    -Thanks,
>> > > >>    Vinay
>> > > >>
>> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
>> > > >> vinodkv@apache.org>
>> > > >> wrote:
>> > > >>
>> > > >> > Moving the thread to the dev lists.
>> > > >> >
>> > > >> > Thanks
>> > > >> > +Vinod
>> > > >> >
>> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vinayakumarb@apache.org>
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > > Thanks Marton,
>> > > >> > >
>> > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
>> > > >> > > Whether to use that repo for shaded artifacts or not will be
>> > > >> > > tracked in the HADOOP-13363 umbrella jira. Please feel free to join
>> > > >> > > the discussion.
>> > > >> > >
>> > > >> > > No existing codebase is being moved out of the hadoop repo, so I
>> > > >> > > think right now we are good to go.
>> > > >> > >
>> > > >> > > -Vinay
>> > > >> > >
>> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
>> > > wrote:
>> > > >> > >
>> > > >> > >>
>> > > >> > >> I am not sure if it's defined when a vote is required.
>> > > >> > >>
>> > > >> > >> https://www.apache.org/foundation/voting.html
>> > > >> > >>
>> > > >> > >> Personally I think it's a big enough change to send a notification
>> > > >> > >> to the dev lists with a 'lazy consensus' closure.
>> > > >> > >>
>> > > >> > >> Marton
>> > > >> > >>
>> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org>
>> > > >> > >> wrote:
>> > > >> > >>> Hi,
>> > > >> > >>>
>> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more
>> > > >> > >>> in future) will be kept as a shaded artifact in a separate repo,
>> > > >> > >>> which will be referred to as a dependency in hadoop modules. This
>> > > >> > >>> approach avoids shading of every submodule during build.
>> > > >> > >>>
>> > > >> > >>> So the question is: is any VOTE required before asking to create a
>> > > >> > >>> git repo?
>> > > >> > >>>
>> > > >> > >>> On the selfserve platform
>> > > >> > >>> https://gitbox.apache.org/setup/newrepo.html
>> > > >> > >>> I can see that the requester should be a PMC member.
>> > > >> > >>>
>> > > >> > >>> Wanted to confirm here first.
>> > > >> > >>>
>> > > >> > >>> -Vinay
>> > > >> > >>>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>  

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Sree Vaddi <sr...@yahoo.com.INVALID>.
apache/hadoop-thirdparty: how would it fit into the ASF? As an incubating project? As a TLP?
Or under a new project definition?

The effort to streamline and establish an accepted standard for the dependencies that require shading seems to go beyond the siloed efforts of hadoop, hbase, etc.

I propose that we bring the decision makers from all these projects into one room and decide the best course of action. My view is that no project should ever have to shade any artifacts except as an absolutely necessary alternative.


Thank you.
/Sree

 

    On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <vi...@apache.org> wrote:  
 
 Hi,
Sorry for the late reply.
>>> To be exact, how can we better use the thirdparty repo? Looking at
>>> HBase as an example, it looks like everything that are known to break a lot
>>> after an update get shaded into the hbase-thirdparty artifact: guava,
>>> netty, ... etc.
>>> Is it the purpose to isolate these naughty dependencies?
Yes, the purpose of shading is to isolate these naughty dependencies from the
downstream classpath, so that we can upgrade them independently without
breaking downstreams.
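
For a module that consumes the shaded jar, the change would roughly be a dependency swap, sketched below. The groupId and artifactId follow the 'o.a.h.thirdparty' and 'hadoop-shaded-protobuf37' names used in this thread; the version value is a placeholder assumption, since no released version is stated here.

```xml
<!-- Hypothetical sketch of a consumer pom fragment. -->
<dependency>
  <groupId>org.apache.hadoop.thirdparty</groupId>
  <artifactId>hadoop-shaded-protobuf37</artifactId>
  <!-- Placeholder: use whichever hadoop-thirdparty release is actually published. -->
  <version>1.0.0</version>
</dependency>
```

Code in that module would then import the relocated package (for example org.apache.hadoop.thirdparty.protobuf37) instead of com.google.protobuf.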

The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, which creates
the shaded protobuf jar, is ready to merge.

Please take a look if you are interested. It will probably be merged after two
days if there are no objections.

-Vinay


On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org> wrote:

> Hi, I am late to this, but I am keen to understand more.
>
> To be exact, how can we better use the thirdparty repo? Looking at HBase
> as an example, it looks like everything that is known to break a lot after
> an update gets shaded into the hbase-thirdparty artifact: guava, netty,
> etc.
> Is it the purpose to isolate these naughty dependencies?
>
> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> wrote:
>
>> Hi All,
>>
>> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>'s
>> suggestions.
>>
>>    i. Renamed the module to 'hadoop-shaded-protobuf37'
>>    ii. Set the shaded package to 'o.a.h.thirdparty.protobuf37'
>>
>> Please review!!
>>
>> Thanks,
>> -Vinay
>>
>>
>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
>> wrote:
>>
>> > For HBase we have a separate repo for hbase-thirdparty:
>> >
>> > https://github.com/apache/hbase-thirdparty
>> >
>> > We publish the artifacts to nexus, so we do not need to include
>> > binaries in our git repo; we just add a dependency in the pom.
>> >
>> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>> >
>> > It has its own release cycle; we release only when there are special
>> > requirements or we want to upgrade some of the dependencies. This is the
>> > vote thread for the newest release, where we want to provide a shaded
>> > gson for jdk7.
>> >
>> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>> >
>> > Thanks.
>> >
>> > On Sat, Sep 28, 2019 at 1:28 AM Vinayakumar B <vi...@apache.org> wrote:
>> >
>> > > Please find replies inline.
>> > >
>> > > -Vinay
>> > >
>> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
>> > > wrote:
>> > >
>> > > > I'm very unhappy with this direction. In particular, I don't think git
>> > > > is a good place for distribution of binary artifacts. Furthermore, the
>> > > > PMC shouldn't be releasing anything without a release vote.
>> > > >
>> > > >
>> > > The proposed solution doesn't release any binaries in git. It's actually a
>> > > complete sub-project which follows the entire release process, including
>> > > a public VOTE. As I mentioned already, the release process is similar to
>> > > Hadoop's. To be specific, it uses (almost) the same script used in Hadoop
>> > > to generate artifacts, sign them, and deploy them to the staging
>> > > repository. Please let me know if I am conveying anything wrong.
>> > >
>> > >
>> > > > I'd propose that we make a third party module that contains the
>> > > > *source* of the pom files to build the relocated jars. This should
>> > > > absolutely be treated as a last resort for the mostly Google projects
>> > > > that regularly break binary compatibility (eg. Protobuf & Guava).
>> > > >
>> > > >
>> > > The same has been implemented in the PR
>> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
>> > > me know if I misunderstood. Yes, this is the last option we have, AFAIK.
>> > >
>> > >
>> > > > In terms of naming, I'd propose something like:
>> > > >
>> > > > org.apache.hadoop.thirdparty.protobuf2_5
>> > > > org.apache.hadoop.thirdparty.guava28
>> > > >
>> > > > In particular, I think we absolutely need to include the version of
>> > > > the underlying project. On the other hand, since we should not be
>> > > > shading *everything* we can drop the leading com.google.
>> > > >
>> > > >
>> > > IMO, this naming convention makes it easy to identify the underlying
>> > > project, but it will be difficult to maintain going forward when the
>> > > underlying project's version changes. Since the thirdparty module has its
>> > > own releases, each of those releases can be mapped to a specific version
>> > > of the underlying project. The binary artifact can even include a
>> > > MANIFEST with the underlying project details, as per Steve's suggestion
>> > > on HADOOP-13363.
>> > > That said, if you still prefer to have the version number in the
>> > > artifact id, it can be done.
>> > >
>> > > > The Hadoop project can make releases of the thirdparty module:
>> > > >
>> > > > <dependency>
>> > > >  <groupId>org.apache.hadoop</groupId>
>> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>> > > >  <version>1.0</version>
>> > > > </dependency>
>> > > >
>> > > > Note that the version has to be the hadoop thirdparty release number,
>> > > > which is part of why you need to have the underlying version in the
>> > > > artifact name. These we can push to maven central as new releases from
>> > > > Hadoop.
>> > > >
>> > > >
>> > > Exactly, the same has been implemented in the PR. The hadoop-thirdparty
>> > > module has its own releases. In the HADOOP Jira, thirdparty versions can
>> > > be differentiated using the prefix "thirdparty-".
>> > >
>> > > The same solution is being followed in HBase. Maybe people involved in
>> > > HBase can add some points here.
>> > >
>> > > > Thoughts?
>> > > >
>> > > > .. Owen
>> > > >
>> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
>> > > > wrote:
>> > > >
>> > > >> Hi All,
>> > > >>
>> > > >>    I wanted to discuss about the separate repo for thirdparty
>> > > dependencies
>> > > >> which we need to shaded and include in Hadoop component's jars.
>> > > >>
>> > > >>    Apologies for the big text ahead, but this needs clear
>> > explanation!!
>> > > >>
>> > > >>    Right now most needed such dependency is protobuf. Protobuf
>> > > dependency
>> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream
>> > > builds,
>> > > >> which depends on transitive dependency protobuf coming from
>> hadoop's
>> > > jars,
>> > > >> may fail with the upgrade. Apparently protobuf does not guarantee
>> > source
>> > > >> compatibility, though it guarantees wire compatibility between
>> > versions.
>> > > >> Because of this behavior, version upgrade may cause breakage in
>> known
>> > > and
>> > > >> unknown (private?) downstreams.
>> > > >>
>> > > >>    So to tackle this, we came up the following proposal in
>> > HADOOP-13363.
>> > > >>
>> > > >>    Luckily, as far as I know, no APIs, either public to users or
>> > > >> between Hadoop processes, directly use protobuf classes in their
>> > > >> signatures. (If any exist, please let us know.)
>> > > >>
>> > > >>    Proposal:
>> > > >>    ------------
>> > > >>
>> > > >>    1. Create a artifact(s) which contains shaded dependencies. All
>> > such
>> > > >> shading/relocation will be with known prefix
>> > > >> **org.apache.hadoop.thirdparty.**.
>> > > >>    2. Right now protobuf jar (ex:
>> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
>> > > >> to start with, all **com.google.protobuf** classes will be
>> relocated
>> > as
>> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>> > > >>    3. Hadoop modules, which needs protobuf as dependency, will add
>> > this
>> > > >> shaded artifact as dependency (ex:
>> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
>> > > >>    4. All previous usages of "com.google.protobuf" will be
>> relocated
>> > to
>> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
>> > will
>> > > be
>> > > >> committed. Please note, this replacement is One-Time directly in
>> > source
>> > > >> code, NOT during compile and package.
>> > > >>    5. Once all usages of "com.google.protobuf" is relocated, then
>> > hadoop
>> > > >> dont care about which version of original  "protobuf-java" is in
>> > > >> dependency.
>> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to
>> break
>> > > the
>> > > >> downstreams. But hadoop will be originally using the latest
>> protobuf
>> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>> > > >>
>> > > >>    7. Coming back to separate repo, Following are most appropriate
>> > > reasons
>> > > >> of keeping shaded dependency artifact in separate repo instead of
>> > > >> submodule.
>> > > >>
>> > > >>      7a. These artifacts need not be built all the time. It needs
>> to
>> > be
>> > > >> built only when there is a change in the dependency version or the
>> > build
>> > > >> process.
>> > > >>      7b. If added as a "submodule in Hadoop repo",
>> > > >> maven-shade-plugin:shade will execute only in the package phase.
>> > > >> That means "mvn compile" or "mvn test-compile" will fail: at that
>> > > >> point the artifact does not yet contain the relocated classes, only
>> > > >> the original ones, so modules that import the relocated packages do
>> > > >> not compile. The workaround is to build the thirdparty submodule
>> > > >> first and exclude it in other executions, which is a more complex
>> > > >> process than keeping it in a separate repo.
>> > > >>
>> > > >>      7c. Separate repo, will be a subproject of Hadoop, using the
>> > same
>> > > >> HADOOP jira project, with different versioning prefixed with
>> > > "thirdparty-"
>> > > >> (ex: thirdparty-1.0.0).
>> > > >>      7d. Separate will have same release process as Hadoop.
>> > > >>
>> > > >>    HADOOP-13363 (
>> https://issues.apache.org/jira/browse/HADOOP-13363)
>> > > is
>> > > >> an
>> > > >> umbrella jira tracking the changes to protobuf upgrade.
>> > > >>
>> > > >>    PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
>> been
>> > > >> raised
>> > > >> for separate repo creation in (HADOOP-16595 (
>> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
>> > > >>
>> > > >>    Please provide your inputs for the proposal and review the PR
>> to
>> > > >> proceed with the proposal.
>> > > >>
>> > > >>
>> > > >    -Thanks,
>> > > >>    Vinay
>> > > >>
>> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
>> > > >> vinodkv@apache.org>
>> > > >> wrote:
>> > > >>
>> > > >> > Moving the thread to the dev lists.
>> > > >> >
>> > > >> > Thanks
>> > > >> > +Vinod
>> > > >> >
>> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
>> > > vinayakumarb@apache.org>
>> > > >> > wrote:
>> > > >> > >
>> > > >> > > Thanks Marton,
>> > > >> > >
>> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
>> > > >> > > Whether to use that repo  for shaded artifact or not will be
>> > > >> monitored in
>> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
>> > discussion.
>> > > >> > >
>> > > >> > > There is no existing codebase is being moved out of hadoop
>> repo.
>> > So
>> > > I
>> > > >> > think
>> > > >> > > right now we are good to go.
>> > > >> > >
>> > > >> > > -Vinay
>> > > >> > >
>> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
>> > > wrote:
>> > > >> > >
>> > > >> > >>
>> > > >> > >> I am not sure if it's defined when is a vote required.
>> > > >> > >>
>> > > >> > >> https://www.apache.org/foundation/voting.html
>> > > >> > >>
>> > > >> > >> Personally I think it's a big enough change to send a
>> > notification
>> > > to
>> > > >> > the
>> > > >> > >> dev lists with a 'lazy consensus'  closure
>> > > >> > >>
>> > > >> > >> Marton
>> > > >> > >>
>> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
>> vinayakumarb@apache.org>
>> > > >> wrote:
>> > > >> > >>> Hi,
>> > > >> > >>>
>> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be
>> more
>> > in
>> > > >> > >> future)
>> > > >> > >>> will be kept as a shaded artifact in a separate repo, which
>> will
>> > > be
>> > > >> > >>> referred as dependency in hadoop modules.  This approach
>> avoids
>> > > >> shading
>> > > >> > >> of
>> > > >> > >>> every submodule during build.
>> > > >> > >>>
>> > > >> > >>> So question is does any VOTE required before asking to
>> create a
>> > > git
>> > > >> > repo?
>> > > >> > >>>
>> > > >> > >>> On selfserve platform
>> > > https://gitbox.apache.org/setup/newrepo.html
>> > > >> > >>> I can access see that, requester should be PMC.
>> > > >> > >>>
>> > > >> > >>> Wanted to confirm here first.
>> > > >> > >>>
>> > > >> > >>> -Vinay
>> > > >> > >>>
>> > > >> > >>
>> > > >> > >>
>> > > ---------------------------------------------------------------------
>> > > >> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
>> > > >> > >> For additional commands, e-mail:
>> private-help@hadoop.apache.org
>> > > >> > >>
>> > > >> > >>
>> > > >> >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>  

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Sree Vaddi <sr...@yahoo.com.INVALID>.
How would apache/hadoop-thirdparty fit into the ASF? As an incubating project? As a TLP?
Or under a new project definition?

The effort to streamline and establish an accepted standard for dependencies that require shading seems to go beyond the siloed efforts of Hadoop, HBase, etc.

I propose we bring the decision makers from all these projects into one room and decide on the best course of action. My aim is that no project should ever have to shade artifacts except as an absolutely necessary last resort.

Thank you.
/Sree

 

    On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <vi...@apache.org> wrote:  
 
 Hi,
Sorry for the late reply.

>>> To be exact, how can we better use the thirdparty repo? Looking at
>>> HBase as an example, it looks like everything that is known to break a
>>> lot after an update gets shaded into the hbase-thirdparty artifact:
>>> guava, netty, etc.
>>> Is the purpose to isolate these naughty dependencies?

Yes, the purpose of shading is to isolate these naughty dependencies from the
downstream classpath and to control their upgrades independently, without
breaking downstreams.

First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create the
protobuf shaded jar is ready to merge.

Please take a look if you are interested. It will be merged in about two
days if there are no objections.

-Vinay


Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Hi,
Sorry for the late reply.

>>> To be exact, how can we better use the thirdparty repo? Looking at
>>> HBase as an example, it looks like everything that is known to break a
>>> lot after an update gets shaded into the hbase-thirdparty artifact:
>>> guava, netty, etc.
>>> Is the purpose to isolate these naughty dependencies?

Yes, the purpose of shading is to isolate these naughty dependencies from the
downstream classpath and to control their upgrades independently, without
breaking downstreams.

First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create the
protobuf shaded jar is ready to merge.
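
For readers unfamiliar with shading, here is a minimal sketch of how a shaded
protobuf module could be configured with maven-shade-plugin. It is not the
content of the PR. The plugin version, the protobuf-java version, and the
relocated package name are assumptions chosen for illustration; the thread
mentions both org.apache.hadoop.thirdparty.com.google.protobuf and
o.a.h.thirdparty.protobuf37 as relocation targets.

```xml
<!-- Hypothetical pom.xml fragment for a shaded protobuf module (illustrative only). -->
<dependencies>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <!-- Assumed version; the module name "protobuf37" suggests a 3.7.x release. -->
    <version>3.7.1</version>
  </dependency>
</dependencies>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- Assumed plugin version. -->
      <version>3.2.1</version>
      <executions>
        <execution>
          <!-- The shade goal runs in the package phase, so relocated classes
               exist only in the packaged jar, not after "mvn compile". -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <artifactSet>
              <includes>
                <!-- Bundle only protobuf-java into the shaded jar. -->
                <include>com.google.protobuf:protobuf-java</include>
              </includes>
            </artifactSet>
            <relocations>
              <relocation>
                <!-- Rewrite the package name in the bundled classes. -->
                <pattern>com.google.protobuf</pattern>
                <!-- Assumed target package under the agreed prefix. -->
                <shadedPattern>org.apache.hadoop.thirdparty.protobuf</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Because the bundled classes live under a different package name, a downstream
project that still has the original com.google.protobuf classes on its
classpath does not conflict with them.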

Please take a look if you are interested. It will be merged in about two
days if there are no objections.

-Vinay


On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org> wrote:

> Hi, I am late to this but I am keen to understand more.
>
> To be exact, how can we better use the thirdparty repo? Looking at HBase
> as an example, it looks like everything that is known to break a lot after
> an update gets shaded into the hbase-thirdparty artifact: guava, netty,
> etc.
> Is the purpose to isolate these naughty dependencies?
>
> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> wrote:
>
>> Hi All,
>>
>> I have updated the PR as per the suggestions of @Owen O'Malley
>> <ow...@gmail.com>.
>>
>>     i. Renamed the module to 'hadoop-shaded-protobuf37'.
>>     ii. Set the shaded package to 'o.a.h.thirdparty.protobuf37'.
>>
>> Please review!!
>>
>> Thanks,
>> -Vinay
>>
>>
>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
>> wrote:
>>
>> > For HBase we have a separate repo for hbase-thirdparty:
>> >
>> > https://github.com/apache/hbase-thirdparty
>> >
>> > We publish the artifacts to nexus, so we do not need to include
>> > binaries in our git repo; we just add a dependency in the pom.
>> >
>> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>> >
>> > It has its own release cycle: we release only when there are special
>> > requirements or we want to upgrade some of the dependencies. This is the
>> > vote thread for the newest release, where we want to provide a shaded
>> > gson for jdk7.
>> >
>> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>> >
>> > Thanks.
>> >
>> > On Sat, Sep 28, 2019 at 1:28 AM Vinayakumar B <vi...@apache.org> wrote:
>> >
>> > > Please find replies inline.
>> > >
>> > > -Vinay
>> > >
>> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
>> > > wrote:
>> > >
>> > > > I'm very unhappy with this direction. In particular, I don't think
>> > > > git is a good place for distribution of binary artifacts.
>> > > > Furthermore, the PMC shouldn't be releasing anything without a
>> > > > release vote.
>> > > >
>> > > >
>> > > The proposed solution doesn't release any binaries in git. It is
>> > > actually a complete sub-project that follows the entire release
>> > > process, including a public VOTE. I have already mentioned that the
>> > > release process is similar to Hadoop's. To be specific, it uses
>> > > (almost) the same script used in Hadoop to generate the artifacts,
>> > > sign them, and deploy them to the staging repository. Please let me
>> > > know if I am conveying anything wrong.
>> > >
>> > >
>> > > > I'd propose that we make a third party module that contains the
>> > > > *source* of the pom files to build the relocated jars. This should
>> > > > absolutely be treated as a last resort for the mostly Google
>> > > > projects that regularly break binary compatibility (eg. Protobuf &
>> > > > Guava).
>> > > >
>> > > >
>> > > The same has been implemented in the PR
>> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and
>> > > let me know if I misunderstood. Yes, this is the last option we have,
>> > > AFAIK.
>> > >
>> > >
>> > > > In terms of naming, I'd propose something like:
>> > > >
>> > > > org.apache.hadoop.thirdparty.protobuf2_5
>> > > > org.apache.hadoop.thirdparty.guava28
>> > > >
>> > > > In particular, I think we absolutely need to include the version of
>> > > > the underlying project. On the other hand, since we should not be
>> > > > shading *everything* we can drop the leading com.google.
>> > > >
>> > > >
>> > > IMO, this naming convention makes it easy to identify the underlying
>> > > project, but it will be difficult to maintain going forward when the
>> > > underlying project's version changes. Since the thirdparty module has
>> > > its own releases, each of those releases can be mapped to a specific
>> > > version of the underlying project. The binary artifact can even
>> > > include a MANIFEST with the underlying project details, as per Steve's
>> > > suggestion on HADOOP-13363.
>> > > That said, if you still prefer to have the project version number in
>> > > the artifact id, it can be done.
>> > >
>> > > > The Hadoop project can make releases of the thirdparty module:
>> > > >
>> > > > <dependency>
>> > > >   <groupId>org.apache.hadoop</groupId>
>> > > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>> > > >   <version>1.0</version>
>> > > > </dependency>
>> > > >
>> > > > Note that the version has to be the hadoop thirdparty release number,
>> > > > which is part of why you need to have the underlying version in the
>> > > > artifact name. These we can push to maven central as new releases
>> > > > from Hadoop.
>> > > >
>> > > >
>> > > >
>> > > Exactly, the same has been implemented in the PR. The hadoop-thirdparty
>> > > module has its own releases, but in the HADOOP Jira, thirdparty
>> > > versions can be differentiated using the prefix "thirdparty-".
>> > >
>> > > The same solution is being followed in HBase. Maybe people involved in
>> > > HBase can add some points here.
>> > >
>> > > Thoughts?
>> > > >
>> > > > .. Owen
>> > > >
>> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
>> > > > wrote:
>> > > >
>> > > >> Hi All,
>> > > >>
>> > > >>    I wanted to discuss the separate repo for thirdparty
>> > > >> dependencies which we need to shade and include in Hadoop
>> > > >> component's jars.
>> > > >>
>> > > >>    Apologies for the big text ahead, but this needs clear
>> > > >> explanation!!
>> > > >>
>> > > >>    Right now the most needed such dependency is protobuf. The
>> > > >> protobuf dependency has not been upgraded from 2.5.0 for fear that
>> > > >> downstream builds, which depend on the transitive protobuf
>> > > >> dependency coming from Hadoop's jars, may fail with the upgrade.
>> > > >> Apparently protobuf does not guarantee source compatibility, though
>> > > >> it guarantees wire compatibility between versions. Because of this
>> > > >> behavior, a version upgrade may cause breakage in known and unknown
>> > > >> (private?) downstreams.
>> > > >>
>> > > >>    So to tackle this, we came up with the following proposal in
>> > > >> HADOOP-13363.
>> > > >>
>> > > >>    Luckily, as far as I know, no APIs, either public to users or
>> > > >> between Hadoop processes, directly use protobuf classes in their
>> > > >> signatures. (If any exist, please let us know.)
>> > > >>
>> > > >>    Proposal:
>> > > >>    ------------
>> > > >>
>> > > >>    1. Create artifact(s) which contain shaded dependencies. All such
>> > > >> shading/relocation will use the known prefix
>> > > >> **org.apache.hadoop.thirdparty.**.
>> > > >>    2. To start with, there will be a protobuf jar (ex:
>> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf), in which all
>> > > >> **com.google.protobuf** classes will be relocated to
>> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>> > > >>    3. Hadoop modules which need protobuf as a dependency will add
>> > > >> this shaded artifact as a dependency (ex:
>> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
>> > > >>    4. All previous usages of "com.google.protobuf" will be changed to
>> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
>> > > >> committed. Please note, this replacement is done one time, directly
>> > > >> in the source code, NOT during compile and package.
>> > > >>    5. Once all usages of "com.google.protobuf" are relocated, Hadoop
>> > > >> does not care which version of the original "protobuf-java" is in
>> > > >> the dependency tree.
>> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as
>> > > >> not to break the downstreams. Hadoop itself will actually be using
>> > > >> the latest protobuf present in
>> > > >> "o.a.h.thirdparty:hadoop-shaded-protobuf".
>> > > >>
>> > > >>    7. Coming back to the separate repo, the following are the main
>> > > >> reasons for keeping the shaded dependency artifact in a separate
>> > > >> repo instead of a submodule.
>> > > >>
>> > > >>       7a. These artifacts need not be built all the time. They need
>> > > >> to be built only when there is a change in the dependency version or
>> > > >> the build process.
>> > > >>       7b. If added as a "submodule in Hadoop repo",
>> > > >> maven-shade-plugin:shade will execute only in the package phase.
>> > > >> That means "mvn compile" or "mvn test-compile" will fail: at that
>> > > >> point the artifact does not yet contain the relocated classes, only
>> > > >> the original ones, so modules that import the relocated packages do
>> > > >> not compile. The workaround is to build the thirdparty submodule
>> > > >> first and exclude it in other executions, which is a more complex
>> > > >> process than keeping it in a separate repo.
>> > > >>
>> > > >>       7c. The separate repo will be a subproject of Hadoop, using the
>> > > >> same HADOOP jira project, with different versioning prefixed with
>> > > >> "thirdparty-" (ex: thirdparty-1.0.0).
>> > > >>       7d. The separate repo will have the same release process as
>> > > >> Hadoop.
>> > > >>
>> > > >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
>> > > >> is an umbrella jira tracking the changes for the protobuf upgrade.
>> > > >>
>> > > >>     A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
>> > > >> been raised for the separate repo creation in HADOOP-16595
>> > > >> (https://issues.apache.org/jira/browse/HADOOP-16595).
>> > > >>
>> > > >>     Please provide your inputs on the proposal and review the PR so
>> > > >> that we can proceed.
>> > > >>
>> > > >>
>> > > >>    -Thanks,
>> > > >>     Vinay
>> > > >>
>> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
>> > > >> <vinodkv@apache.org> wrote:
>> > > >>
>> > > >> > Moving the thread to the dev lists.
>> > > >> >
>> > > >> > Thanks
>> > > >> > +Vinod
>> > > >> >
>> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vinayakumarb@apache.org>
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > > Thanks Marton,
>> > > >> > >
>> > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
>> > > >> > > Whether to use that repo for the shaded artifact or not will be
>> > > >> > > tracked in the HADOOP-13363 umbrella jira. Please feel free to
>> > > >> > > join the discussion.
>> > > >> > >
>> > > >> > > No existing codebase is being moved out of the hadoop repo, so I
>> > > >> > > think right now we are good to go.
>> > > >> > >
>> > > >> > > -Vinay
>> > > >> > >
>> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > >>
>> > > >> > >> I am not sure if it's defined when a vote is required.
>> > > >> > >>
>> > > >> > >> https://www.apache.org/foundation/voting.html
>> > > >> > >>
>> > > >> > >> Personally I think it's a big enough change to send a notification
>> > > >> > >> to the dev lists with a 'lazy consensus' closure.
>> > > >> > >>
>> > > >> > >> Marton
>> > > >> > >>
>> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org>
>> > > >> > >> wrote:
>> > > >> > >>> Hi,
>> > > >> > >>>
>> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more
>> > > >> > >>> in future) will be kept as a shaded artifact in a separate repo,
>> > > >> > >>> which will be referred to as a dependency in hadoop modules. This
>> > > >> > >>> approach avoids shading of every submodule during the build.
>> > > >> > >>>
>> > > >> > >>> So the question is: is any VOTE required before asking to create a
>> > > >> > >>> git repo?
>> > > >> > >>>
>> > > >> > >>> On the self-serve platform
>> > > >> > >>> https://gitbox.apache.org/setup/newrepo.html
>> > > >> > >>> I can see that the requester should be a PMC member.
>> > > >> > >>>
>> > > >> > >>> Wanted to confirm here first.
>> > > >> > >>>
>> > > >> > >>> -Vinay
>> > > >> > >>>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Hi,
Sorry for the late reply.
> To be exact, how can we better use the thirdparty repo? Looking at
> HBase as an example, it looks like everything that is known to break a lot
> after an update gets shaded into the hbase-thirdparty artifact: guava,
> netty, ... etc.
> Is it the purpose to isolate these naughty dependencies?
Yes, the purpose of shading is to isolate these naughty dependencies from the
downstream classpath and to keep independent control over their upgrades
without breaking downstreams.
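
To make the relocation concrete, here is a minimal sketch of what the
maven-shade-plugin configuration for such an artifact could look like. It is
an illustration only, not the pom from the PR: the target package
'org.apache.hadoop.thirdparty.protobuf37' is assumed from the naming discussed
in this thread, and the real build may use different patterns or extra filters.

```xml
<!-- Illustrative sketch only; not copied from the hadoop-thirdparty PR. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shading runs when the jar is packaged -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- original protobuf package -->
            <pattern>com.google.protobuf</pattern>
            <!-- assumed target package, following the naming in this thread -->
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, the classes inside the shaded jar live under the
new package, so a different protobuf-java version on a downstream classpath
does not clash with them.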

First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create the
protobuf shaded jar is ready to merge.
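
Once an artifact is released, a Hadoop module would depend on the shaded jar
instead of using protobuf-java directly. A rough sketch, where the groupId and
artifactId are taken from the names used in this thread and the version is
only a placeholder, since no release exists yet:

```xml
<dependency>
  <groupId>org.apache.hadoop.thirdparty</groupId>
  <artifactId>hadoop-shaded-protobuf37</artifactId>
  <!-- placeholder version, assumed for illustration -->
  <version>1.0.0</version>
</dependency>
```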

Please take a look if you are interested. It will probably be merged in about
two days if there are no objections.

-Vinay


On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org> wrote:

> Hi, I am late to this but I am keen to understand more.
>
> To be exact, how can we better use the thirdparty repo? Looking at HBase
> as an example, it looks like everything that is known to break a lot after
> an update gets shaded into the hbase-thirdparty artifact: guava, netty, ...
> etc.
> Is it the purpose to isolate these naughty dependencies?
>
> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> wrote:
>
>> Hi All,
>>
>> I have updated the PR as per the suggestions of Owen O'Malley
>> <ow...@gmail.com>.
>>
>>     i. Renamed the module to 'hadoop-shaded-protobuf37'
>>     ii. Kept the shaded package as 'o.a.h.thirdparty.protobuf37'
>>
>> Please review!!
>>
>> Thanks,
>> -Vinay
>>
>>
>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
>> wrote:
>>
>> > For HBase we have a separate repo for hbase-thirdparty:
>> >
>> > https://github.com/apache/hbase-thirdparty
>> >
>> > We publish the artifacts to nexus so we do not need to include
>> > binaries in our git repo, just add a dependency in the pom.
>> >
>> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>> >
>> > And it has its own release cycle: we release only when there are special
>> > requirements or we want to upgrade some of the dependencies. This is the
>> > vote thread for the newest release, where we want to provide a shaded gson
>> > for jdk7.
>> >
>> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>> >
>> >
>> > Thanks.
>> >
>> > Vinayakumar B <vi...@apache.org> wrote on Sat, Sep 28, 2019 at 1:28 AM:
>> >
>> > > Please find replies inline.
>> > >
>> > > -Vinay
>> > >
>> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
>> > > wrote:
>> > >
>> > > > I'm very unhappy with this direction. In particular, I don't think
>> > > > git is a good place for distribution of binary artifacts. Furthermore,
>> > > > the PMC shouldn't be releasing anything without a release vote.
>> > > >
>> > > >
>> > > The proposed solution doesn't release any binaries in git. It's actually a
>> > > complete sub-project which follows the entire release process, including
>> > > a public VOTE. I have already mentioned that the release process is
>> > > similar to hadoop's. To be specific, it uses (almost) the same script
>> > > used in hadoop to generate artifacts, sign them and deploy them to the
>> > > staging repository. Please let me know if I am conveying anything wrong.
>> > >
>> > >
>> > > > I'd propose that we make a third party module that contains the
>> > > > *source* of the pom files to build the relocated jars. This should
>> > > > absolutely be treated as a last resort for the mostly Google projects
>> > > > that regularly break binary compatibility (e.g. Protobuf & Guava).
>> > > >
>> > > >
>> > > The same has been implemented in the PR
>> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
>> > > me know if I misunderstood. Yes, this is the last option we have AFAIK.
>> > >
>> > >
>> > > > In terms of naming, I'd propose something like:
>> > > >
>> > > > org.apache.hadoop.thirdparty.protobuf2_5
>> > > > org.apache.hadoop.thirdparty.guava28
>> > > >
>> > > > In particular, I think we absolutely need to include the version of
>> > > > the underlying project. On the other hand, since we should not be
>> > > > shading *everything* we can drop the leading com.google.
>> > > >
>> > > >
>> > > IMO, this naming convention makes it easy to identify the underlying
>> > > project, but it will be difficult to maintain going forward if the
>> > > underlying project version changes. Since the thirdparty module has its
>> > > own releases, each of those releases can be mapped to a specific version
>> > > of the underlying project. The binary artifact can even include a
>> > > MANIFEST with the underlying project details, as per Steve's suggestion
>> > > on HADOOP-13363.
>> > > That said, if you still prefer to have the project version number in the
>> > > artifact id, it can be done.
>> > >
>> > > > The Hadoop project can make releases of the thirdparty module:
>> > > >
>> > > > <dependency>
>> > > >   <groupId>org.apache.hadoop</groupId>
>> > > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>> > > >   <version>1.0</version>
>> > > > </dependency>
>> > > >
>> > > >
>> > > > Note that the version has to be the hadoop thirdparty release number,
>> > > > which is part of why you need to have the underlying version in the
>> > > > artifact name. These we can push to maven central as new releases from
>> > > > Hadoop.
>> > > >
>> > > >
>> > > Exactly, the same has been implemented in the PR. The hadoop-thirdparty
>> > > module has its own releases. In the HADOOP Jira, thirdparty versions can
>> > > be differentiated using the prefix "thirdparty-".
>> > >
>> > > The same solution is being followed in HBase. Maybe people involved in
>> > > HBase can add some points here.
>> > >
>> > > > Thoughts?
>> > > >
>> > > > .. Owen
>> > > >
>> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
>> > > > wrote:
>> > > >
>> > > >> Hi All,
>> > > >>
>> > > >>    I wanted to discuss the separate repo for thirdparty dependencies
>> > > >> which we need to shade and include in Hadoop components' jars.
>> > > >>
>> > > >>    Apologies for the big text ahead, but this needs clear explanation!!
>> > > >>
>> > > >>    Right now the most needed such dependency is protobuf. The protobuf
>> > > >> dependency has not been upgraded beyond 2.5.0 for fear that downstream
>> > > >> builds, which depend on the transitive protobuf dependency coming from
>> > > >> hadoop's jars, may fail with the upgrade. Apparently protobuf does not
>> > > >> guarantee source compatibility, though it guarantees wire compatibility
>> > > >> between versions. Because of this behavior, a version upgrade may cause
>> > > >> breakage in known and unknown (private?) downstreams.
>> > > >>
>> > > >>    So to tackle this, we came up with the following proposal in
>> > > >> HADOOP-13363.
>> > > >>
>> > > >>    Luckily, as far as I know, no APIs, either public to users or between
>> > > >> Hadoop processes, directly use protobuf classes in their signatures.
>> > > >> (If any exist, please let us know.)
>> > > >>
>> > > >>    Proposal:
>> > > >>    ------------
>> > > >>
>> > > >>    1. Create artifact(s) which contain the shaded dependencies. All such
>> > > >> shading/relocation will be with the known prefix
>> > > >> **org.apache.hadoop.thirdparty.**.
>> > > >>    2. To start with, there will be a protobuf jar (ex:
>> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf) in which all
>> > > >> **com.google.protobuf** classes are relocated as
>> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>> > > >>    3. Hadoop modules which need protobuf as a dependency will add this
>> > > >> shaded artifact as a dependency (ex:
>> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
>> > > >>    4. All previous usages of "com.google.protobuf" will be changed to
>> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and will
>> > > >> be committed. Please note, this replacement is done one time, directly
>> > > >> in the source code, NOT during compile and package.
>> > > >>    5. Once all usages of "com.google.protobuf" are relocated, hadoop
>> > > >> doesn't care which version of the original "protobuf-java" is in the
>> > > >> dependency tree.
>> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
>> > > >> break the downstreams. But hadoop will actually be using the latest
>> > > >> protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>> > > >>
>> > > >>    7. Coming back to the separate repo, the following are the main
>> > > >> reasons for keeping the shaded dependency artifact in a separate repo
>> > > >> instead of a submodule.
>> > > >>
>> > > >>       7a. These artifacts need not be built all the time. They need to
>> > > >> be built only when there is a change in the dependency version or the
>> > > >> build process.
>> > > >>       7b. If added as a "submodule in Hadoop repo",
>> > > >> maven-shade-plugin:shade will execute only in the package phase. That
>> > > >> means "mvn compile" or "mvn test-compile" will fail, as this artifact
>> > > >> will not have the relocated classes; instead it will have the original
>> > > >> classes, resulting in compilation failure. The workaround is to build
>> > > >> the thirdparty submodule first and exclude the "thirdparty" submodule in
>> > > >> other executions. This will be a complex process compared to keeping it
>> > > >> in a separate repo.
>> > > >>
>> > > >>       7c. The separate repo will be a subproject of Hadoop, using the
>> > > >> same HADOOP jira project, with different versioning prefixed with
>> > > >> "thirdparty-" (ex: thirdparty-1.0.0).
>> > > >>       7d. The separate repo will have the same release process as Hadoop.
>> > > >>
>> > > >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
>> > > >> is an umbrella jira tracking the changes for the protobuf upgrade.
>> > > >>
>> > > >>     A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
>> > > >> raised for the separate repo creation under HADOOP-16595
>> > > >> (https://issues.apache.org/jira/browse/HADOOP-16595).
>> > > >>
>> > > >>     Please provide your inputs on the proposal and review the PR so
>> > > >> that we can proceed.
>> > > >>
>> > > >>
>> > > >>    -Thanks,
>> > > >>     Vinay
>> > > >>
>> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
>> > > >> <vinodkv@apache.org> wrote:
>> > > >>
>> > > >> > Moving the thread to the dev lists.
>> > > >> >
>> > > >> > Thanks
>> > > >> > +Vinod
>> > > >> >
>> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B
>> > > >> > > <vinayakumarb@apache.org> wrote:
>> > > >> > >
>> > > >> > > Thanks Marton,
>> > > >> > >
>> > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
>> > > >> > > Whether to use that repo for shaded artifacts or not will be
>> > > >> > > tracked in the HADOOP-13363 umbrella jira. Please feel free to
>> > > >> > > join the discussion.
>> > > >> > >
>> > > >> > > No existing codebase is being moved out of the hadoop repo, so I
>> > > >> > > think right now we are good to go.
>> > > >> > >
>> > > >> > > -Vinay
>> > > >> > >
>> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > >>
>> > > >> > >> I am not sure if it's defined when a vote is required.
>> > > >> > >>
>> > > >> > >> https://www.apache.org/foundation/voting.html
>> > > >> > >>
>> > > >> > >> Personally I think it's a big enough change to send a notification
>> > > >> > >> to the dev lists with a 'lazy consensus' closure.
>> > > >> > >>
>> > > >> > >> Marton
>> > > >> > >>
>> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org>
>> > > >> > >> wrote:
>> > > >> > >>> Hi,
>> > > >> > >>>
>> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more
>> > > >> > >>> in future) will be kept as a shaded artifact in a separate repo,
>> > > >> > >>> which will be referred to as a dependency in hadoop modules. This
>> > > >> > >>> approach avoids shading of every submodule during the build.
>> > > >> > >>>
>> > > >> > >>> So the question is: is any VOTE required before asking to create a
>> > > >> > >>> git repo?
>> > > >> > >>>
>> > > >> > >>> On the self-serve platform
>> > > >> > >>> https://gitbox.apache.org/setup/newrepo.html
>> > > >> > >>> I can see that the requester should be a PMC member.
>> > > >> > >>>
>> > > >> > >>> Wanted to confirm here first.
>> > > >> > >>>
>> > > >> > >>> -Vinay
>> > > >> > >>>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Hi,
Sorry for the late reply.
> To be exact, how can we better use the thirdparty repo? Looking at
> HBase as an example, it looks like everything that is known to break a lot
> after an update gets shaded into the hbase-thirdparty artifact: guava,
> netty, ... etc.
> Is it the purpose to isolate these naughty dependencies?
Yes, the purpose of shading is to isolate these naughty dependencies from the
downstream classpath and to keep independent control over their upgrades
without breaking downstreams.

First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create the
protobuf shaded jar is ready to merge.

Please take a look if you are interested. It will probably be merged in about
two days if there are no objections.

-Vinay


On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <we...@apache.org> wrote:

> Hi I am late to this but I am keen to understand more.
>
> To be exact, how can we better use the thirdparty repo? Looking at HBase
> as an example, it looks like everything that are known to break a lot after
> an update get shaded into the hbase-thirdparty artifact: guava, netty, ...
> etc.
> Is it the purpose to isolate these naughty dependencies?
>
> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
> wrote:
>
>> Hi All,
>>
>> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
>> 's suggestions.
>>
>>     i. Renamed the module to 'hadoop-shaded-protobuf37'
>>     ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
>>
>> Please review!!
>>
>> Thanks,
>> -Vinay
>>
>>
>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
>> wrote:
>>
>> > For HBase we have a separated repo for hbase-thirdparty
>> >
>> > https://github.com/apache/hbase-thirdparty
>> >
>> > We will publish the artifacts to nexus so we do not need to include
>> > binaries in our git repo, just add a dependency in the pom.
>> >
>> >
>> >
>> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>> >
>> >
>> > And it has its own release cycles, only when there are special
>> requirements
>> > or we want to upgrade some of the dependencies. This is the vote thread
>> for
>> > the newest release, where we want to provide a shaded gson for jdk7.
>> >
>> >
>> >
>> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>> >
>> >
>> > Thanks.
>> >
>> > Vinayakumar B <vi...@apache.org> wrote on Sat, Sep 28, 2019 at 1:28 AM:
>> >
>> > > Please find replies inline.
>> > >
>> > > -Vinay
>> > >
>> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <
>> owen.omalley@gmail.com>
>> > > wrote:
>> > >
>> > > > I'm very unhappy with this direction. In particular, I don't think
>> git
>> > is
>> > > > a good place for distribution of binary artifacts. Furthermore, the
>> PMC
>> > > > shouldn't be releasing anything without a release vote.
>> > > >
>> > > >
>> > > Proposed solution doesnt release any binaries in git. Its actually a
>> > > complete sub-project which follows entire release process, including
>> VOTE
>> > > in public. I have mentioned already that release process is similar to
>> > > hadoop.
>> > > To be specific, using the (almost) same script used in hadoop to
>> generate
>> > > artifacts, sign and deploy to staging repository. Please let me know
>> If I
>> > > am conveying anything wrong.
>> > >
>> > >
>> > > > I'd propose that we make a third party module that contains the
>> > *source*
>> > > > of the pom files to build the relocated jars. This should
>> absolutely be
>> > > > treated as a last resort for the mostly Google projects that
>> regularly
>> > > > break binary compatibility (eg. Protobuf & Guava).
>> > > >
>> > > >
>> > > Same has been implemented in the PR
>> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and
>> let
>> > > me
>> > > know If I misunderstood. Yes, this is the last option we have AFAIK.
>> > >
>> > >
>> > > > In terms of naming, I'd propose something like:
>> > > >
>> > > > org.apache.hadoop.thirdparty.protobuf2_5
>> > > > org.apache.hadoop.thirdparty.guava28
>> > > >
>> > > > In particular, I think we absolutely need to include the version of
>> the
>> > > > underlying project. On the other hand, since we should not be
>> shading
>> > > > *everything* we can drop the leading com.google.
>> > > >
>> > > >
>> > > IMO, This naming convention is easy for identifying the underlying
>> > project,
>> > > but  it will be difficult to maintain going forward if underlying
>> project
>> > > versions changes. Since thirdparty module have its own releases, each
>> of
>> > > those release can be mapped to specific version of underlying project.
>> > Even
>> > > the binary artifact can include a MANIFEST with underlying project
>> > details
>> > > as per Steve's suggestion on HADOOP-13363.
>> > > That said, if you still prefer to have project number in artifact id,
>> it
>> > > can be done.
>> > >
>> > > The Hadoop project can make releases of  the thirdparty module:
>> > > >
>> > > > <dependency>
>> > > >   <groupId>org.apache.hadoop</groupId>
>> > > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>> > > >   <version>1.0</version>
>> > > > </dependency>
>> > > >
>> > > >
>> > > Note that the version has to be the hadoop thirdparty release number,
>> > which
>> > > > is part of why you need to have the underlying version in the
>> artifact
>> > > > name. These we can push to maven central as new releases from
>> Hadoop.
>> > > >
>> > > >
>> > > Exactly, same has been implemented in the PR. hadoop-thirdparty module
>> > have
>> > > its own releases. But in HADOOP Jira, thirdparty versions can be
>> > > differentiated using prefix "thirdparty-".
>> > >
>> > > Same solution is being followed in HBase. May be people involved in
>> HBase
>> > > can add some points here.
>> > >
>> > > Thoughts?
>> > > >
>> > > > .. Owen
>> > > >
>> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
>> vinayakumarb@apache.org
>> > >
>> > > > wrote:
>> > > >
>> > > >> Hi All,
>> > > >>
>> > > >>    I wanted to discuss about the separate repo for thirdparty
>> > > dependencies
>> > > >> which we need to shaded and include in Hadoop component's jars.
>> > > >>
>> > > >>    Apologies for the big text ahead, but this needs clear
>> > explanation!!
>> > > >>
>> > > >>    Right now most needed such dependency is protobuf. Protobuf
>> > > dependency
>> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream
>> > > builds,
>> > > >> which depends on transitive dependency protobuf coming from
>> hadoop's
>> > > jars,
>> > > >> may fail with the upgrade. Apparently protobuf does not guarantee
>> > source
>> > > >> compatibility, though it guarantees wire compatibility between
>> > versions.
>> > > >> Because of this behavior, version upgrade may cause breakage in
>> known
>> > > and
>> > > >> unknown (private?) downstreams.
>> > > >>
>> > > >>    So to tackle this, we came up the following proposal in
>> > HADOOP-13363.
>> > > >>
>> > > >>    Luckily, As far as I know, no APIs, either public to user or
>> > between
>> > > >> Hadoop processes, is not directly using protobuf classes in
>> > signatures.
>> > > >> (If
>> > > >> any exist, please let us know).
>> > > >>
>> > > >>    Proposal:
>> > > >>    ------------
>> > > >>
>> > > >>    1. Create a artifact(s) which contains shaded dependencies. All
>> > such
>> > > >> shading/relocation will be with known prefix
>> > > >> **org.apache.hadoop.thirdparty.**.
>> > > >>    2. Right now protobuf jar (ex:
>> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
>> > > >> to start with, all **com.google.protobuf** classes will be
>> relocated
>> > as
>> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>> > > >>    3. Hadoop modules, which needs protobuf as dependency, will add
>> > this
>> > > >> shaded artifact as dependency (ex:
>> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
>> > > >>    4. All previous usages of "com.google.protobuf" will be
>> relocated
>> > to
>> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
>> > will
>> > > be
>> > > >> committed. Please note, this replacement is One-Time directly in
>> > source
>> > > >> code, NOT during compile and package.
>> > > >>    5. Once all usages of "com.google.protobuf" is relocated, then
>> > hadoop
>> > > >> dont care about which version of original  "protobuf-java" is in
>> > > >> dependency.
>> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to
>> break
>> > > the
>> > > >> downstreams. But hadoop will be originally using the latest
>> protobuf
>> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>> > > >>
>> > > >>    7. Coming back to separate repo, Following are most appropriate
>> > > reasons
>> > > >> of keeping shaded dependency artifact in separate repo instead of
>> > > >> submodule.
>> > > >>
>> > > >>       7a. These artifacts need not be built all the time. It needs
>> to
>> > be
>> > > >> built only when there is a change in the dependency version or the
>> > build
>> > > >> process.
>> > > >>       7b. If added as "submodule in Hadoop repo",
>> > > maven-shade-plugin:shade
>> > > >> will execute only in package phase. That means, "mvn compile" or
>> "mvn
>> > > >> test-compile" will not be failed as this artifact will not have
>> > > relocated
>> > > >> classes, instead it will have original classes, resulting in
>> > compilation
>> > > >> failure. Workaround, build thirdparty submodule first and exclude
>> > > >> "thirdparty" submodule in other executions. This will be a complex
>> > > process
>> > > >> compared to keeping in a separate repo.
>> > > >>
>> > > >>       7c. Separate repo, will be a subproject of Hadoop, using the
>> > same
>> > > >> HADOOP jira project, with different versioning prefixed with
>> > > "thirdparty-"
>> > > >> (ex: thirdparty-1.0.0).
>> > > >>       7d. Separate will have same release process as Hadoop.
>> > > >>
>> > > >>     HADOOP-13363 (
>> https://issues.apache.org/jira/browse/HADOOP-13363)
>> > > is
>> > > >> an
>> > > >> umbrella jira tracking the changes to protobuf upgrade.
>> > > >>
>> > > >>     PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
>> been
>> > > >> raised
>> > > >> for separate repo creation in (HADOOP-16595 (
>> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
>> > > >>
>> > > >>     Please provide your inputs for the proposal and review the PR
>> to
>> > > >> proceed with the proposal.
>> > > >>
>> > > >>
>> > > >    -Thanks,
>> > > >>     Vinay
>> > > >>
>> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
>> > > >> vinodkv@apache.org>
>> > > >> wrote:
>> > > >>
>> > > >> > Moving the thread to the dev lists.
>> > > >> >
>> > > >> > Thanks
>> > > >> > +Vinod
>> > > >> >
>> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
>> > > vinayakumarb@apache.org>
>> > > >> > wrote:
>> > > >> > >
>> > > >> > > Thanks Marton,
>> > > >> > >
>> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
>> > > >> > > Whether to use that repo  for shaded artifact or not will be
>> > > >> monitored in
>> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
>> > discussion.
>> > > >> > >
>> > > >> > > There is no existing codebase is being moved out of hadoop
>> repo.
>> > So
>> > > I
>> > > >> > think
>> > > >> > > right now we are good to go.
>> > > >> > >
>> > > >> > > -Vinay
>> > > >> > >
>> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
>> > > wrote:
>> > > >> > >
>> > > >> > >>
>> > > >> > >> I am not sure if it's defined when is a vote required.
>> > > >> > >>
>> > > >> > >> https://www.apache.org/foundation/voting.html
>> > > >> > >>
>> > > >> > >> Personally I think it's a big enough change to send a
>> > notification
>> > > to
>> > > >> > the
>> > > >> > >> dev lists with a 'lazy consensus'  closure
>> > > >> > >>
>> > > >> > >> Marton
>> > > >> > >>
>> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <
>> vinayakumarb@apache.org>
>> > > >> wrote:
>> > > >> > >>> Hi,
>> > > >> > >>>
>> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be
>> more
>> > in
>> > > >> > >> future)
>> > > >> > >>> will be kept as a shaded artifact in a separate repo, which
>> will
>> > > be
>> > > >> > >>> referred as dependency in hadoop modules.  This approach
>> avoids
>> > > >> shading
>> > > >> > >> of
>> > > >> > >>> every submodule during build.
>> > > >> > >>>
>> > > >> > >>> So question is does any VOTE required before asking to
>> create a
>> > > git
>> > > >> > repo?
>> > > >> > >>>
>> > > >> > >>> On selfserve platform
>> > > https://gitbox.apache.org/setup/newrepo.html
>> > > >> > >>> I can access see that, requester should be PMC.
>> > > >> > >>>
>> > > >> > >>> Wanted to confirm here first.
>> > > >> > >>>
>> > > >> > >>> -Vinay
>> > > >> > >>>
>> > > >> > >>
>> > > >> > >>
>> > > ---------------------------------------------------------------------
>> > > >> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
>> > > >> > >> For additional commands, e-mail:
>> private-help@hadoop.apache.org
>> > > >> > >>
>> > > >> > >>
>> > > >> >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Wei-Chiu Chuang <we...@apache.org>.
Hi, I am late to this, but I am keen to understand more.

To be exact, how can we better use the thirdparty repo? Looking at HBase as
an example, it looks like everything that is known to break a lot after an
update gets shaded into the hbase-thirdparty artifact: guava, netty, etc.
Is the purpose to isolate these naughty dependencies?
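
As a rough illustration of the isolation being asked about, the relocation
done by a thirdparty module could look like the pom sketch below. It relies
on the standard maven-shade-plugin relocation settings. The artifact and
package names follow the 'hadoop-shaded-protobuf37' naming from the quoted
update; the groupId, the fully expanded package name, the module version and
the protobuf-java version are assumptions for illustration and are not taken
from the actual PR.

```xml
<!-- Sketch only: groupId, versions and expanded package name are assumed. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop.thirdparty</groupId>
  <artifactId>hadoop-shaded-protobuf37</artifactId>
  <version>1.0.0</version>
  <packaging>jar</packaging>

  <dependencies>
    <!-- The upstream library whose classes get bundled and renamed. -->
    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
      <version>3.7.1</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <!-- Plugin version omitted here; a real build should pin it. -->
        <executions>
          <execution>
            <!-- Shading runs in the package phase, as noted in point 7b. -->
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <relocations>
                <relocation>
                  <!-- Original package prefix in the upstream jar. -->
                  <pattern>com.google.protobuf</pattern>
                  <!-- Prefix the classes carry in the shaded jar. -->
                  <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

With a jar built this way, Hadoop code would import classes under the
relocated prefix rather than com.google.protobuf, so a downstream project
that still has protobuf-java 2.5.0 on its classpath sees no class-name
clash. Other libraries (guava, netty) would each need their own relocation
entry or module.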

On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
wrote:

> Hi All,
>
> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
> 's suggestions.
>
>     i. Renamed the module to 'hadoop-shaded-protobuf37'
>     ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
>
> Please review!!
>
> Thanks,
> -Vinay
>
>
> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
> wrote:
>
> > For HBase we have a separated repo for hbase-thirdparty
> >
> > https://github.com/apache/hbase-thirdparty
> >
> > We will publish the artifacts to nexus so we do not need to include
> > binaries in our git repo, just add a dependency in the pom.
> >
> >
> >
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >
> >
> > And it has its own release cycles, only when there are special
> requirements
> > or we want to upgrade some of the dependencies. This is the vote thread
> for
> > the newest release, where we want to provide a shaded gson for jdk7.
> >
> >
> >
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >
> >
> > Thanks.
> >
> > Vinayakumar B <vi...@apache.org> wrote on Sat, Sep 28, 2019 at 1:28 AM:
> >
> > > Please find replies inline.
> > >
> > > -Vinay
> > >
> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com
> >
> > > wrote:
> > >
> > > > I'm very unhappy with this direction. In particular, I don't think
> git
> > is
> > > > a good place for distribution of binary artifacts. Furthermore, the
> PMC
> > > > shouldn't be releasing anything without a release vote.
> > > >
> > > >
> > > The proposed solution doesn't release any binaries in git. It's actually a
> > > complete sub-project which follows entire release process, including
> VOTE
> > > in public. I have mentioned already that release process is similar to
> > > hadoop.
> > > To be specific, using the (almost) same script used in hadoop to
> generate
> > > artifacts, sign and deploy to staging repository. Please let me know
> If I
> > > am conveying anything wrong.
> > >
> > >
> > > > I'd propose that we make a third party module that contains the
> > *source*
> > > > of the pom files to build the relocated jars. This should absolutely
> be
> > > > treated as a last resort for the mostly Google projects that
> regularly
> > > > break binary compatibility (eg. Protobuf & Guava).
> > > >
> > > >
> > > Same has been implemented in the PR
> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and
> let
> > > me
> > > know If I misunderstood. Yes, this is the last option we have AFAIK.
> > >
> > >
> > > > In terms of naming, I'd propose something like:
> > > >
> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > > > org.apache.hadoop.thirdparty.guava28
> > > >
> > > > In particular, I think we absolutely need to include the version of
> the
> > > > underlying project. On the other hand, since we should not be shading
> > > > *everything* we can drop the leading com.google.
> > > >
> > > >
> > > IMO, This naming convention is easy for identifying the underlying
> > project,
> > > but  it will be difficult to maintain going forward if underlying
> project
> > > versions changes. Since thirdparty module have its own releases, each
> of
> > > those release can be mapped to specific version of underlying project.
> > Even
> > > the binary artifact can include a MANIFEST with underlying project
> > details
> > > as per Steve's suggestion on HADOOP-13363.
> > > That said, if you still prefer to have project number in artifact id,
> it
> > > can be done.
> > >
> > > The Hadoop project can make releases of  the thirdparty module:
> > > >
> > > > <dependency>
> > > >   <groupId>org.apache.hadoop</groupId>
> > > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > > >   <version>1.0</version>
> > > > </dependency>
> > > >
> > > >
> > > Note that the version has to be the hadoop thirdparty release number,
> > which
> > > > is part of why you need to have the underlying version in the
> artifact
> > > > name. These we can push to maven central as new releases from Hadoop.
> > > >
> > > >
> > > Exactly, same has been implemented in the PR. hadoop-thirdparty module
> > have
> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> > > differentiated using prefix "thirdparty-".
> > >
> > > Same solution is being followed in HBase. May be people involved in
> HBase
> > > can add some points here.
> > >
> > > Thoughts?
> > > >
> > > > .. Owen
> > > >
> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> vinayakumarb@apache.org
> > >
> > > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >>    I wanted to discuss about the separate repo for thirdparty
> > > dependencies
> > > >> which we need to shaded and include in Hadoop component's jars.
> > > >>
> > > >>    Apologies for the big text ahead, but this needs clear
> > explanation!!
> > > >>
> > > >>    Right now most needed such dependency is protobuf. Protobuf
> > > dependency
> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream
> > > builds,
> > > >> which depends on transitive dependency protobuf coming from hadoop's
> > > jars,
> > > >> may fail with the upgrade. Apparently protobuf does not guarantee
> > source
> > > >> compatibility, though it guarantees wire compatibility between
> > versions.
> > > >> Because of this behavior, version upgrade may cause breakage in
> known
> > > and
> > > >> unknown (private?) downstreams.
> > > >>
> > > >>    So to tackle this, we came up the following proposal in
> > HADOOP-13363.
> > > >>
> > > >>    Luckily, As far as I know, no APIs, either public to user or
> > between
> > > >> Hadoop processes, is not directly using protobuf classes in
> > signatures.
> > > >> (If
> > > >> any exist, please let us know).
> > > >>
> > > >>    Proposal:
> > > >>    ------------
> > > >>
> > > >>    1. Create a artifact(s) which contains shaded dependencies. All
> > such
> > > >> shading/relocation will be with known prefix
> > > >> **org.apache.hadoop.thirdparty.**.
> > > >>    2. Right now protobuf jar (ex:
> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> > > >> to start with, all **com.google.protobuf** classes will be relocated
> > as
> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > > >>    3. Hadoop modules, which needs protobuf as dependency, will add
> > this
> > > >> shaded artifact as dependency (ex:
> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > > >>    4. All previous usages of "com.google.protobuf" will be relocated
> > to
> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> > will
> > > be
> > > >> committed. Please note, this replacement is One-Time directly in
> > source
> > > >> code, NOT during compile and package.
> > > >>    5. Once all usages of "com.google.protobuf" is relocated, then
> > hadoop
> > > >> dont care about which version of original  "protobuf-java" is in
> > > >> dependency.
> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to
> break
> > > the
> > > >> downstreams. But hadoop will be originally using the latest protobuf
> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > > >>
> > > >>    7. Coming back to separate repo, Following are most appropriate
> > > reasons
> > > >> of keeping shaded dependency artifact in separate repo instead of
> > > >> submodule.
> > > >>
> > > >>       7a. These artifacts need not be built all the time. It needs
> to
> > be
> > > >> built only when there is a change in the dependency version or the
> > build
> > > >> process.
> > > >>       7b. If added as "submodule in Hadoop repo",
> > > maven-shade-plugin:shade
> > > >> will execute only in package phase. That means, "mvn compile" or
> "mvn
> > > >> test-compile" will not be failed as this artifact will not have
> > > relocated
> > > >> classes, instead it will have original classes, resulting in
> > compilation
> > > >> failure. Workaround, build thirdparty submodule first and exclude
> > > >> "thirdparty" submodule in other executions. This will be a complex
> > > process
> > > >> compared to keeping in a separate repo.
> > > >>
> > > >>       7c. Separate repo, will be a subproject of Hadoop, using the
> > same
> > > >> HADOOP jira project, with different versioning prefixed with
> > > "thirdparty-"
> > > >> (ex: thirdparty-1.0.0).
> > > >>       7d. Separate will have same release process as Hadoop.
> > > >>
> > > >>     HADOOP-13363 (
> https://issues.apache.org/jira/browse/HADOOP-13363)
> > > is
> > > >> an
> > > >> umbrella jira tracking the changes to protobuf upgrade.
> > > >>
> > > >>     PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> been
> > > >> raised
> > > >> for separate repo creation in (HADOOP-16595 (
> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > > >>
> > > >>     Please provide your inputs for the proposal and review the PR to
> > > >> proceed with the proposal.
> > > >>
> > > >>
> > > >    -Thanks,
> > > >>     Vinay
> > > >>
> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > > >> vinodkv@apache.org>
> > > >> wrote:
> > > >>
> > > >> > Moving the thread to the dev lists.
> > > >> >
> > > >> > Thanks
> > > >> > +Vinod
> > > >> >
> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> > > vinayakumarb@apache.org>
> > > >> > wrote:
> > > >> > >
> > > >> > > Thanks Marton,
> > > >> > >
> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> > > >> > > Whether to use that repo  for shaded artifact or not will be
> > > >> monitored in
> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> > discussion.
> > > >> > >
> > > >> > > There is no existing codebase is being moved out of hadoop repo.
> > So
> > > I
> > > >> > think
> > > >> > > right now we are good to go.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
> > > wrote:
> > > >> > >
> > > >> > >>
> > > >> > >> I am not sure if it's defined when is a vote required.
> > > >> > >>
> > > >> > >> https://www.apache.org/foundation/voting.html
> > > >> > >>
> > > >> > >> Personally I think it's a big enough change to send a
> > notification
> > > to
> > > >> > the
> > > >> > >> dev lists with a 'lazy consensus'  closure
> > > >> > >>
> > > >> > >> Marton
> > > >> > >>
> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org
> >
> > > >> wrote:
> > > >> > >>> Hi,
> > > >> > >>>
> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be
> more
> > in
> > > >> > >> future)
> > > >> > >>> will be kept as a shaded artifact in a separate repo, which
> will
> > > be
> > > >> > >>> referred as dependency in hadoop modules.  This approach
> avoids
> > > >> shading
> > > >> > >> of
> > > >> > >>> every submodule during build.
> > > >> > >>>
> > > >> > >>> So question is does any VOTE required before asking to create
> a
> > > git
> > > >> > repo?
> > > >> > >>>
> > > >> > >>> On selfserve platform
> > > https://gitbox.apache.org/setup/newrepo.html
> > > >> > >>> I can access see that, requester should be PMC.
> > > >> > >>>
> > > >> > >>> Wanted to confirm here first.
> > > >> > >>>
> > > >> > >>> -Vinay
> > > >> > >>>
> > > >> > >>
> > > >> > >>
> > > ---------------------------------------------------------------------
> > > >> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> > > >> > >> For additional commands, e-mail:
> private-help@hadoop.apache.org
> > > >> > >>
> > > >> > >>
> > > >> >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Wei-Chiu Chuang <we...@apache.org>.
Hi, I am late to this, but I am keen to understand more.

To be exact, how can we better use the thirdparty repo? Looking at HBase as
an example, it looks like everything that is known to break a lot after an
update gets shaded into the hbase-thirdparty artifact: guava, netty, etc.
Is the purpose to isolate these naughty dependencies?

On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vi...@apache.org>
wrote:

> Hi All,
>
> I have updated the PR as per @Owen O'Malley <ow...@gmail.com>'s
> suggestions.
>
>     i. Renamed the module to 'hadoop-shaded-protobuf37'
>     ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
>
> Please review!!
>
> Thanks,
> -Vinay
>
>
> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
> wrote:
>
> > For HBase we have a separated repo for hbase-thirdparty
> >
> > https://github.com/apache/hbase-thirdparty
> >
> > We will publish the artifacts to nexus so we do not need to include
> > binaries in our git repo, just add a dependency in the pom.
> >
> >
> >
> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >
> >
> > And it has its own release cycles, only when there are special
> > requirements or we want to upgrade some of the dependencies. This is the
> > vote thread for the newest release, where we want to provide a shaded gson
> > for jdk7.
> >
> >
> >
> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >
> >
> > Thanks.
> >
> > Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
> >
> > > Please find replies inline.
> > >
> > > -Vinay
> > >
> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com
> >
> > > wrote:
> > >
> > > > I'm very unhappy with this direction. In particular, I don't think
> git
> > is
> > > > a good place for distribution of binary artifacts. Furthermore, the
> PMC
> > > > shouldn't be releasing anything without a release vote.
> > > >
> > > >
> > > Proposed solution doesnt release any binaries in git. Its actually a
> > > complete sub-project which follows entire release process, including
> VOTE
> > > in public. I have mentioned already that release process is similar to
> > > hadoop.
> > > To be specific, using the (almost) same script used in hadoop to
> generate
> > > artifacts, sign and deploy to staging repository. Please let me know
> If I
> > > am conveying anything wrong.
> > >
> > >
> > > > I'd propose that we make a third party module that contains the
> > *source*
> > > > of the pom files to build the relocated jars. This should absolutely
> be
> > > > treated as a last resort for the mostly Google projects that
> regularly
> > > > break binary compatibility (eg. Protobuf & Guava).
> > > >
> > > >
> > > Same has been implemented in the PR
> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and
> let
> > > me
> > > know If I misunderstood. Yes, this is the last option we have AFAIK.
> > >
> > >
> > > > In terms of naming, I'd propose something like:
> > > >
> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > > > org.apache.hadoop.thirdparty.guava28
> > > >
> > > > In particular, I think we absolutely need to include the version of
> the
> > > > underlying project. On the other hand, since we should not be shading
> > > > *everything* we can drop the leading com.google.
> > > >
> > > >
> > > IMO, This naming convention is easy for identifying the underlying
> > project,
> > > but  it will be difficult to maintain going forward if underlying
> project
> > > versions changes. Since thirdparty module have its own releases, each
> of
> > > those release can be mapped to specific version of underlying project.
> > Even
> > > the binary artifact can include a MANIFEST with underlying project
> > details
> > > as per Steve's suggestion on HADOOP-13363.
> > > That said, if you still prefer to have project number in artifact id,
> it
> > > can be done.
> > >
> > > The Hadoop project can make releases of  the thirdparty module:
> > > >
> > > > <dependency>
> > > >   <groupId>org.apache.hadoop</groupId>
> > > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > > >   <version>1.0</version>
> > > > </dependency>
> > > >
> > > >
> > > Note that the version has to be the hadoop thirdparty release number,
> > which
> > > > is part of why you need to have the underlying version in the
> artifact
> > > > name. These we can push to maven central as new releases from Hadoop.
> > > >
> > > >
> > > Exactly, same has been implemented in the PR. hadoop-thirdparty module
> > have
> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> > > differentiated using prefix "thirdparty-".
> > >
> > > Same solution is being followed in HBase. May be people involved in
> HBase
> > > can add some points here.
> > >
> > > Thoughts?
> > > >
> > > > .. Owen
> > > >
> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <
> vinayakumarb@apache.org
> > >
> > > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >>    I wanted to discuss about the separate repo for thirdparty
> > > dependencies
> > > >> which we need to shaded and include in Hadoop component's jars.
> > > >>
> > > >>    Apologies for the big text ahead, but this needs clear
> > explanation!!
> > > >>
> > > >>    Right now most needed such dependency is protobuf. Protobuf
> > > dependency
> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream
> > > builds,
> > > >> which depends on transitive dependency protobuf coming from hadoop's
> > > jars,
> > > >> may fail with the upgrade. Apparently protobuf does not guarantee
> > source
> > > >> compatibility, though it guarantees wire compatibility between
> > versions.
> > > >> Because of this behavior, version upgrade may cause breakage in
> known
> > > and
> > > >> unknown (private?) downstreams.
> > > >>
> > > >>    So to tackle this, we came up the following proposal in
> > HADOOP-13363.
> > > >>
> > > >>    Luckily, As far as I know, no APIs, either public to user or
> > between
> > > >> Hadoop processes, is not directly using protobuf classes in
> > signatures.
> > > >> (If
> > > >> any exist, please let us know).
> > > >>
> > > >>    Proposal:
> > > >>    ------------
> > > >>
> > > >>    1. Create a artifact(s) which contains shaded dependencies. All
> > such
> > > >> shading/relocation will be with known prefix
> > > >> **org.apache.hadoop.thirdparty.**.
> > > >>    2. Right now protobuf jar (ex:
> > > o.a.h.thirdparty:hadoop-shaded-protobuf)
> > > >> to start with, all **com.google.protobuf** classes will be relocated
> > as
> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > > >>    3. Hadoop modules, which needs protobuf as dependency, will add
> > this
> > > >> shaded artifact as dependency (ex:
> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > > >>    4. All previous usages of "com.google.protobuf" will be relocated
> > to
> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> > will
> > > be
> > > >> committed. Please note, this replacement is One-Time directly in
> > source
> > > >> code, NOT during compile and package.
> > > >>    5. Once all usages of "com.google.protobuf" is relocated, then
> > hadoop
> > > >> dont care about which version of original  "protobuf-java" is in
> > > >> dependency.
> > > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to
> break
> > > the
> > > >> downstreams. But hadoop will be originally using the latest protobuf
> > > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > > >>
> > > >>    7. Coming back to separate repo, Following are most appropriate
> > > reasons
> > > >> of keeping shaded dependency artifact in separate repo instead of
> > > >> submodule.
> > > >>
> > > >>       7a. These artifacts need not be built all the time. It needs
> to
> > be
> > > >> built only when there is a change in the dependency version or the
> > build
> > > >> process.
> > > >>       7b. If added as a "submodule in Hadoop repo",
> > > >> maven-shade-plugin:shade will execute only in the package phase. That
> > > >> means "mvn compile" or "mvn test-compile" will fail, as this artifact
> > > >> will not have relocated classes; instead it will have the original
> > > >> classes, resulting in compilation failure. The workaround is to build
> > > >> the thirdparty submodule first and exclude the "thirdparty" submodule
> > > >> in other executions. This will be a complex process compared to
> > > >> keeping it in a separate repo.
> > > >>
> > > >>       7c. Separate repo, will be a subproject of Hadoop, using the
> > same
> > > >> HADOOP jira project, with different versioning prefixed with
> > > "thirdparty-"
> > > >> (ex: thirdparty-1.0.0).
> > > >>       7d. Separate will have same release process as Hadoop.
> > > >>
> > > >>     HADOOP-13363 (
> https://issues.apache.org/jira/browse/HADOOP-13363)
> > > is
> > > >> an
> > > >> umbrella jira tracking the changes to protobuf upgrade.
> > > >>
> > > >>     PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> been
> > > >> raised
> > > >> for separate repo creation in (HADOOP-16595 (
> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > > >>
> > > >>     Please provide your inputs for the proposal and review the PR to
> > > >> proceed with the proposal.
> > > >>
> > > >>
> > > >    -Thanks,
> > > >>     Vinay
> > > >>
> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > > >> vinodkv@apache.org>
> > > >> wrote:
> > > >>
> > > >> > Moving the thread to the dev lists.
> > > >> >
> > > >> > Thanks
> > > >> > +Vinod
> > > >> >
> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> > > vinayakumarb@apache.org>
> > > >> > wrote:
> > > >> > >
> > > >> > > Thanks Marton,
> > > >> > >
> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> > > >> > > Whether to use that repo  for shaded artifact or not will be
> > > >> monitored in
> > > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> > discussion.
> > > >> > >
> > > >> > > There is no existing codebase is being moved out of hadoop repo.
> > So
> > > I
> > > >> > think
> > > >> > > right now we are good to go.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
> > > wrote:
> > > >> > >
> > > >> > >>
> > > >> > >> I am not sure if it's defined when is a vote required.
> > > >> > >>
> > > >> > >> https://www.apache.org/foundation/voting.html
> > > >> > >>
> > > >> > >> Personally I think it's a big enough change to send a
> > notification
> > > to
> > > >> > the
> > > >> > >> dev lists with a 'lazy consensus'  closure
> > > >> > >>
> > > >> > >> Marton
> > > >> > >>
> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org
> >
> > > >> wrote:
> > > >> > >>> Hi,
> > > >> > >>>
> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be
> more
> > in
> > > >> > >> future)
> > > >> > >>> will be kept as a shaded artifact in a separate repo, which
> will
> > > be
> > > >> > >>> referred as dependency in hadoop modules.  This approach
> avoids
> > > >> shading
> > > >> > >> of
> > > >> > >>> every submodule during build.
> > > >> > >>>
> > > >> > >>> So question is does any VOTE required before asking to create
> a
> > > git
> > > >> > repo?
> > > >> > >>>
> > > >> > >>> On selfserve platform
> > > https://gitbox.apache.org/setup/newrepo.html
> > > >> > >>> I can access see that, requester should be PMC.
> > > >> > >>>
> > > >> > >>> Wanted to confirm here first.
> > > >> > >>>
> > > >> > >>> -Vinay
> > > >> > >>>
> > > >> > >>
> > > >> > >>
> > > ---------------------------------------------------------------------
> > > >> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> > > >> > >> For additional commands, e-mail:
> private-help@hadoop.apache.org
> > > >> > >>
> > > >> > >>
> > > >> >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Hi All,

I have updated the PR as per @Owen O'Malley <ow...@gmail.com>'s
suggestions.

    i. Renamed the module to 'hadoop-shaded-protobuf37'
    ii. Kept the shaded package as 'o.a.h.thirdparty.protobuf37'
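
For anyone following along, the two changes above would roughly correspond
to a maven-shade-plugin relocation like the one below. This is only a
minimal sketch, not copied from the PR: it assumes that protobuf-java is the
only artifact being shaded and that 'o.a.h.thirdparty.protobuf37' expands to
org.apache.hadoop.thirdparty.protobuf37. The actual pom may differ.

```xml
<!-- Hypothetical fragment for the hadoop-shaded-protobuf37 module's pom.xml -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase, as noted in point 7b -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- assumption: only protobuf-java is bundled into the jar -->
            <include>com.google.protobuf:protobuf-java</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- rewrite the original package to the versioned prefix -->
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation of this shape, Hadoop code would import, for example,
org.apache.hadoop.thirdparty.protobuf37.Message instead of
com.google.protobuf.Message.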

Please review!!

Thanks,
-Vinay


On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
wrote:

> For HBase we have a separated repo for hbase-thirdparty
>
> https://github.com/apache/hbase-thirdparty
>
> We will publish the artifacts to nexus so we do not need to include
> binaries in our git repo, just add a dependency in the pom.
>
>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>
>
> And it has its own release cycles, only when there are special requirements
> or we want to upgrade some of the dependencies. This is the vote thread for
> the newest release, where we want to provide a shaded gson for jdk7.
>
>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>
>
> Thanks.
>
> Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
>
> > Please find replies inline.
> >
> > -Vinay
> >
> > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <ow...@gmail.com>
> > wrote:
> >
> > > I'm very unhappy with this direction. In particular, I don't think git is
> > > a good place for distribution of binary artifacts. Furthermore, the PMC
> > > shouldn't be releasing anything without a release vote.
> > >
> > >
> > The proposed solution doesn't release any binaries in git. It's actually a
> > complete sub-project which follows the entire release process, including a
> > public VOTE. I have mentioned already that the release process is similar
> > to hadoop's.
> > To be specific, it uses (almost) the same script used in hadoop to
> > generate artifacts, sign them and deploy them to the staging repository.
> > Please let me know if I am conveying anything wrong.
> >
> >
> > > I'd propose that we make a third party module that contains the *source*
> > > of the pom files to build the relocated jars. This should absolutely be
> > > treated as a last resort for the mostly Google projects that regularly
> > > break binary compatibility (eg. Protobuf & Guava).
> > >
> > >
> > The same has been implemented in the PR
> > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
> > me know if I misunderstood. Yes, this is the last option we have AFAIK.
> >
> >
> > > In terms of naming, I'd propose something like:
> > >
> > > org.apache.hadoop.thirdparty.protobuf2_5
> > > org.apache.hadoop.thirdparty.guava28
> > >
> > > In particular, I think we absolutely need to include the version of the
> > > underlying project. On the other hand, since we should not be shading
> > > *everything* we can drop the leading com.google.
> > >
> > >
> > IMO, this naming convention makes it easy to identify the underlying
> > project, but it will be difficult to maintain going forward if the
> > underlying project's version changes. Since the thirdparty module has its
> > own releases, each of those releases can be mapped to a specific version
> > of the underlying project. The binary artifact can even include a MANIFEST
> > with the underlying project's details, as per Steve's suggestion on
> > HADOOP-13363.
> > That said, if you still prefer to have the project's version number in the
> > artifact id, it can be done.
> >
> > > The Hadoop project can make releases of the thirdparty module:
> > >
> > > <dependency>
> > >   <groupId>org.apache.hadoop</groupId>
> > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > >   <version>1.0</version>
> > > </dependency>
> > >
> > >
> > > Note that the version has to be the hadoop thirdparty release number,
> > > which is part of why you need to have the underlying version in the
> > > artifact name. These we can push to maven central as new releases from
> > > Hadoop.
> > >
> > >
> > Exactly, the same has been implemented in the PR. The hadoop-thirdparty
> > module has its own releases. But in the HADOOP Jira, thirdparty versions
> > can be differentiated using the prefix "thirdparty-".
> >
> > The same solution is being followed in HBase. Maybe people involved in
> > HBase can add some points here.
> >
> > > Thoughts?
> > >
> > > .. Owen
> > >
> > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
> > > wrote:
> > >
> > >> Hi All,
> > >>
> > >>    I wanted to discuss the separate repo for thirdparty dependencies
> > >> which we need to shade and include in Hadoop components' jars.
> > >>
> > >>    Apologies for the big text ahead, but this needs clear explanation!!
> > >>
> > >>    Right now the most needed such dependency is protobuf. The protobuf
> > >> dependency has not been upgraded beyond 2.5.0 for fear that downstream
> > >> builds, which depend on the transitive protobuf dependency coming from
> > >> hadoop's jars, may fail with the upgrade. Apparently protobuf does not
> > >> guarantee source compatibility, though it guarantees wire compatibility
> > >> between versions. Because of this behavior, a version upgrade may cause
> > >> breakage in known and unknown (private?) downstreams.
> > >>
> > >>    So to tackle this, we came up with the following proposal in
> > >> HADOOP-13363.
> > >>
> > >>    Luckily, as far as I know, no APIs, either public to users or between
> > >> Hadoop processes, directly use protobuf classes in signatures. (If any
> > >> exist, please let us know).
> > >>
> > >>    Proposal:
> > >>    ------------
> > >>
> > >>    1. Create artifact(s) which contain the shaded dependencies. All such
> > >> shading/relocation will use the known prefix
> > >> **org.apache.hadoop.thirdparty.**.
> > >>    2. To start with, right now, a protobuf jar (ex:
> > >> o.a.h.thirdparty:hadoop-shaded-protobuf) in which all
> > >> **com.google.protobuf** classes will be relocated to
> > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > >>    3. Hadoop modules which need protobuf as a dependency will add this
> > >> shaded artifact as a dependency (ex:
> > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > >>    4. All previous usages of "com.google.protobuf" will be changed to
> > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> > >> committed. Please note, this replacement is done one time, directly in
> > >> the source code, NOT during compile and package.
> > >>    5. Once all usages of "com.google.protobuf" are relocated, hadoop
> > >> doesn't care which version of the original "protobuf-java" is in the
> > >> dependency tree.
> > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
> > >> break the downstreams. But hadoop itself will actually be using the
> > >> latest protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > >>
> > >>    7. Coming back to separate repo, Following are most appropriate
> > reasons
> > >> of keeping shaded dependency artifact in separate repo instead of
> > >> submodule.
> > >>
> > >>       7a. These artifacts need not be built all the time. It needs to
> be
> > >> built only when there is a change in the dependency version or the
> build
> > >> process.
> > >>       7b. If added as "submodule in Hadoop repo",
> > maven-shade-plugin:shade
> > >> will execute only in package phase. That means, "mvn compile" or "mvn
> > >> test-compile" will not be failed as this artifact will not have
> > relocated
> > >> classes, instead it will have original classes, resulting in
> compilation
> > >> failure. Workaround, build thirdparty submodule first and exclude
> > >> "thirdparty" submodule in other executions. This will be a complex
> > process
> > >> compared to keeping in a separate repo.
> > >>
> > >>       7c. Separate repo, will be a subproject of Hadoop, using the
> same
> > >> HADOOP jira project, with different versioning prefixed with
> > "thirdparty-"
> > >> (ex: thirdparty-1.0.0).
> > >>       7d. Separate will have same release process as Hadoop.
> > >>
> > >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
> > is
> > >> an
> > >> umbrella jira tracking the changes to protobuf upgrade.
> > >>
> > >>     PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> > >> raised
> > >> for separate repo creation in (HADOOP-16595 (
> > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > >>
> > >>     Please provide your inputs for the proposal and review the PR to
> > >> proceed with the proposal.
> > >>
> > >>
> > >    -Thanks,
> > >>     Vinay
> > >>
> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
> > > >> <vinodkv@apache.org> wrote:
> > > >>
> > > >> > Moving the thread to the dev lists.
> > > >> >
> > > >> > Thanks
> > > >> > +Vinod
> > > >> >
> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B
> > > >> > > <vinayakumarb@apache.org> wrote:
> > > >> > >
> > > >> > > Thanks Marton,
> > > >> > >
> > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
> > > >> > > Whether to use that repo for the shaded artifact or not will be
> > > >> > > tracked in the HADOOP-13363 umbrella jira. Please feel free to
> > > >> > > join the discussion.
> > > >> > >
> > > >> > > No existing codebase is being moved out of the hadoop repo, so I
> > > >> > > think right now we are good to go.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
> > > >> > > wrote:
> > > >> > >
> > > >> > >> I am not sure if it's defined when a vote is required.
> > > >> > >>
> > > >> > >> https://www.apache.org/foundation/voting.html
> > > >> > >>
> > > >> > >> Personally I think it's a big enough change to send a notification
> > > >> > >> to the dev lists with a 'lazy consensus' closure.
> > > >> > >>
> > > >> > >> Marton
> > > >> > >>
> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org> wrote:
> > > >> > >>> Hi,
> > > >> > >>>
> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe
> > > >> > >>> more in future) will be kept as a shaded artifact in a separate
> > > >> > >>> repo, which will be referred to as a dependency in hadoop
> > > >> > >>> modules. This approach avoids shading of every submodule during
> > > >> > >>> build.
> > > >> > >>>
> > > >> > >>> So the question is: is any VOTE required before asking to create
> > > >> > >>> a git repo?
> > > >> > >>>
> > > >> > >>> On the self-serve platform
> > > >> > >>> https://gitbox.apache.org/setup/newrepo.html I can see that the
> > > >> > >>> requester should be a PMC member.
> > > >> > >>>
> > > >> > >>> Wanted to confirm here first.
> > > >> > >>>
> > > >> > >>> -Vinay

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Hi All,

I have updated the PR as per the suggestions from Owen O'Malley
<ow...@gmail.com>:

    i. Renamed the module to 'hadoop-shaded-protobuf37'.
    ii. Set the shaded package to 'o.a.h.thirdparty.protobuf37'.
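
For readers following along, a relocation like this could be expressed with
maven-shade-plugin roughly as below. This is a minimal sketch, not the
configuration from the PR: it assumes 'o.a.h' abbreviates 'org.apache.hadoop',
that the only artifact being shaded is com.google.protobuf:protobuf-java, and
that the plugin version is managed elsewhere in the build.

```xml
<!-- Sketch only: illustrative shade setup for a 'hadoop-shaded-protobuf37'
     module. The included artifact and the expanded package name are
     assumptions, not copied from the PR. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase, as noted in point 7b -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- bundle only protobuf-java into the shaded jar -->
            <include>com.google.protobuf:protobuf-java</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- original package of the bundled classes -->
            <pattern>com.google.protobuf</pattern>
            <!-- versioned target package, assuming o.a.h = org.apache.hadoop -->
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, code that previously imported
com.google.protobuf.Message would import
org.apache.hadoop.thirdparty.protobuf37.Message instead. The '37' suffix
presumably records the protobuf 3.7.x line, following the versioned naming
suggested in the quoted discussion.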

Please review!!

Thanks,
-Vinay


On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
wrote:

> For HBase we have a separate repo, hbase-thirdparty:
>
> https://github.com/apache/hbase-thirdparty
>
> We publish the artifacts to Nexus, so we do not need to include binaries
> in our git repo; we just add a dependency in the pom.
>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>
> It has its own release cycle: we release only when there are special
> requirements or we want to upgrade some of the dependencies. This is the
> vote thread for the newest release, where we want to provide a shaded gson
> for jdk7.
>
>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>
>
> Thanks.
>
> On Sat, Sep 28, 2019 at 1:28 AM Vinayakumar B <vi...@apache.org> wrote:
>
> > Please find replies inline.
> >
> > -Vinay
> >
> > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <ow...@gmail.com>
> > wrote:
> >
> > > > I'm very unhappy with this direction. In particular, I don't think git
> > > > is a good place for distribution of binary artifacts. Furthermore, the
> > > > PMC shouldn't be releasing anything without a release vote.
> > > >
> > > The proposed solution doesn't release any binaries in git. It is actually
> > > a complete sub-project that follows the entire release process, including
> > > a public VOTE. I have already mentioned that the release process is
> > > similar to Hadoop's. To be specific, it uses (almost) the same script
> > > used in Hadoop to generate artifacts, sign them, and deploy them to the
> > > staging repository. Please let me know if I am conveying anything wrong.
> > >
> > > > I'd propose that we make a third party module that contains the
> > > > *source* of the pom files to build the relocated jars. This should
> > > > absolutely be treated as a last resort for the mostly Google projects
> > > > that regularly break binary compatibility (eg. Protobuf & Guava).
> > > >
> > > The same has been implemented in the PR
> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
> > > me know if I misunderstood. Yes, this is the last option we have, AFAIK.
> > >
> > > > In terms of naming, I'd propose something like:
> > > >
> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > > > org.apache.hadoop.thirdparty.guava28
> > > >
> > > > In particular, I think we absolutely need to include the version of the
> > > > underlying project. On the other hand, since we should not be shading
> > > > *everything* we can drop the leading com.google.
> > > >
> > > IMO, this naming convention makes it easy to identify the underlying
> > > project, but it will be difficult to maintain going forward when the
> > > underlying project version changes. Since the thirdparty module has its
> > > own releases, each of those releases can be mapped to a specific version
> > > of the underlying project. The binary artifact can even include a
> > > MANIFEST with the underlying project details, as per Steve's suggestion
> > > on HADOOP-13363.
> > > That said, if you still prefer to have the underlying project's version
> > > in the artifact id, it can be done.
> > >
> > > > The Hadoop project can make releases of the thirdparty module:
> > > >
> > > > <dependency>
> > > >   <groupId>org.apache.hadoop</groupId>
> > > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > > >   <version>1.0</version>
> > > > </dependency>
> > > >
> > > > Note that the version has to be the hadoop thirdparty release number,
> > > > which is part of why you need to have the underlying version in the
> > > > artifact name. These we can push to maven central as new releases from
> > > > Hadoop.
> > > >
> > > Exactly, the same has been implemented in the PR. The hadoop-thirdparty
> > > module has its own releases. In the HADOOP Jira, thirdparty versions can
> > > be differentiated using the prefix "thirdparty-".
> > >
> > > The same solution is being followed in HBase. Maybe people involved in
> > > HBase can add some points here.
> > >
> > > > Thoughts?
> > >
> > > .. Owen
> > >
> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
> > > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >>    I wanted to discuss the separate repo for thirdparty dependencies
> > > >> which we need to shade and include in Hadoop components' jars.
> > > >>
> > > >>    Apologies for the big text ahead, but this needs clear explanation!!
> > > >>
> > > >>    Right now the most needed such dependency is protobuf. The protobuf
> > > >> dependency has not been upgraded from 2.5.0 for fear that downstream
> > > >> builds, which depend on the transitive protobuf dependency coming from
> > > >> hadoop's jars, may fail with the upgrade. Apparently protobuf does not
> > > >> guarantee source compatibility, though it guarantees wire compatibility
> > > >> between versions. Because of this behavior, a version upgrade may cause
> > > >> breakage in known and unknown (private?) downstreams.
> > > >>
> > > >>    So to tackle this, we came up with the following proposal in
> > > >> HADOOP-13363.
> > > >>
> > > >>    Luckily, as far as I know, no APIs, either public to users or between
> > > >> Hadoop processes, directly use protobuf classes in signatures. (If any
> > > >> exist, please let us know).
> > >>
> > >>    Proposal:
> > >>    ------------
> > >>
> > >>    1. Create a artifact(s) which contains shaded dependencies. All
> such
> > >> shading/relocation will be with known prefix
> > >> **org.apache.hadoop.thirdparty.**.
> > >>    2. Right now protobuf jar (ex:
> > o.a.h.thirdparty:hadoop-shaded-protobuf)
> > >> to start with, all **com.google.protobuf** classes will be relocated
> as
> > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > >>    3. Hadoop modules, which needs protobuf as dependency, will add
> this
> > >> shaded artifact as dependency (ex:
> > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > >>    4. All previous usages of "com.google.protobuf" will be relocated
> to
> > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> will
> > be
> > >> committed. Please note, this replacement is One-Time directly in
> source
> > >> code, NOT during compile and package.
> > >>    5. Once all usages of "com.google.protobuf" is relocated, then
> hadoop
> > >> dont care about which version of original  "protobuf-java" is in
> > >> dependency.
> > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to break
> > the
> > >> downstreams. But hadoop will be originally using the latest protobuf
> > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > >>
> > >>    7. Coming back to separate repo, Following are most appropriate
> > reasons
> > >> of keeping shaded dependency artifact in separate repo instead of
> > >> submodule.
> > >>
> > >>       7a. These artifacts need not be built all the time. It needs to
> be
> > >> built only when there is a change in the dependency version or the
> build
> > >> process.
> > >>       7b. If added as "submodule in Hadoop repo",
> > maven-shade-plugin:shade
> > >> will execute only in package phase. That means, "mvn compile" or "mvn
> > >> test-compile" will not be failed as this artifact will not have
> > relocated
> > >> classes, instead it will have original classes, resulting in
> compilation
> > >> failure. Workaround, build thirdparty submodule first and exclude
> > >> "thirdparty" submodule in other executions. This will be a complex
> > process
> > >> compared to keeping in a separate repo.
> > >>
> > >>       7c. Separate repo, will be a subproject of Hadoop, using the
> same
> > >> HADOOP jira project, with different versioning prefixed with
> > "thirdparty-"
> > >> (ex: thirdparty-1.0.0).
> > >>       7d. Separate will have same release process as Hadoop.
> > >>
> > >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
> > is
> > >> an
> > >> umbrella jira tracking the changes to protobuf upgrade.
> > >>
> > >>     PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> > >> raised
> > >> for separate repo creation in (HADOOP-16595 (
> > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > >>
> > >>     Please provide your inputs for the proposal and review the PR to
> > >> proceed with the proposal.
> > >>
> > >>
> > >    -Thanks,
> > >>     Vinay
> > >>
> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
> > > >> <vinodkv@apache.org> wrote:
> > > >>
> > > >> > Moving the thread to the dev lists.
> > > >> >
> > > >> > Thanks
> > > >> > +Vinod
> > > >> >
> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B
> > > >> > > <vinayakumarb@apache.org> wrote:
> > > >> > >
> > > >> > > Thanks Marton,
> > > >> > >
> > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
> > > >> > > Whether to use that repo for the shaded artifact or not will be
> > > >> > > tracked in the HADOOP-13363 umbrella jira. Please feel free to
> > > >> > > join the discussion.
> > > >> > >
> > > >> > > No existing codebase is being moved out of the hadoop repo, so I
> > > >> > > think right now we are good to go.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
> > > >> > > wrote:
> > > >> > >
> > > >> > >> I am not sure if it's defined when a vote is required.
> > > >> > >>
> > > >> > >> https://www.apache.org/foundation/voting.html
> > > >> > >>
> > > >> > >> Personally I think it's a big enough change to send a notification
> > > >> > >> to the dev lists with a 'lazy consensus' closure.
> > > >> > >>
> > > >> > >> Marton
> > > >> > >>
> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org> wrote:
> > > >> > >>> Hi,
> > > >> > >>>
> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe
> > > >> > >>> more in future) will be kept as a shaded artifact in a separate
> > > >> > >>> repo, which will be referred to as a dependency in hadoop
> > > >> > >>> modules. This approach avoids shading of every submodule during
> > > >> > >>> build.
> > > >> > >>>
> > > >> > >>> So the question is: is any VOTE required before asking to create
> > > >> > >>> a git repo?
> > > >> > >>>
> > > >> > >>> On the self-serve platform
> > > >> > >>> https://gitbox.apache.org/setup/newrepo.html I can see that the
> > > >> > >>> requester should be a PMC member.
> > > >> > >>>
> > > >> > >>> Wanted to confirm here first.
> > > >> > >>>
> > > >> > >>> -Vinay

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Hi All,

I have updated the PR as per @Owen O'Malley <ow...@gmail.com>
's suggestions.

    i. Renamed the module to 'hadoop-shaded-protobuf37'
    ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'

Please review!!

Thanks,
-Vinay


On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <pa...@gmail.com>
wrote:

> For HBase we have a separated repo for hbase-thirdparty
>
> https://github.com/apache/hbase-thirdparty
>
> We will publish the artifacts to nexus so we do not need to include
> binaries in our git repo, just add a dependency in the pom.
>
>
> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>
>
> And it has its own release cycles, only when there are special requirements
> or we want to upgrade some of the dependencies. This is the vote thread for
> the newest release, where we want to provide a shaded gson for jdk7.
>
>
> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>
>
> Thanks.
>
> Vinayakumar B <vi...@apache.org> 于2019年9月28日周六 上午1:28写道:
>
> > Please find replies inline.
> >
> > -Vinay
> >
> > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <ow...@gmail.com>
> > wrote:
> >
> > > I'm very unhappy with this direction. In particular, I don't think git
> is
> > > a good place for distribution of binary artifacts. Furthermore, the PMC
> > > shouldn't be releasing anything without a release vote.
> > >
> > >
> > Proposed solution doesnt release any binaries in git. Its actually a
> > complete sub-project which follows entire release process, including VOTE
> > in public. I have mentioned already that release process is similar to
> > hadoop.
> > To be specific, using the (almost) same script used in hadoop to generate
> > artifacts, sign and deploy to staging repository. Please let me know If I
> > am conveying anything wrong.
> >
> >
> > > I'd propose that we make a third party module that contains the
> *source*
> > > of the pom files to build the relocated jars. This should absolutely be
> > > treated as a last resort for the mostly Google projects that regularly
> > > break binary compatibility (eg. Protobuf & Guava).
> > >
> > >
> > Same has been implemented in the PR
> > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
> > me
> > know If I misunderstood. Yes, this is the last option we have AFAIK.
> >
> >
> > > In terms of naming, I'd propose something like:
> > >
> > > org.apache.hadoop.thirdparty.protobuf2_5
> > > org.apache.hadoop.thirdparty.guava28
> > >
> > > In particular, I think we absolutely need to include the version of the
> > > underlying project. On the other hand, since we should not be shading
> > > *everything* we can drop the leading com.google.
> > >
> > >
> > IMO, This naming convention is easy for identifying the underlying
> project,
> > but  it will be difficult to maintain going forward if underlying project
> > versions changes. Since thirdparty module have its own releases, each of
> > those release can be mapped to specific version of underlying project.
> Even
> > the binary artifact can include a MANIFEST with underlying project
> details
> > as per Steve's suggestion on HADOOP-13363.
> > That said, if you still prefer to have project number in artifact id, it
> > can be done.
> >
> > The Hadoop project can make releases of  the thirdparty module:
> > >
> > > <dependency>
> > >   <groupId>org.apache.hadoop</groupId>
> > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > >   <version>1.0</version>
> > > </dependency>
> > >
> > >
> > Note that the version has to be the hadoop thirdparty release number,
> which
> > > is part of why you need to have the underlying version in the artifact
> > > name. These we can push to maven central as new releases from Hadoop.
> > >
> > >
> > Exactly, same has been implemented in the PR. hadoop-thirdparty module
> have
> > its own releases. But in HADOOP Jira, thirdparty versions can be
> > differentiated using prefix "thirdparty-".
> >
> > Same solution is being followed in HBase. May be people involved in HBase
> > can add some points here.
> >
> > Thoughts?
> > >
> > > .. Owen
> > >
> > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org
> >
> > > wrote:
> > >
> > >> Hi All,
> > >>
> > >>    I wanted to discuss about the separate repo for thirdparty
> > dependencies
> > >> which we need to shaded and include in Hadoop component's jars.
> > >>
> > >>    Apologies for the big text ahead, but this needs clear
> explanation!!
> > >>
> > >>    Right now most needed such dependency is protobuf. Protobuf
> > dependency
> > >> was not upgraded from 2.5.0 onwards with the fear that downstream
> > builds,
> > >> which depends on transitive dependency protobuf coming from hadoop's
> > jars,
> > >> may fail with the upgrade. Apparently protobuf does not guarantee
> source
> > >> compatibility, though it guarantees wire compatibility between
> versions.
> > >> Because of this behavior, version upgrade may cause breakage in known
> > and
> > >> unknown (private?) downstreams.
> > >>
> > >>    So to tackle this, we came up the following proposal in
> HADOOP-13363.
> > >>
> > >>    Luckily, as far as I know, no APIs, either public to users or
> > >> between Hadoop processes, directly use protobuf classes in their
> > >> signatures. (If any exist, please let us know.)
> > >>
> > >>    Proposal:
> > >>    ------------
> > >>
> > >>    1. Create a artifact(s) which contains shaded dependencies. All
> such
> > >> shading/relocation will be with known prefix
> > >> **org.apache.hadoop.thirdparty.**.
> > >>    2. Right now protobuf jar (ex:
> > o.a.h.thirdparty:hadoop-shaded-protobuf)
> > >> to start with, all **com.google.protobuf** classes will be relocated
> as
> > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > >>    3. Hadoop modules, which needs protobuf as dependency, will add
> this
> > >> shaded artifact as dependency (ex:
> > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > >>    4. All previous usages of "com.google.protobuf" will be relocated
> to
> > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> will
> > be
> > >> committed. Please note, this replacement is One-Time directly in
> source
> > >> code, NOT during compile and package.
> > >>    5. Once all usages of "com.google.protobuf" is relocated, then
> hadoop
> > >> dont care about which version of original  "protobuf-java" is in
> > >> dependency.
> > >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to break
> > the
> > >> downstreams. But hadoop will be originally using the latest protobuf
> > >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > >>
> > >>    7. Coming back to separate repo, Following are most appropriate
> > reasons
> > >> of keeping shaded dependency artifact in separate repo instead of
> > >> submodule.
> > >>
> > >>       7a. These artifacts need not be built all the time. It needs to
> be
> > >> built only when there is a change in the dependency version or the
> build
> > >> process.
> > >>       7b. If added as a submodule in the Hadoop repo,
> > >> maven-shade-plugin:shade will execute only in the package phase. That
> > >> means "mvn compile" or "mvn test-compile" will fail, because at that
> > >> point this artifact will not have the relocated classes; it will have
> > >> the original classes instead, resulting in compilation failure. The
> > >> workaround is to build the thirdparty submodule first and exclude the
> > >> "thirdparty" submodule in other executions. This would be a complex
> > >> process compared to keeping it in a separate repo.
> > >>
> > >>       7c. Separate repo, will be a subproject of Hadoop, using the
> same
> > >> HADOOP jira project, with different versioning prefixed with
> > "thirdparty-"
> > >> (ex: thirdparty-1.0.0).
> > >>       7d. Separate will have same release process as Hadoop.
> > >>
> > >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
> > is
> > >> an
> > >> umbrella jira tracking the changes to protobuf upgrade.
> > >>
> > >>     PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> > >> raised
> > >> for separate repo creation in (HADOOP-16595 (
> > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > >>
> > >>     Please provide your inputs for the proposal and review the PR to
> > >> proceed with the proposal.
> > >>
> > >>
> > >    -Thanks,
> > >>     Vinay
> > >>
> > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > >> vinodkv@apache.org>
> > >> wrote:
> > >>
> > >> > Moving the thread to the dev lists.
> > >> >
> > >> > Thanks
> > >> > +Vinod
> > >> >
> > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> > vinayakumarb@apache.org>
> > >> > wrote:
> > >> > >
> > >> > > Thanks Marton,
> > >> > >
> > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> > >> > > Whether to use that repo  for shaded artifact or not will be
> > >> monitored in
> > >> > > HADOOP-13363 umbrella jira. Please feel free to join the
> discussion.
> > >> > >
> > >> > > There is no existing codebase is being moved out of hadoop repo.
> So
> > I
> > >> > think
> > >> > > right now we are good to go.
> > >> > >
> > >> > > -Vinay
> > >> > >
> > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
> > wrote:
> > >> > >
> > >> > >>
> > >> > >> I am not sure if it's defined when is a vote required.
> > >> > >>
> > >> > >> https://www.apache.org/foundation/voting.html
> > >> > >>
> > >> > >> Personally I think it's a big enough change to send a
> notification
> > to
> > >> > the
> > >> > >> dev lists with a 'lazy consensus'  closure
> > >> > >>
> > >> > >> Marton
> > >> > >>
> > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org>
> > >> wrote:
> > >> > >>> Hi,
> > >> > >>>
> > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be more
> in
> > >> > >> future)
> > >> > >>> will be kept as a shaded artifact in a separate repo, which will
> > be
> > >> > >>> referred as dependency in hadoop modules.  This approach avoids
> > >> shading
> > >> > >> of
> > >> > >>> every submodule during build.
> > >> > >>>
> > >> > >>> So question is does any VOTE required before asking to create a
> > git
> > >> > repo?
> > >> > >>>
> > >> > >>> On selfserve platform
> > https://gitbox.apache.org/setup/newrepo.html
> > >> > >>> I can access see that, requester should be PMC.
> > >> > >>>
> > >> > >>> Wanted to confirm here first.
> > >> > >>>
> > >> > >>> -Vinay
> > >> > >>>
> > >> > >>
> > >> > >>
> > ---------------------------------------------------------------------
> > >> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> > >> > >> For additional commands, e-mail: private-help@hadoop.apache.org
> > >> > >>
> > >> > >>
> > >> >
> > >> >
> > >>
> > >
> >
>
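
For reference, the relocation described in points 1 and 2 of the quoted proposal could be sketched with maven-shade-plugin roughly as below. This is only an illustration and is not taken from the PR: the `${protobuf.version}` property and the `Protobuf-Version` manifest entry name are assumptions, the latter following the idea of recording the underlying project version in the MANIFEST.

```xml
<!-- Sketch of a pom fragment for a shaded protobuf artifact.
     Assumptions: ${protobuf.version} is defined elsewhere in the pom, and
     "Protobuf-Version" is a hypothetical manifest entry name. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase, as noted in point 7b -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <include>com.google.protobuf:protobuf-java</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- move all protobuf classes under the known prefix -->
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
        <transformers>
          <!-- record the underlying project version in the jar MANIFEST -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <manifestEntries>
              <Protobuf-Version>${protobuf.version}</Protobuf-Version>
            </manifestEntries>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Because the relocated classes only exist once the package phase has run, a module that depends on them needs the packaged jar rather than the compiled class directory, which is the issue point 7b raises for the submodule approach.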

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
For HBase we have a separate repo, hbase-thirdparty:

https://github.com/apache/hbase-thirdparty

We publish the artifacts to Nexus, so we do not need to include binaries
in our git repo; we just add a dependency in the pom.

https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
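
As a rough sketch, the dependency in a consuming pom would look something like the fragment below. The group and artifact ids come from the link above; the `hbase-thirdparty.version` property name is an assumption and should be set to whichever hbase-thirdparty release is wanted.

```xml
<!-- Sketch of a consumer pom fragment. The version property name is an
     assumption; define it in <properties> with the release you use. -->
<dependency>
  <groupId>org.apache.hbase.thirdparty</groupId>
  <artifactId>hbase-shaded-protobuf</artifactId>
  <version>${hbase-thirdparty.version}</version>
</dependency>
```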


It has its own release cycle: we make a release only when there are special
requirements or when we want to upgrade some of the dependencies. This is the
vote thread for the newest release, where we want to provide a shaded gson
for jdk7.

https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E


Thanks.

On Sat, Sep 28, 2019 at 1:28 AM, Vinayakumar B <vi...@apache.org> wrote:

> Please find replies inline.
>
> -Vinay
>
> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <ow...@gmail.com>
> wrote:
>
> > I'm very unhappy with this direction. In particular, I don't think git is
> > a good place for distribution of binary artifacts. Furthermore, the PMC
> > shouldn't be releasing anything without a release vote.
> >
> >
> The proposed solution doesn't release any binaries in git. It's actually a
> complete sub-project which follows the entire release process, including a
> public VOTE. I have already mentioned that the release process is similar
> to Hadoop's.
> To be specific, it uses (almost) the same script used in Hadoop to generate
> artifacts, sign them, and deploy them to the staging repository. Please let
> me know if I am conveying anything wrong.
>
>
> > I'd propose that we make a third party module that contains the *source*
> > of the pom files to build the relocated jars. This should absolutely be
> > treated as a last resort for the mostly Google projects that regularly
> > break binary compatibility (eg. Protobuf & Guava).
> >
> >
> The same has been implemented in the PR
> https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let me
> know if I misunderstood. Yes, this is the last option we have AFAIK.
>
>
> > In terms of naming, I'd propose something like:
> >
> > org.apache.hadoop.thirdparty.protobuf2_5
> > org.apache.hadoop.thirdparty.guava28
> >
> > In particular, I think we absolutely need to include the version of the
> > underlying project. On the other hand, since we should not be shading
> > *everything* we can drop the leading com.google.
> >
> >
> IMO, this naming convention makes it easy to identify the underlying
> project, but it will be difficult to maintain going forward if the
> underlying project's version changes. Since the thirdparty module has its
> own releases, each release can be mapped to a specific version of the
> underlying project. The binary artifact can even include a MANIFEST with
> the underlying project's details, as per Steve's suggestion on HADOOP-13363.
> That said, if you still prefer to have the project version in the artifact
> id, it can be done.
>
> > The Hadoop project can make releases of the thirdparty module:
> >
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >   <version>1.0</version>
> > </dependency>
> >
> > Note that the version has to be the hadoop thirdparty release number, which
> > is part of why you need to have the underlying version in the artifact
> > name. These we can push to maven central as new releases from Hadoop.
> >
> Exactly, the same has been implemented in the PR. The hadoop-thirdparty
> module has its own releases. In the HADOOP Jira, thirdparty versions can be
> differentiated using the prefix "thirdparty-".
>
> The same solution is being followed in HBase. Maybe people involved in HBase
> can add some points here.
>
> > Thoughts?
> >
> > .. Owen
> >
> > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >>    I wanted to discuss about the separate repo for thirdparty
> dependencies
> >> which we need to shaded and include in Hadoop component's jars.
> >>
> >>    Apologies for the big text ahead, but this needs clear explanation!!
> >>
> >>    Right now most needed such dependency is protobuf. Protobuf
> dependency
> >> was not upgraded from 2.5.0 onwards with the fear that downstream
> builds,
> >> which depends on transitive dependency protobuf coming from hadoop's
> jars,
> >> may fail with the upgrade. Apparently protobuf does not guarantee source
> >> compatibility, though it guarantees wire compatibility between versions.
> >> Because of this behavior, version upgrade may cause breakage in known
> and
> >> unknown (private?) downstreams.
> >>
> >>    So to tackle this, we came up the following proposal in HADOOP-13363.
> >>
> >>    Luckily, as far as I know, no APIs, either public to users or between
> >> Hadoop processes, directly use protobuf classes in their signatures.
> >> (If any exist, please let us know.)
> >>
> >>    Proposal:
> >>    ------------
> >>
> >>    1. Create a artifact(s) which contains shaded dependencies. All such
> >> shading/relocation will be with known prefix
> >> **org.apache.hadoop.thirdparty.**.
> >>    2. Right now protobuf jar (ex:
> o.a.h.thirdparty:hadoop-shaded-protobuf)
> >> to start with, all **com.google.protobuf** classes will be relocated as
> >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >>    3. Hadoop modules, which needs protobuf as dependency, will add this
> >> shaded artifact as dependency (ex:
> >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >>    4. All previous usages of "com.google.protobuf" will be relocated to
> >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and will
> be
> >> committed. Please note, this replacement is One-Time directly in source
> >> code, NOT during compile and package.
> >>    5. Once all usages of "com.google.protobuf" is relocated, then hadoop
> >> dont care about which version of original  "protobuf-java" is in
> >> dependency.
> >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to break
> the
> >> downstreams. But hadoop will be originally using the latest protobuf
> >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >>
> >>    7. Coming back to separate repo, Following are most appropriate
> reasons
> >> of keeping shaded dependency artifact in separate repo instead of
> >> submodule.
> >>
> >>       7a. These artifacts need not be built all the time. It needs to be
> >> built only when there is a change in the dependency version or the build
> >> process.
> >>       7b. If added as a submodule in the Hadoop repo,
> >> maven-shade-plugin:shade will execute only in the package phase. That
> >> means "mvn compile" or "mvn test-compile" will fail, because at that
> >> point this artifact will not have the relocated classes; it will have
> >> the original classes instead, resulting in compilation failure. The
> >> workaround is to build the thirdparty submodule first and exclude the
> >> "thirdparty" submodule in other executions. This would be a complex
> >> process compared to keeping it in a separate repo.
> >>
> >>       7c. Separate repo, will be a subproject of Hadoop, using the same
> >> HADOOP jira project, with different versioning prefixed with
> "thirdparty-"
> >> (ex: thirdparty-1.0.0).
> >>       7d. Separate will have same release process as Hadoop.
> >>
> >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
> is
> >> an
> >> umbrella jira tracking the changes to protobuf upgrade.
> >>
> >>     PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> >> raised
> >> for separate repo creation in (HADOOP-16595 (
> >> https://issues.apache.org/jira/browse/HADOOP-16595)
> >>
> >>     Please provide your inputs for the proposal and review the PR to
> >> proceed with the proposal.
> >>
> >>
> >>    -Thanks,
> >>     Vinay
> >>
> >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >> vinodkv@apache.org>
> >> wrote:
> >>
> >> > Moving the thread to the dev lists.
> >> >
> >> > Thanks
> >> > +Vinod
> >> >
> >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> vinayakumarb@apache.org>
> >> > wrote:
> >> > >
> >> > > Thanks Marton,
> >> > >
> >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> >> > > Whether to use that repo  for shaded artifact or not will be
> >> monitored in
> >> > > HADOOP-13363 umbrella jira. Please feel free to join the discussion.
> >> > >
> >> > > There is no existing codebase is being moved out of hadoop repo. So
> I
> >> > think
> >> > > right now we are good to go.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
> wrote:
> >> > >
> >> > >>
> >> > >> I am not sure if it's defined when is a vote required.
> >> > >>
> >> > >> https://www.apache.org/foundation/voting.html
> >> > >>
> >> > >> Personally I think it's a big enough change to send a notification
> to
> >> > the
> >> > >> dev lists with a 'lazy consensus'  closure
> >> > >>
> >> > >> Marton
> >> > >>
> >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org>
> >> wrote:
> >> > >>> Hi,
> >> > >>>
> >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be more in
> >> > >> future)
> >> > >>> will be kept as a shaded artifact in a separate repo, which will
> be
> >> > >>> referred as dependency in hadoop modules.  This approach avoids
> >> shading
> >> > >> of
> >> > >>> every submodule during build.
> >> > >>>
> >> > >>> So question is does any VOTE required before asking to create a
> git
> >> > repo?
> >> > >>>
> >> > >>> On selfserve platform
> https://gitbox.apache.org/setup/newrepo.html
> >> > >>> I can access see that, requester should be PMC.
> >> > >>>
> >> > >>> Wanted to confirm here first.
> >> > >>>
> >> > >>> -Vinay
> >> > >>>
> >> > >>
> >> > >>
> ---------------------------------------------------------------------
> >> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> >> > >> For additional commands, e-mail: private-help@hadoop.apache.org
> >> > >>
> >> > >>
> >> >
> >> >
> >>
> >
>
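
To illustrate points 3 and 6 of the quoted proposal, a Hadoop module's dependency section might look roughly like the fragment below. This is a sketch under assumptions, not the content of the PR: the coordinates are expanded from "o.a.h.thirdparty:hadoop-shaded-protobuf" as written in the proposal, and the `hadoop-thirdparty.version` property name is hypothetical.

```xml
<!-- Sketch only. ${hadoop-thirdparty.version} is a hypothetical property
     that would hold the hadoop-thirdparty release number. -->
<dependencies>
  <!-- Shaded protobuf that Hadoop code itself compiles against (point 3) -->
  <dependency>
    <groupId>org.apache.hadoop.thirdparty</groupId>
    <artifactId>hadoop-shaded-protobuf</artifactId>
    <version>${hadoop-thirdparty.version}</version>
  </dependency>
  <!-- Original protobuf kept in the dependency tree only so that
       downstream builds relying on it transitively keep working (point 6) -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>
</dependencies>
```

With both entries present, Hadoop classes would reference only the relocated org.apache.hadoop.thirdparty.com.google.protobuf package, while downstream projects would still see protobuf-java 2.5.0 on their classpath.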

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
For HBase we have a separate repo, hbase-thirdparty:

https://github.com/apache/hbase-thirdparty

We publish the artifacts to Nexus, so we do not need to include binaries
in our git repo; we just add a dependency in the pom.

https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf


It has its own release cycle: we make a release only when there are special
requirements or when we want to upgrade some of the dependencies. This is the
vote thread for the newest release, where we want to provide a shaded gson
for jdk7.

https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E


Thanks.

On Sat, Sep 28, 2019 at 1:28 AM, Vinayakumar B <vi...@apache.org> wrote:

> Please find replies inline.
>
> -Vinay
>
> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <ow...@gmail.com>
> wrote:
>
> > I'm very unhappy with this direction. In particular, I don't think git is
> > a good place for distribution of binary artifacts. Furthermore, the PMC
> > shouldn't be releasing anything without a release vote.
> >
> >
> Proposed solution doesnt release any binaries in git. Its actually a
> complete sub-project which follows entire release process, including VOTE
> in public. I have mentioned already that release process is similar to
> hadoop.
> To be specific, using the (almost) same script used in hadoop to generate
> artifacts, sign and deploy to staging repository. Please let me know If I
> am conveying anything wrong.
>
>
> > I'd propose that we make a third party module that contains the *source*
> > of the pom files to build the relocated jars. This should absolutely be
> > treated as a last resort for the mostly Google projects that regularly
> > break binary compatibility (eg. Protobuf & Guava).
> >
> >
> Same has been implemented in the PR
> https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
> me
> know If I misunderstood. Yes, this is the last option we have AFAIK.
>
>
> > In terms of naming, I'd propose something like:
> >
> > org.apache.hadoop.thirdparty.protobuf2_5
> > org.apache.hadoop.thirdparty.guava28
> >
> > In particular, I think we absolutely need to include the version of the
> > underlying project. On the other hand, since we should not be shading
> > *everything* we can drop the leading com.google.
> >
> >
> IMO, This naming convention is easy for identifying the underlying project,
> but  it will be difficult to maintain going forward if underlying project
> versions changes. Since thirdparty module have its own releases, each of
> those release can be mapped to specific version of underlying project. Even
> the binary artifact can include a MANIFEST with underlying project details
> as per Steve's suggestion on HADOOP-13363.
> That said, if you still prefer to have project number in artifact id, it
> can be done.
>
> The Hadoop project can make releases of  the thirdparty module:
> >
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >   <version>1.0</version>
> > </dependency>
> >
> >
> Note that the version has to be the hadoop thirdparty release number, which
> > is part of why you need to have the underlying version in the artifact
> > name. These we can push to maven central as new releases from Hadoop.
> >
> >
> Exactly, same has been implemented in the PR. hadoop-thirdparty module have
> its own releases. But in HADOOP Jira, thirdparty versions can be
> differentiated using prefix "thirdparty-".
>
> Same solution is being followed in HBase. May be people involved in HBase
> can add some points here.
>
> Thoughts?
> >
> > .. Owen
> >
> > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >>    I wanted to discuss about the separate repo for thirdparty
> dependencies
> >> which we need to shaded and include in Hadoop component's jars.
> >>
> >>    Apologies for the big text ahead, but this needs clear explanation!!
> >>
> >>    Right now most needed such dependency is protobuf. Protobuf
> dependency
> >> was not upgraded from 2.5.0 onwards with the fear that downstream
> builds,
> >> which depends on transitive dependency protobuf coming from hadoop's
> jars,
> >> may fail with the upgrade. Apparently protobuf does not guarantee source
> >> compatibility, though it guarantees wire compatibility between versions.
> >> Because of this behavior, version upgrade may cause breakage in known
> and
> >> unknown (private?) downstreams.
> >>
> >>    So to tackle this, we came up the following proposal in HADOOP-13363.
> >>
> >>    Luckily, As far as I know, no APIs, either public to user or between
> >> Hadoop processes, is not directly using protobuf classes in signatures.
> >> (If
> >> any exist, please let us know).
> >>
> >>    Proposal:
> >>    ------------
> >>
> >>    1. Create a artifact(s) which contains shaded dependencies. All such
> >> shading/relocation will be with known prefix
> >> **org.apache.hadoop.thirdparty.**.
> >>    2. Right now protobuf jar (ex:
> o.a.h.thirdparty:hadoop-shaded-protobuf)
> >> to start with, all **com.google.protobuf** classes will be relocated as
> >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >>    3. Hadoop modules, which needs protobuf as dependency, will add this
> >> shaded artifact as dependency (ex:
> >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >>    4. All previous usages of "com.google.protobuf" will be relocated to
> >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and will
> be
> >> committed. Please note, this replacement is One-Time directly in source
> >> code, NOT during compile and package.
> >>    5. Once all usages of "com.google.protobuf" is relocated, then hadoop
> >> dont care about which version of original  "protobuf-java" is in
> >> dependency.
> >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to break
> the
> >> downstreams. But hadoop will be originally using the latest protobuf
> >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >>
> >>    7. Coming back to separate repo, Following are most appropriate
> reasons
> >> of keeping shaded dependency artifact in separate repo instead of
> >> submodule.
> >>
> >>       7a. These artifacts need not be built all the time. It needs to be
> >> built only when there is a change in the dependency version or the build
> >> process.
> >>       7b. If added as "submodule in Hadoop repo",
> maven-shade-plugin:shade
> >> will execute only in package phase. That means, "mvn compile" or "mvn
> >> test-compile" will not be failed as this artifact will not have
> relocated
> >> classes, instead it will have original classes, resulting in compilation
> >> failure. Workaround, build thirdparty submodule first and exclude
> >> "thirdparty" submodule in other executions. This will be a complex
> process
> >> compared to keeping in a separate repo.
> >>
> >>       7c. Separate repo, will be a subproject of Hadoop, using the same
> >> HADOOP jira project, with different versioning prefixed with
> "thirdparty-"
> >> (ex: thirdparty-1.0.0).
> >>       7d. Separate will have same release process as Hadoop.
> >>
> >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
> is
> >> an
> >> umbrella jira tracking the changes to protobuf upgrade.
> >>
> >>     PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> >> raised
> >> for separate repo creation in (HADOOP-16595 (
> >> https://issues.apache.org/jira/browse/HADOOP-16595)
> >>
> >>     Please provide your inputs for the proposal and review the PR to
> >> proceed with the proposal.
> >>
> >>
> >    -Thanks,
> >>     Vinay
> >>
> >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >> vinodkv@apache.org>
> >> wrote:
> >>
> >> > Moving the thread to the dev lists.
> >> >
> >> > Thanks
> >> > +Vinod
> >> >
> >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> vinayakumarb@apache.org>
> >> > wrote:
> >> > >
> >> > > Thanks Marton,
> >> > >
> >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> >> > > Whether to use that repo  for shaded artifact or not will be
> >> monitored in
> >> > > HADOOP-13363 umbrella jira. Please feel free to join the discussion.
> >> > >
> >> > > There is no existing codebase is being moved out of hadoop repo. So
> I
> >> > think
> >> > > right now we are good to go.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
> wrote:
> >> > >
> >> > >>
> >> > >> I am not sure if it's defined when is a vote required.
> >> > >>
> >> > >> https://www.apache.org/foundation/voting.html
> >> > >>
> >> > >> Personally I think it's a big enough change to send a notification
> to
> >> > the
> >> > >> dev lists with a 'lazy consensus'  closure
> >> > >>
> >> > >> Marton
> >> > >>
> >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org>
> >> wrote:
> >> > >>> Hi,
> >> > >>>
> >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be more in
> >> > >> future)
> >> > >>> will be kept as a shaded artifact in a separate repo, which will
> be
> >> > >>> referred as dependency in hadoop modules.  This approach avoids
> >> shading
> >> > >> of
> >> > >>> every submodule during build.
> >> > >>>
> >> > >>> So question is does any VOTE required before asking to create a
> git
> >> > repo?
> >> > >>>
> >> > >>> On selfserve platform
> https://gitbox.apache.org/setup/newrepo.html
> >> > >>> I can access see that, requester should be PMC.
> >> > >>>
> >> > >>> Wanted to confirm here first.
> >> > >>>
> >> > >>> -Vinay
> >> > >>>
> >> > >>
> >> > >>
> ---------------------------------------------------------------------
> >> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> >> > >> For additional commands, e-mail: private-help@hadoop.apache.org
> >> > >>
> >> > >>
> >> >
> >> >
> >>
> >
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
For HBase we have a separate repo, hbase-thirdparty:

https://github.com/apache/hbase-thirdparty

We publish the artifacts to Nexus, so we do not need to include binaries
in our git repo; we just add a dependency in the pom.

https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
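
As a minimal sketch of what such a pom dependency could look like, assuming
the coordinates visible in the link above (the hbase-thirdparty.version
property is a placeholder for illustration, not a specific release):

```xml
<!-- Sketch only. groupId and artifactId are taken from the mvnrepository
     link above; hbase-thirdparty.version is a hypothetical property that
     is assumed to be defined in the pom's <properties> section. -->
<dependency>
  <groupId>org.apache.hbase.thirdparty</groupId>
  <artifactId>hbase-shaded-protobuf</artifactId>
  <version>${hbase-thirdparty.version}</version>
</dependency>
```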


It has its own release cycle: we release only when there are special
requirements or we want to upgrade some of the dependencies. This is the
vote thread for the newest release, where we want to provide a shaded gson
for jdk7.

https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E


Thanks.

Vinayakumar B <vi...@apache.org> wrote on Sat, Sep 28, 2019 at 1:28 AM:

> Please find replies inline.
>
> -Vinay
>
> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <ow...@gmail.com>
> wrote:
>
> > I'm very unhappy with this direction. In particular, I don't think git is
> > a good place for distribution of binary artifacts. Furthermore, the PMC
> > shouldn't be releasing anything without a release vote.
> >
> >
> Proposed solution doesnt release any binaries in git. Its actually a
> complete sub-project which follows entire release process, including VOTE
> in public. I have mentioned already that release process is similar to
> hadoop.
> To be specific, using the (almost) same script used in hadoop to generate
> artifacts, sign and deploy to staging repository. Please let me know If I
> am conveying anything wrong.
>
>
> > I'd propose that we make a third party module that contains the *source*
> > of the pom files to build the relocated jars. This should absolutely be
> > treated as a last resort for the mostly Google projects that regularly
> > break binary compatibility (eg. Protobuf & Guava).
> >
> >
> Same has been implemented in the PR
> https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
> me
> know If I misunderstood. Yes, this is the last option we have AFAIK.
>
>
> > In terms of naming, I'd propose something like:
> >
> > org.apache.hadoop.thirdparty.protobuf2_5
> > org.apache.hadoop.thirdparty.guava28
> >
> > In particular, I think we absolutely need to include the version of the
> > underlying project. On the other hand, since we should not be shading
> > *everything* we can drop the leading com.google.
> >
> >
> IMO, This naming convention is easy for identifying the underlying project,
> but  it will be difficult to maintain going forward if underlying project
> versions changes. Since thirdparty module have its own releases, each of
> those release can be mapped to specific version of underlying project. Even
> the binary artifact can include a MANIFEST with underlying project details
> as per Steve's suggestion on HADOOP-13363.
> That said, if you still prefer to have project number in artifact id, it
> can be done.
>
> > The Hadoop project can make releases of the thirdparty module:
> >
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >   <version>1.0</version>
> > </dependency>
> >
> >
> > Note that the version has to be the hadoop thirdparty release number, which
> > is part of why you need to have the underlying version in the artifact
> > name. These we can push to maven central as new releases from Hadoop.
> >
> >
> Exactly, same has been implemented in the PR. hadoop-thirdparty module have
> its own releases. But in HADOOP Jira, thirdparty versions can be
> differentiated using prefix "thirdparty-".
>
> Same solution is being followed in HBase. May be people involved in HBase
> can add some points here.
>
> > Thoughts?
> >
> > .. Owen
> >
> > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vi...@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >>    I wanted to discuss about the separate repo for thirdparty
> dependencies
> >> which we need to shaded and include in Hadoop component's jars.
> >>
> >>    Apologies for the big text ahead, but this needs clear explanation!!
> >>
> >>    Right now most needed such dependency is protobuf. Protobuf
> dependency
> >> was not upgraded from 2.5.0 onwards with the fear that downstream
> builds,
> >> which depends on transitive dependency protobuf coming from hadoop's
> jars,
> >> may fail with the upgrade. Apparently protobuf does not guarantee source
> >> compatibility, though it guarantees wire compatibility between versions.
> >> Because of this behavior, version upgrade may cause breakage in known
> and
> >> unknown (private?) downstreams.
> >>
> >>    So to tackle this, we came up the following proposal in HADOOP-13363.
> >>
> >>    Luckily, As far as I know, no APIs, either public to user or between
> >> Hadoop processes, is not directly using protobuf classes in signatures.
> >> (If
> >> any exist, please let us know).
> >>
> >>    Proposal:
> >>    ------------
> >>
> >>    1. Create a artifact(s) which contains shaded dependencies. All such
> >> shading/relocation will be with known prefix
> >> **org.apache.hadoop.thirdparty.**.
> >>    2. Right now protobuf jar (ex:
> o.a.h.thirdparty:hadoop-shaded-protobuf)
> >> to start with, all **com.google.protobuf** classes will be relocated as
> >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >>    3. Hadoop modules, which needs protobuf as dependency, will add this
> >> shaded artifact as dependency (ex:
> >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >>    4. All previous usages of "com.google.protobuf" will be relocated to
> >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and will
> be
> >> committed. Please note, this replacement is One-Time directly in source
> >> code, NOT during compile and package.
> >>    5. Once all usages of "com.google.protobuf" is relocated, then hadoop
> >> dont care about which version of original  "protobuf-java" is in
> >> dependency.
> >>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to break
> the
> >> downstreams. But hadoop will be originally using the latest protobuf
> >> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >>
> >>    7. Coming back to separate repo, Following are most appropriate
> reasons
> >> of keeping shaded dependency artifact in separate repo instead of
> >> submodule.
> >>
> >>       7a. These artifacts need not be built all the time. It needs to be
> >> built only when there is a change in the dependency version or the build
> >> process.
> >>       7b. If added as "submodule in Hadoop repo",
> maven-shade-plugin:shade
> >> will execute only in package phase. That means, "mvn compile" or "mvn
> >> test-compile" will not be failed as this artifact will not have
> relocated
> >> classes, instead it will have original classes, resulting in compilation
> >> failure. Workaround, build thirdparty submodule first and exclude
> >> "thirdparty" submodule in other executions. This will be a complex
> process
> >> compared to keeping in a separate repo.
> >>
> >>       7c. Separate repo, will be a subproject of Hadoop, using the same
> >> HADOOP jira project, with different versioning prefixed with
> "thirdparty-"
> >> (ex: thirdparty-1.0.0).
> >>       7d. Separate will have same release process as Hadoop.
> >>
> >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
> is
> >> an
> >> umbrella jira tracking the changes to protobuf upgrade.
> >>
> >>     PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> >> raised
> >> for separate repo creation in (HADOOP-16595 (
> >> https://issues.apache.org/jira/browse/HADOOP-16595)
> >>
> >>     Please provide your inputs for the proposal and review the PR to
> >> proceed with the proposal.
> >>
> >>
> >    -Thanks,
> >>     Vinay
> >>
> >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> >> vinodkv@apache.org>
> >> wrote:
> >>
> >> > Moving the thread to the dev lists.
> >> >
> >> > Thanks
> >> > +Vinod
> >> >
> >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <
> vinayakumarb@apache.org>
> >> > wrote:
> >> > >
> >> > > Thanks Marton,
> >> > >
> >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> >> > > Whether to use that repo  for shaded artifact or not will be
> >> monitored in
> >> > > HADOOP-13363 umbrella jira. Please feel free to join the discussion.
> >> > >
> >> > > There is no existing codebase is being moved out of hadoop repo. So
> I
> >> > think
> >> > > right now we are good to go.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org>
> wrote:
> >> > >
> >> > >>
> >> > >> I am not sure if it's defined when is a vote required.
> >> > >>
> >> > >> https://www.apache.org/foundation/voting.html
> >> > >>
> >> > >> Personally I think it's a big enough change to send a notification
> to
> >> > the
> >> > >> dev lists with a 'lazy consensus'  closure
> >> > >>
> >> > >> Marton
> >> > >>
> >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org>
> >> wrote:
> >> > >>> Hi,
> >> > >>>
> >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be more in
> >> > >> future)
> >> > >>> will be kept as a shaded artifact in a separate repo, which will
> be
> >> > >>> referred as dependency in hadoop modules.  This approach avoids
> >> shading
> >> > >> of
> >> > >>> every submodule during build.
> >> > >>>
> >> > >>> So question is does any VOTE required before asking to create a
> git
> >> > repo?
> >> > >>>
> >> > >>> On selfserve platform
> https://gitbox.apache.org/setup/newrepo.html
> >> > >>> I can access see that, requester should be PMC.
> >> > >>>
> >> > >>> Wanted to confirm here first.
> >> > >>>
> >> > >>> -Vinay
> >> > >>>
> >> > >>
> >> > >>
> ---------------------------------------------------------------------
> >> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> >> > >> For additional commands, e-mail: private-help@hadoop.apache.org
> >> > >>
> >> > >>
> >> >
> >> >
> >>
> >
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Please find replies inline.

-Vinay

On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <ow...@gmail.com>
wrote:

> I'm very unhappy with this direction. In particular, I don't think git is
> a good place for distribution of binary artifacts. Furthermore, the PMC
> shouldn't be releasing anything without a release vote.
>
>
The proposed solution doesn't release any binaries in git. It's actually a
complete sub-project which follows the entire release process, including a
public VOTE. I have already mentioned that the release process is similar
to Hadoop's.
To be specific, it uses (almost) the same script used in Hadoop to generate
artifacts, sign them, and deploy them to the staging repository. Please let
me know if I am conveying anything wrong.


> I'd propose that we make a third party module that contains the *source*
> of the pom files to build the relocated jars. This should absolutely be
> treated as a last resort for the mostly Google projects that regularly
> break binary compatibility (eg. Protobuf & Guava).
>
>
The same has been implemented in the PR
https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let me
know if I misunderstood. Yes, this is the last option we have, AFAIK.
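
For reference, a relocation of the kind described in the proposal could be
configured roughly as below. This is a hedged sketch, not the pom from the
PR: the relocations element is standard maven-shade-plugin configuration and
the package names are the ones from the proposal, while the phase binding,
the artifactSet filter, and the omitted plugin version are assumptions.

```xml
<!-- Sketch only: relocates com.google.protobuf classes under the
     org.apache.hadoop.thirdparty prefix while building the shaded jar.
     Not copied from the hadoop-thirdparty PR. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase, as noted in point 7b -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- bundle only protobuf-java into the shaded artifact -->
            <include>com.google.protobuf:protobuf-java</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```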


> In terms of naming, I'd propose something like:
>
> org.apache.hadoop.thirdparty.protobuf2_5
> org.apache.hadoop.thirdparty.guava28
>
> In particular, I think we absolutely need to include the version of the
> underlying project. On the other hand, since we should not be shading
> *everything* we can drop the leading com.google.
>
>
IMO, this naming convention makes it easy to identify the underlying project,
but it will be difficult to maintain going forward if the underlying
project's version changes. Since the thirdparty module has its own releases,
each of those releases can be mapped to a specific version of the underlying
project. The binary artifact can even include a MANIFEST with the underlying
project's details, as per Steve's suggestion on HADOOP-13363.
That said, if you still prefer to have the underlying project's version
number in the artifact id, it can be done.
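
One possible way to record the underlying version in the jar's MANIFEST is
the shade plugin's ManifestResourceTransformer. The fragment below is only a
sketch under assumptions: the entry name Shaded-Protobuf-Version and the
protobuf.version property are hypothetical names chosen for illustration and
do not come from HADOOP-13363 or the PR.

```xml
<!-- Sketch only: belongs inside the maven-shade-plugin <configuration>.
     "Shaded-Protobuf-Version" and the protobuf.version property are
     hypothetical names used for illustration. -->
<transformers>
  <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    <manifestEntries>
      <Shaded-Protobuf-Version>${protobuf.version}</Shaded-Protobuf-Version>
    </manifestEntries>
  </transformer>
</transformers>
```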

> The Hadoop project can make releases of the thirdparty module:
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>   <version>1.0</version>
> </dependency>
>
>
> Note that the version has to be the hadoop thirdparty release number, which
> is part of why you need to have the underlying version in the artifact
> name. These we can push to maven central as new releases from Hadoop.
>
>
Exactly, the same has been implemented in the PR. The hadoop-thirdparty
module has its own releases. In the HADOOP Jira, thirdparty versions can be
differentiated using the prefix "thirdparty-".
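
A Hadoop module would then depend on the shaded artifact along these lines.
This is a sketch, assuming that "o.a.h.thirdparty" in the proposal expands to
the groupId org.apache.hadoop.thirdparty; the hadoop-thirdparty.version
property is a hypothetical placeholder for the thirdparty release number.

```xml
<!-- Sketch only. The groupId is an assumed expansion of "o.a.h.thirdparty";
     hadoop-thirdparty.version is a hypothetical property holding the
     thirdparty release number, which is independent of the protobuf
     version bundled inside the jar. -->
<dependency>
  <groupId>org.apache.hadoop.thirdparty</groupId>
  <artifactId>hadoop-shaded-protobuf</artifactId>
  <version>${hadoop-thirdparty.version}</version>
</dependency>
```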

The same solution is being followed in HBase. Maybe people involved in HBase
can add some points here.

> Thoughts?
>
> .. Owen
>
> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vi...@apache.org>
> wrote:
>
>> Hi All,
>>
>>    I wanted to discuss about the separate repo for thirdparty dependencies
>> which we need to shaded and include in Hadoop component's jars.
>>
>>    Apologies for the big text ahead, but this needs clear explanation!!
>>
>>    Right now most needed such dependency is protobuf. Protobuf dependency
>> was not upgraded from 2.5.0 onwards with the fear that downstream builds,
>> which depends on transitive dependency protobuf coming from hadoop's jars,
>> may fail with the upgrade. Apparently protobuf does not guarantee source
>> compatibility, though it guarantees wire compatibility between versions.
>> Because of this behavior, version upgrade may cause breakage in known and
>> unknown (private?) downstreams.
>>
>>    So to tackle this, we came up the following proposal in HADOOP-13363.
>>
>>    Luckily, As far as I know, no APIs, either public to user or between
>> Hadoop processes, is not directly using protobuf classes in signatures.
>> (If
>> any exist, please let us know).
>>
>>    Proposal:
>>    ------------
>>
>>    1. Create a artifact(s) which contains shaded dependencies. All such
>> shading/relocation will be with known prefix
>> **org.apache.hadoop.thirdparty.**.
>>    2. Right now protobuf jar (ex: o.a.h.thirdparty:hadoop-shaded-protobuf)
>> to start with, all **com.google.protobuf** classes will be relocated as
>> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>>    3. Hadoop modules, which needs protobuf as dependency, will add this
>> shaded artifact as dependency (ex:
>> o.a.h.thirdparty:hadoop-shaded-protobuf).
>>    4. All previous usages of "com.google.protobuf" will be relocated to
>> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and will be
>> committed. Please note, this replacement is One-Time directly in source
>> code, NOT during compile and package.
>>    5. Once all usages of "com.google.protobuf" is relocated, then hadoop
>> dont care about which version of original  "protobuf-java" is in
>> dependency.
>>    6. Just keep "protobuf-java:2.5.0" in dependency tree not to break the
>> downstreams. But hadoop will be originally using the latest protobuf
>> present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>>
>>    7. Coming back to separate repo, Following are most appropriate reasons
>> of keeping shaded dependency artifact in separate repo instead of
>> submodule.
>>
>>       7a. These artifacts need not be built all the time. It needs to be
>> built only when there is a change in the dependency version or the build
>> process.
>>       7b. If added as "submodule in Hadoop repo", maven-shade-plugin:shade
>> will execute only in package phase. That means, "mvn compile" or "mvn
>> test-compile" will not be failed as this artifact will not have relocated
>> classes, instead it will have original classes, resulting in compilation
>> failure. Workaround, build thirdparty submodule first and exclude
>> "thirdparty" submodule in other executions. This will be a complex process
>> compared to keeping in a separate repo.
>>
>>       7c. Separate repo, will be a subproject of Hadoop, using the same
>> HADOOP jira project, with different versioning prefixed with "thirdparty-"
>> (ex: thirdparty-1.0.0).
>>       7d. Separate will have same release process as Hadoop.
>>
>>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is
>> an
>> umbrella jira tracking the changes to protobuf upgrade.
>>
>>     PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
>> raised
>> for separate repo creation in (HADOOP-16595 (
>> https://issues.apache.org/jira/browse/HADOOP-16595)
>>
>>     Please provide your inputs for the proposal and review the PR to
>> proceed with the proposal.
>>
>>
>    -Thanks,
>>     Vinay
>>
>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
>> vinodkv@apache.org>
>> wrote:
>>
>> > Moving the thread to the dev lists.
>> >
>> > Thanks
>> > +Vinod
>> >
>> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vi...@apache.org>
>> > wrote:
>> > >
>> > > Thanks Marton,
>> > >
>> > > Current created 'hadoop-thirdparty' repo is empty right now.
>> > > Whether to use that repo  for shaded artifact or not will be
>> monitored in
>> > > HADOOP-13363 umbrella jira. Please feel free to join the discussion.
>> > >
>> > > There is no existing codebase is being moved out of hadoop repo. So I
>> > think
>> > > right now we are good to go.
>> > >
>> > > -Vinay
>> > >
>> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org> wrote:
>> > >
>> > >>
>> > >> I am not sure if it's defined when is a vote required.
>> > >>
>> > >> https://www.apache.org/foundation/voting.html
>> > >>
>> > >> Personally I think it's a big enough change to send a notification to
>> > the
>> > >> dev lists with a 'lazy consensus'  closure
>> > >>
>> > >> Marton
>> > >>
>> > >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org>
>> wrote:
>> > >>> Hi,
>> > >>>
>> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be more in
>> > >> future)
>> > >>> will be kept as a shaded artifact in a separate repo, which will be
>> > >>> referred as dependency in hadoop modules.  This approach avoids
>> shading
>> > >> of
>> > >>> every submodule during build.
>> > >>>
>> > >>> So question is does any VOTE required before asking to create a git
>> > repo?
>> > >>>
>> > >>> On selfserve platform https://gitbox.apache.org/setup/newrepo.html
>> > >>> I can access see that, requester should be PMC.
>> > >>>
>> > >>> Wanted to confirm here first.
>> > >>>
>> > >>> -Vinay
>> > >>>
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
>> > >> For additional commands, e-mail: private-help@hadoop.apache.org
>> > >>
>> > >>
>> >
>> >
>>
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Vinayakumar B <vi...@apache.org>.
Please find replies inline.

-Vinay

On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <ow...@gmail.com>
wrote:

> I'm very unhappy with this direction. In particular, I don't think git is
> a good place for distribution of binary artifacts. Furthermore, the PMC
> shouldn't be releasing anything without a release vote.
>
>
The proposed solution does not release any binaries in git. It is actually a
complete sub-project that follows the entire release process, including a
public VOTE. I have already mentioned that the release process is similar to
Hadoop's.
To be specific, it uses (almost) the same script used in Hadoop to generate
artifacts, sign them, and deploy them to the staging repository. Please let me
know if I am conveying anything wrong.


> I'd propose that we make a third party module that contains the *source*
> of the pom files to build the relocated jars. This should absolutely be
> treated as a last resort for the mostly Google projects that regularly
> break binary compatibility (eg. Protobuf & Guava).
>
>
The same has been implemented in the PR
https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let me
know if I misunderstood. Yes, this is the last option we have, AFAIK.
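
For reference, the relocation itself is only a few lines of pom. A rough
sketch, assuming the maven-shade-plugin bound to the package phase (this is
illustrative and not copied from the PR, which may differ in details):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs at package time, not during compile -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>

Because the relocated classes only exist after packaging, modules that import
the relocated package need the already-built jar on their classpath, which is
why releasing it separately is simpler.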


> In terms of naming, I'd propose something like:
>
> org.apache.hadoop.thirdparty.protobuf2_5
> org.apache.hadoop.thirdparty.guava28
>
> In particular, I think we absolutely need to include the version of the
> underlying project. On the other hand, since we should not be shading
> *everything* we can drop the leading com.google.
>
>
IMO, this naming convention makes it easy to identify the underlying project,
but it will be difficult to maintain going forward when the underlying project
version changes. Since the thirdparty module has its own releases, each of
those releases can be mapped to a specific version of the underlying project.
The binary artifact can even include a MANIFEST with the underlying project
details, as per Steve's suggestion on HADOOP-13363.
That said, if you still prefer to have the underlying project's version number
in the artifact id, it can be done.

> The Hadoop project can make releases of the thirdparty module:
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>   <version>1.0</version>
> </dependency>
>
>
> Note that the version has to be the hadoop thirdparty release number, which
> is part of why you need to have the underlying version in the artifact
> name. These we can push to maven central as new releases from Hadoop.
>
>
Exactly, the same has been implemented in the PR. The hadoop-thirdparty module
has its own releases. In the HADOOP Jira, thirdparty versions can be
differentiated using the prefix "thirdparty-".
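
As an illustration only, a Hadoop module would then pull in the shaded jar
roughly like this (coordinates are taken from the proposal; the version shown
just mirrors the thirdparty-1.0.0 example and is an assumption):

<dependency>
  <groupId>org.apache.hadoop.thirdparty</groupId>
  <artifactId>hadoop-shaded-protobuf</artifactId>
  <version>1.0.0</version>
</dependency>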

The same solution is being followed in HBase. Maybe people involved in HBase
can add some points here.

> Thoughts?
>
> .. Owen
>
> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vi...@apache.org>
> wrote:
>
>> Hi All,
>>
>>    I wanted to discuss the separate repo for thirdparty dependencies
>> which we need to shade and include in Hadoop component jars.
>>
>>    Apologies for the big text ahead, but this needs clear explanation!!
>>
>>    Right now the most needed such dependency is protobuf. The protobuf
>> dependency has not been upgraded beyond 2.5.0 for fear that downstream
>> builds, which depend on the transitive protobuf dependency coming from
>> hadoop's jars, may fail with the upgrade. Apparently protobuf does not
>> guarantee source compatibility, though it guarantees wire compatibility
>> between versions. Because of this behavior, a version upgrade may cause
>> breakage in known and unknown (private?) downstreams.
>>
>>    So to tackle this, we came up with the following proposal in
>> HADOOP-13363.
>>
>>    Luckily, as far as I know, no APIs, either public to users or between
>> Hadoop processes, directly use protobuf classes in signatures. (If
>> any exist, please let us know).
>>
>>    Proposal:
>>    ------------
>>
>>    1. Create artifact(s) which contain shaded dependencies. All such
>> shading/relocation will use the known prefix
>> **org.apache.hadoop.thirdparty.**.
>>    2. To start with, for the protobuf jar (ex:
>> o.a.h.thirdparty:hadoop-shaded-protobuf), all **com.google.protobuf**
>> classes will be relocated to
>> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>>    3. Hadoop modules which need protobuf as a dependency will add this
>> shaded artifact as a dependency (ex:
>> o.a.h.thirdparty:hadoop-shaded-protobuf).
>>    4. All previous usages of "com.google.protobuf" will be changed to
>> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
>> committed. Please note, this replacement is done one time, directly in
>> source code, NOT during compile and package.
>>    5. Once all usages of "com.google.protobuf" are relocated, hadoop
>> does not care which version of the original "protobuf-java" is in the
>> dependency tree.
>>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
>> break downstreams. But hadoop itself will actually be using the latest
>> protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>>
>>    7. Coming back to the separate repo, following are the main reasons
>> for keeping the shaded dependency artifact in a separate repo instead of a
>> submodule.
>>
>>       7a. These artifacts need not be built all the time. They need to be
>> built only when there is a change in the dependency version or the build
>> process.
>>       7b. If added as a "submodule in Hadoop repo", maven-shade-plugin:shade
>> will execute only in the package phase. That means "mvn compile" or "mvn
>> test-compile" will fail, as this artifact will not have the relocated
>> classes; instead it will have the original classes, resulting in compilation
>> failure. The workaround is to build the thirdparty submodule first and
>> exclude the "thirdparty" submodule in other executions. This will be a
>> complex process compared to keeping it in a separate repo.
>>
>>       7c. The separate repo will be a subproject of Hadoop, using the same
>> HADOOP jira project, with different versioning prefixed with "thirdparty-"
>> (ex: thirdparty-1.0.0).
>>       7d. The separate repo will have the same release process as Hadoop.
>>
>>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is
>> an umbrella jira tracking the changes for the protobuf upgrade.
>>
>>     A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
>> raised for the separate repo creation in HADOOP-16595
>> (https://issues.apache.org/jira/browse/HADOOP-16595).
>>
>>     Please provide your inputs for the proposal and review the PR to
>> proceed with the proposal.
>>
>>
>>    -Thanks,
>>     Vinay
>>
>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
>> vinodkv@apache.org>
>> wrote:
>>
>> > Moving the thread to the dev lists.
>> >
>> > Thanks
>> > +Vinod
>> >
>> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vi...@apache.org>
>> > wrote:
>> > >
>> > > Thanks Marton,
>> > >
>> > > Current created 'hadoop-thirdparty' repo is empty right now.
>> > > Whether to use that repo  for shaded artifact or not will be
>> monitored in
>> > > HADOOP-13363 umbrella jira. Please feel free to join the discussion.
>> > >
>> > > There is no existing codebase is being moved out of hadoop repo. So I
>> > think
>> > > right now we are good to go.
>> > >
>> > > -Vinay
>> > >
>> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org> wrote:
>> > >
>> > >>
>> > >> I am not sure if it's defined when is a vote required.
>> > >>
>> > >> https://www.apache.org/foundation/voting.html
>> > >>
>> > >> Personally I think it's a big enough change to send a notification to
>> > the
>> > >> dev lists with a 'lazy consensus'  closure
>> > >>
>> > >> Marton
>> > >>
>> > >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org>
>> wrote:
>> > >>> Hi,
>> > >>>
>> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be more in
>> > >> future)
>> > >>> will be kept as a shaded artifact in a separate repo, which will be
>> > >>> referred as dependency in hadoop modules.  This approach avoids
>> shading
>> > >> of
>> > >>> every submodule during build.
>> > >>>
>> > >>> So question is does any VOTE required before asking to create a git
>> > repo?
>> > >>>
>> > >>> On selfserve platform https://gitbox.apache.org/setup/newrepo.html
>> > >>> I can access see that, requester should be PMC.
>> > >>>
>> > >>> Wanted to confirm here first.
>> > >>>
>> > >>> -Vinay
>> > >>>
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
>> > >> For additional commands, e-mail: private-help@hadoop.apache.org
>> > >>
>> > >>
>> >
>> >
>>
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Owen O'Malley <ow...@gmail.com>.
I'm very unhappy with this direction. In particular, I don't think git is a
good place for distribution of binary artifacts. Furthermore, the PMC
shouldn't be releasing anything without a release vote.

I'd propose that we make a third party module that contains the *source* of
the pom files to build the relocated jars. This should absolutely be
treated as a last resort for the mostly Google projects that regularly
break binary compatibility (e.g. Protobuf & Guava).

In terms of naming, I'd propose something like:

org.apache.hadoop.thirdparty.protobuf2_5
org.apache.hadoop.thirdparty.guava28

In particular, I think we absolutely need to include the version of the
underlying project. On the other hand, since we should not be shading
*everything* we can drop the leading com.google.

The Hadoop project can make releases of the thirdparty module:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
  <version>1.0</version>
</dependency>

Note that the version has to be the hadoop thirdparty release number, which
is part of why you need to have the underlying version in the artifact
name. These we can push to maven central as new releases from Hadoop.

Thoughts?

.. Owen

On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vi...@apache.org>
wrote:

> Hi All,
>
>    I wanted to discuss the separate repo for thirdparty dependencies
> which we need to shade and include in Hadoop component jars.
>
>    Apologies for the big text ahead, but this needs clear explanation!!
>
>    Right now the most needed such dependency is protobuf. The protobuf
> dependency has not been upgraded beyond 2.5.0 for fear that downstream
> builds, which depend on the transitive protobuf dependency coming from
> hadoop's jars, may fail with the upgrade. Apparently protobuf does not
> guarantee source compatibility, though it guarantees wire compatibility
> between versions. Because of this behavior, a version upgrade may cause
> breakage in known and unknown (private?) downstreams.
>
>    So to tackle this, we came up with the following proposal in
> HADOOP-13363.
>
>    Luckily, as far as I know, no APIs, either public to users or between
> Hadoop processes, directly use protobuf classes in signatures. (If
> any exist, please let us know).
>
>    Proposal:
>    ------------
>
>    1. Create artifact(s) which contain shaded dependencies. All such
> shading/relocation will use the known prefix
> **org.apache.hadoop.thirdparty.**.
>    2. To start with, for the protobuf jar (ex:
> o.a.h.thirdparty:hadoop-shaded-protobuf), all **com.google.protobuf**
> classes will be relocated to
> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>    3. Hadoop modules which need protobuf as a dependency will add this
> shaded artifact as a dependency (ex:
> o.a.h.thirdparty:hadoop-shaded-protobuf).
>    4. All previous usages of "com.google.protobuf" will be changed to
> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> committed. Please note, this replacement is done one time, directly in
> source code, NOT during compile and package.
>    5. Once all usages of "com.google.protobuf" are relocated, hadoop
> does not care which version of the original "protobuf-java" is in the
> dependency tree.
>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
> break downstreams. But hadoop itself will actually be using the latest
> protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>
>    7. Coming back to the separate repo, following are the main reasons
> for keeping the shaded dependency artifact in a separate repo instead of a
> submodule.
>
>       7a. These artifacts need not be built all the time. They need to be
> built only when there is a change in the dependency version or the build
> process.
>       7b. If added as a "submodule in Hadoop repo", maven-shade-plugin:shade
> will execute only in the package phase. That means "mvn compile" or "mvn
> test-compile" will fail, as this artifact will not have the relocated
> classes; instead it will have the original classes, resulting in compilation
> failure. The workaround is to build the thirdparty submodule first and
> exclude the "thirdparty" submodule in other executions. This will be a
> complex process compared to keeping it in a separate repo.
>
>       7c. The separate repo will be a subproject of Hadoop, using the same
> HADOOP jira project, with different versioning prefixed with "thirdparty-"
> (ex: thirdparty-1.0.0).
>       7d. The separate repo will have the same release process as Hadoop.
>
>
>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is
> an umbrella jira tracking the changes for the protobuf upgrade.
>
>     A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> raised for the separate repo creation in HADOOP-16595
> (https://issues.apache.org/jira/browse/HADOOP-16595).
>
>     Please provide your inputs for the proposal and review the PR to
> proceed with the proposal.
>
>
>    -Thanks,
>     Vinay
>
> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> vinodkv@apache.org>
> wrote:
>
> > Moving the thread to the dev lists.
> >
> > Thanks
> > +Vinod
> >
> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vi...@apache.org>
> > wrote:
> > >
> > > Thanks Marton,
> > >
> > > Current created 'hadoop-thirdparty' repo is empty right now.
> > > Whether to use that repo  for shaded artifact or not will be monitored
> in
> > > HADOOP-13363 umbrella jira. Please feel free to join the discussion.
> > >
> > > There is no existing codebase is being moved out of hadoop repo. So I
> > think
> > > right now we are good to go.
> > >
> > > -Vinay
> > >
> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org> wrote:
> > >
> > >>
> > >> I am not sure if it's defined when is a vote required.
> > >>
> > >> https://www.apache.org/foundation/voting.html
> > >>
> > >> Personally I think it's a big enough change to send a notification to
> > the
> > >> dev lists with a 'lazy consensus'  closure
> > >>
> > >> Marton
> > >>
> > >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org>
> wrote:
> > >>> Hi,
> > >>>
> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be more in
> > >> future)
> > >>> will be kept as a shaded artifact in a separate repo, which will be
> > >>> referred as dependency in hadoop modules.  This approach avoids
> shading
> > >> of
> > >>> every submodule during build.
> > >>>
> > >>> So question is does any VOTE required before asking to create a git
> > repo?
> > >>>
> > >>> On selfserve platform https://gitbox.apache.org/setup/newrepo.html
> > >>> I can access see that, requester should be PMC.
> > >>>
> > >>> Wanted to confirm here first.
> > >>>
> > >>> -Vinay
> > >>>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> > >> For additional commands, e-mail: private-help@hadoop.apache.org
> > >>
> > >>
> >
> >
>

Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

Posted by Owen O'Malley <ow...@gmail.com>.
I'm very unhappy with this direction. In particular, I don't think git is a
good place for distribution of binary artifacts. Furthermore, the PMC
shouldn't be releasing anything without a release vote.

I'd propose that we make a third party module that contains the *source* of
the pom files to build the relocated jars. This should absolutely be
treated as a last resort for the mostly Google projects that regularly
break binary compatibility (e.g. Protobuf & Guava).

In terms of naming, I'd propose something like:

org.apache.hadoop.thirdparty.protobuf2_5
org.apache.hadoop.thirdparty.guava28

In particular, I think we absolutely need to include the version of the
underlying project. On the other hand, since we should not be shading
*everything* we can drop the leading com.google.
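
As a rough sketch of what such a relocation could look like in the
thirdparty module's pom (this is an illustration, not something posted in
the thread; it assumes the maven-shade-plugin is used, takes the target
package name from the naming suggestion above, and leaves the plugin
version to be managed elsewhere):

```xml
<!-- Hypothetical build fragment for a relocated protobuf 2.5 jar. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- The shade goal runs in the package phase. -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- Only bundle the library being relocated. -->
            <include>com.google.protobuf:protobuf-java</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- Original package of the third-party classes. -->
            <pattern>com.google.protobuf</pattern>
            <!-- Assumed target package, including the upstream version. -->
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf2_5</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a fragment like this, only the pom source lives in git and the
relocated jar itself is produced at build and release time.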

The Hadoop project can make releases of the thirdparty module:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
  <version>1.0</version>
</dependency>

Note that the version has to be the hadoop thirdparty release number, which
is part of why you need to have the underlying version in the artifact
name. These we can push to maven central as new releases from Hadoop.

Thoughts?

.. Owen

On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vi...@apache.org> wrote:

> Hi All,
>
>    I wanted to discuss the separate repo for thirdparty dependencies
> that we need to shade and include in Hadoop components' jars.
>
>    Apologies for the big text ahead, but this needs a clear explanation!
>
>    Right now the most needed such dependency is protobuf. The protobuf
> dependency has not been upgraded beyond 2.5.0 for fear that downstream
> builds, which depend on the transitive protobuf dependency coming from
> hadoop's jars, may fail with the upgrade. Apparently protobuf does not
> guarantee source compatibility, though it guarantees wire compatibility
> between versions. Because of this behavior, a version upgrade may cause
> breakage in known and unknown (private?) downstreams.
>
>    So to tackle this, we came up with the following proposal in
> HADOOP-13363.
>
>    Luckily, as far as I know, no APIs, either public to users or between
> Hadoop processes, directly use protobuf classes in signatures. (If any
> exist, please let us know.)
>
>    Proposal:
>    ------------
>
>    1. Create one or more artifacts which contain shaded dependencies. All
> such shading/relocation will use the known prefix
> **org.apache.hadoop.thirdparty.**.
>    2. Start with the protobuf jar (ex:
> o.a.h.thirdparty:hadoop-shaded-protobuf): all **com.google.protobuf**
> classes will be relocated to
> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>    3. Hadoop modules which need protobuf as a dependency will add this
> shaded artifact as a dependency (ex:
> o.a.h.thirdparty:hadoop-shaded-protobuf).
>    4. All previous usages of "com.google.protobuf" will be changed to
> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> committed. Please note, this replacement is done one time, directly in the
> source code, NOT during compile and package.
>    5. Once all usages of "com.google.protobuf" are relocated, hadoop no
> longer cares which version of the original "protobuf-java" is in the
> dependency tree.
>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
> break the downstreams. But hadoop will actually be using the latest
> protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>
>    7. Coming back to the separate repo, the following are the main reasons
> for keeping the shaded dependency artifact in a separate repo instead of a
> submodule.
>
>       7a. These artifacts need not be built all the time. They need to be
> built only when there is a change in the dependency version or the build
> process.
>       7b. If added as a "submodule in Hadoop repo", maven-shade-plugin:shade
> will execute only in the package phase. That means "mvn compile" or "mvn
> test-compile" will fail, because at that point this artifact will not have
> relocated classes; it will have the original classes instead, resulting in
> compilation failure. The workaround is to build the thirdparty submodule
> first and exclude the "thirdparty" submodule in other executions. This
> will be a complex process compared to keeping it in a separate repo.
>
>       7c. The separate repo will be a subproject of Hadoop, using the same
> HADOOP jira project, with different versioning prefixed with "thirdparty-"
> (ex: thirdparty-1.0.0).
>       7d. The separate repo will have the same release process as Hadoop.
>
>
>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is
> an umbrella jira tracking the changes for the protobuf upgrade.
>
>     A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> raised for the separate repo creation in HADOOP-16595
> (https://issues.apache.org/jira/browse/HADOOP-16595).
>
>     Please provide your inputs on the proposal and review the PR so that
> we can proceed.
>
>
>    -Thanks,
>     Vinay
>
> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <vinodkv@apache.org> wrote:
>
> > Moving the thread to the dev lists.
> >
> > Thanks
> > +Vinod
> >
> > > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vi...@apache.org> wrote:
> > >
> > > Thanks Marton,
> > >
> > > > The newly created 'hadoop-thirdparty' repo is empty right now.
> > > > Whether to use that repo for shaded artifacts or not will be tracked in
> > > > the HADOOP-13363 umbrella jira. Please feel free to join the discussion.
> > > >
> > > > No existing codebase is being moved out of the hadoop repo, so I think
> > > > we are good to go for now.
> > > -Vinay
> > >
> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <el...@apache.org> wrote:
> > >
> > >>
> > > >> I am not sure whether it is defined when a vote is required.
> > > >>
> > > >> https://www.apache.org/foundation/voting.html
> > > >>
> > > >> Personally I think it's a big enough change to send a notification to
> > > >> the dev lists with a 'lazy consensus' closure.
> > >>
> > >> Marton
> > >>
> > > >> On 2019/09/23 17:46:37, Vinayakumar B <vi...@apache.org> wrote:
> > > >>> Hi,
> > > >>>
> > > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more in
> > > >>> future) will be kept as a shaded artifact in a separate repo, which
> > > >>> will be referenced as a dependency in hadoop modules. This approach
> > > >>> avoids shading every submodule during the build.
> > > >>>
> > > >>> So the question is: is a VOTE required before asking to create a git
> > > >>> repo?
> > > >>>
> > > >>> On the self-serve platform https://gitbox.apache.org/setup/newrepo.html
> > > >>> I can see that the requester should be a PMC member.
> > >>>
> > >>> Wanted to confirm here first.
> > >>>
> > >>> -Vinay
> > >>>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
> > >> For additional commands, e-mail: private-help@hadoop.apache.org
> > >>
> > >>
> >
> >
>
