Posted to dev@netbeans.apache.org by Mark Struberg <st...@yahoo.de.INVALID> on 2016/10/06 21:16:13 UTC

source grant and repo transfer

Hi!

I’ve migrated the NetBeans hg repo to Git. Sadly, this repo takes up about 3.6 GiB, so we cannot host it on GitHub or Bitbucket (both have a 2GB limit).
I am currently hosting the repo on a small private server.
If anyone is interested, send me a private mail with your public key and I’ll give you access.
Jaroslav, Geertjan and a few others already have a clone.

There are basically three ways we can handle this:

1.) Import a tarball into a fresh Git repo. We would lose the history, but we would only have sources that are explicitly cleared by Oracle.

2.) Import the full hg history. That is pretty thick, which means it’s not that easy to clone. GitHub pull requests also won’t work, as we exceed the 2GB limit…
In addition, the hg repo currently also contains lots of GPL libraries, e.g. the Hibernate JAR. That’s something we don’t host at the ASF.

3.) Take the Git import from hg and filter it: remove all (or most) JARs, temporary build results, etc. We might also get rid of a few old branches. If we keep the original hg repo around in read-only mode, we should be able to lose tons of weight.
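To put a number on how much weight option 3 could actually lose, one could total the blob sizes in the history by file type. A minimal Python sketch, assuming its input is the text printed by `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`. The suffix list is a hypothetical choice, and the sizes are uncompressed object sizes, so they overstate what the delta-compressed pack actually stores:

```python
from collections import Counter

# Hypothetical set of suffixes treated as binaries that option 3 would strip.
BINARY_SUFFIXES = (".jar", ".zip", ".nbm")


def binary_weight(batch_check_output: str) -> tuple[Counter, int]:
    """Sum blob sizes per binary suffix, plus the total size of all blobs.

    Each input line is "<type> <sha> <size> <path>"; commits, trees and tags
    are skipped, and blobs without a path only count toward the total.
    """
    per_suffix: Counter = Counter()
    total = 0
    for line in batch_check_output.splitlines():
        fields = line.split(" ", 3)  # the path itself may contain spaces
        if len(fields) < 3 or fields[0] != "blob":
            continue
        size = int(fields[2])
        total += size
        path = fields[3].lower() if len(fields) == 4 else ""
        for suffix in BINARY_SUFFIXES:
            if path.endswith(suffix):
                per_suffix[suffix] += size
                break
    return per_suffix, total
```

Comparing the per-suffix byte counts against the total gives a rough idea of whether stripping JARs and build results is worth the extra labor.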

I personally prefer option 3.
But that is also the most labor-intensive.


LieGrue,
strub

Re: source grant and repo transfer

Posted by Wade Chandler <co...@wadechandler.com>.
Given these limitations, as we start to look at the various code bases, and
to be able to take advantage of modularization and GitHub, it may be that
various module subfolders or collections of them (php.* and groovy.*, for
example) get moved into their own repositories, and the main NetBeans build
uses Git submodules to link in everything that makes up the "standard" build.
That could also make working on specific subsets easier for some, perhaps,
but it would need to be thought out. Just putting out another possibility.
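For illustration, the aggregator side of this idea is essentially a `.gitmodules` file listing each split-out repository. A small Python sketch that renders one; the repository names and base URL are placeholders, not real locations, and in practice `git submodule add <url> <path>` writes these entries itself:

```python
def render_gitmodules(repos: list[str], base_url: str) -> str:
    """Render .gitmodules entries for a hypothetical aggregator repo.

    Each split-out repository is checked out at a path of the same name.
    """
    base = base_url.rstrip("/")
    entries = []
    for name in repos:
        entries.append(
            f'[submodule "{name}"]\n'
            f"\tpath = {name}\n"
            f"\turl = {base}/{name}.git\n"
        )
    return "".join(entries)


# Placeholder repository names modelled on the php.* / groovy.* example.
print(render_gitmodules(["netbeans-php", "netbeans-groovy"], "https://example.org/repos/"))
```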

Wade

On Oct 6, 2016 6:12 PM, "Geertjan Wielenga" <geertjan.wielenga@googlemail.com> wrote:

> Yes, though let's do that on a Wiki page, once we have it set up.
>
> Gj
>
> [snip]

Re: source grant and repo transfer

Posted by Geertjan Wielenga <ge...@googlemail.com>.
Yes, though let's do that on a Wiki page, once we have it set up.

Gj

On Thu, Oct 6, 2016 at 11:47 PM, Mark Struberg <st...@yahoo.de.invalid>
wrote:

> Indeed it’s way too early. And indeed we need to discuss a lot of things
> when it comes to it. But we need to have a plan. And for that we need to
> analyse the existing codebase.
>
> We now know how to import from hg to git and we know that this works fine.
> We also know that the hg repo contains lots of stuff which we need to
> handle differently from the core NetBeans parts.
>
> So we could, e.g., start by identifying which parts are ‘core’ and which
> parts are modules which contain GPL and might need to get split from the
> core NetBeans repo.
> We also could start to script the git-filter-branch handling. That should
> be doable in a repeatable manner.
> Most of the questions will in the end come back to you old NetBeans folks
> anyway ;)
>
> LieGrue,
> strub
>
> [snip]

Re: source grant and repo transfer

Posted by Mark Struberg <st...@yahoo.de.INVALID>.
Indeed it’s way too early. And indeed we need to discuss a lot of things when it comes to it. But we need to have a plan. And for that we need to analyse the existing codebase.

We now know how to import from hg to git and we know that this works fine.
We also know that the hg repo contains lots of stuff which we need to handle differently from the core NetBeans parts.

So we could, e.g., start by identifying which parts are ‘core’ and which parts are modules that contain GPL code and might need to be split from the core NetBeans repo.
We could also start to script the git-filter-branch handling. That should be doable in a repeatable manner.
Most of the questions will in the end come back to you old NetBeans folks anyway ;)
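A repeatable git-filter-branch run mostly means keeping the list of paths to strip in one place, so the same rewrite can be re-applied to a fresh hg import. A minimal Python sketch, assuming the patterns below (hypothetical examples, not an agreed list) and a scratch copy of the imported repo, since filter-branch rewrites every commit:

```python
import shlex
import subprocess

# Hypothetical pathspecs to purge from every commit; git rm matches these
# globs itself, across directory boundaries.
PATTERNS_TO_STRIP = ["*.jar", "*.zip"]


def filter_branch_argv(patterns: list[str]) -> list[str]:
    """Build the git filter-branch command line that drops `patterns` from history."""
    # The index filter is evaluated by the shell once per commit, so each glob
    # is quoted to keep the shell from expanding it itself.
    rm_cmd = "git rm -r -q --cached --ignore-unmatch -- " + " ".join(
        shlex.quote(p) for p in patterns
    )
    return [
        "git", "filter-branch",
        "--index-filter", rm_cmd,
        "--prune-empty",             # drop commits that end up empty
        "--tag-name-filter", "cat",  # re-point tags at the rewritten commits
        "--", "--all",               # rewrite every branch and tag
    ]


def strip_history(repo_dir: str, patterns: list[str]) -> None:
    """Run the rewrite inside `repo_dir` (a scratch clone!); raises on failure."""
    subprocess.run(filter_branch_argv(patterns), cwd=repo_dir, check=True)
```

Usage would be something like `strip_history("/path/to/scratch-clone", PATTERNS_TO_STRIP)` (placeholder path). The on-disk size only drops after the `refs/original/` backup refs and the reflogs are cleared and the repo is garbage-collected.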

LieGrue,
strub

> On 06.10.2016 at 23:26, Geertjan Wielenga <ge...@googlemail.com> wrote:
> 
> Thu, Oct 6, 2016 at 11:16 PM, Mark Struberg:
> 
>> I’ve migrated the NetBeans hg repo into GIT.
> 
> 
> 
> It is appreciated a lot. But we definitely need to wait a bit before doing
> this. The JDK 9 branch needs to be merged into the main branch, etc., i.e.,
> we really need to do quite some work on the NetBeans side before anything
> is ready for this kind of migration at this point. It is too early for this
> and we need to discuss this in a lot of detail first. Timing is everything.
> 
> Gj
> 
> 
> On Thu, Oct 6, 2016 at 11:19 PM, Michael Nascimento <mi...@gmail.com>
> wrote:
> 
>> 3
>> 
>> Regards,
>> Michael
>> 
>> On Thu, Oct 6, 2016 at 6:16 PM, Mark Struberg <st...@yahoo.de.invalid>
>> wrote:
>> 
>>> Hi!
>>> 
>>> I’ve migrated the NetBeans hg repo into GIT. Sadly this repo takes about
>>> 3.6 GiB and thus we cannot host it on github or Bitbucket (both have a
>> 2GB
>>> limit).
>>> I am currently hosting the repo on a small private server.
>>> If anyone is interested then send me a private mail with your public key
>>> and I’ll give you access.
>>> Jaroslav, Geertjan and a few others already have a clone.
>>> 
>>> There are basically 3 ways how we can handle this
>>> 
>>> 1.) import a tarball into a fresh git repo. We would loose the history
>> but
>>> we only have sources which are explicitly cleared by Oracle.
>>> 
>>> 2.) import the full hg history. That is pretty thick which means it’s not
>>> that easy to clone. github pull requests also wont work as we exceed the
>>> 2GB limit…
>>> In addition the hg repo currently also contains lots of GPL libraries
>> like
>>> e.g. hibernate jar, etc. That’s something we don’t host at the ASF.
>>> 
>>> 3.) Take the git import from hg and filter it. Remove all (most) jars,
>>> temporary build results etc. We might also get rid of a few old branches
>>> etc. If we keep the original hg repo around in read only mode then we
>>> should be able to loose tons of weight.
>>> 
>>> I personally prefer option 3.
>>> But that is also the most labor intensive.
>>> 
>>> 
>>> LieGrue,
>>> strub
>> 


Re: source grant and repo transfer

Posted by Geertjan Wielenga <ge...@googlemail.com>.
Thu, Oct 6, 2016 at 11:16 PM, Mark Struberg:

> I’ve migrated the NetBeans hg repo into GIT.



It is appreciated a lot. But we definitely need to wait a bit before doing
this. The JDK 9 branch needs to be merged into the main branch, etc., i.e.,
we really need to do quite some work on the NetBeans side before anything
is ready for this kind of migration at this point. It is too early for this
and we need to discuss this in a lot of detail first. Timing is everything.

Gj


On Thu, Oct 6, 2016 at 11:19 PM, Michael Nascimento <mi...@gmail.com>
wrote:

> 3
>
> Regards,
> Michael
>
> [snip]

Re: source grant and repo transfer

Posted by Michael Nascimento <mi...@gmail.com>.
3

Regards,
Michael

On Thu, Oct 6, 2016 at 6:16 PM, Mark Struberg <st...@yahoo.de.invalid>
wrote:

> [snip]

Re: source grant and repo transfer

Posted by Emilian Bold <em...@gmail.com>.
> a.) the repo contains binaries which are GPL licensed. That needs to get
> kicked out of the repo anyway.

Could you give an example for this? Like, a revision I can look at?

> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not
> even be able to git-clone this over to their own GitHub repos as those are
> limited to 2GB.

Like I said, I see the Linux kernel will grow past 2GB soon. I wonder
what GitHub will do then.

I don't see 4GB as something huge nowadays. What is that, 2 hours of
YouTube at 720p? And any individual will do that only once. Further updates
will be incremental.

> So how should we get pull requests in that case?

How do kernel contributors do pull requests? A patch attached to an issue
or email would work just fine.



--emi

On Fri, Oct 7, 2016 at 1:42 PM, Mark Struberg <st...@yahoo.de.invalid>
wrote:

> Hi Emilian!
>
> The problem with 2 is that it won’t work nicely.
>
> There are 2 problems as sketched.
>
> a.) the repo contains binaries which are GPL licensed. That needs to get
> kicked out of the repo anyway.
>
> > What is important is the legal clearance at
> > the moment the code grant happens.
> Yes, but Oracle can only grant stuff under ALv2 where they own the rights
> themselves. They simply don’t own any rights for a hibernate.jar…
>
> Also the pure fact that it contains binaries at all is not really good.
> It’s called source code management for a reason.
> Not sure if the Ant build already uses Ivy. If not, then we need to improve
> this.
> It also contains temporary build artifacts (well, unfortunately such
> things happen…)
>
>
> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not
> even be able to git-clone this over to their own GitHub repos as those are
> limited to 2GB.
> So how should we get pull requests in that case?
>
> I agree with you that we should preserve the history though.
> Hence the idea of moving the original hg repo to some other place
> and switching it into read-only mode,
> and having the new Git repo stripped down to the core parts (of course with
> their history).
> git-filter-branch is your friend.
>
> LieGrue,
> strub
>
>
> > On 07.10.2016 at 11:37, Emilian Bold <em...@gmail.com> wrote:
> >
> > I vote for 2!
> >
> > I see no reason we should get rid of the history.
> >
> > From what I have read before, the ASF does not need to have legal
> > clearance for every historical code revision. What is important is the
> > legal clearance at the moment the code grant happens.
> >
> > I don't believe the GitHub 2GB limit is any indicator of anything except
> > their capacity and business decision. The Linux kernel is close to 2GB,
> > OpenOffice is 1.5GB, Hadoop is 400MB, Lucene-Solr is 200MB, JMeter is
> > 200MB, etc.
> >
> > NetBeans is a project with over a decade of history and hundreds of
> > people. The first commit I see is from 1999.
> >
> > Of course such a large and old project will have a large repository!
> >
> > And as time passes each repository will only grow. I just read a
> > StackOverflow answer on how to determine the GitHub repository size and
> > their example for git/git mentioned it was 40MB -- it's, I believe, 200MB
> > now.
> >
> > I also don't think 3) will save much. I doubt there are many
> > JARs or temporary build results.
> >
> > If the current repository turns out to be too much for Apache Infra, we
> > could decide in due time how to improve that, but as an Incubation goal I
> > believe just switching to Git should be enough.
> >
> >
> >
> > --emi
> >
> > [snip]

Re: source grant and repo transfer

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Fri, Oct 7, 2016 at 12:42 PM, Mark Struberg
<st...@yahoo.de.invalid> wrote:
> ...I agree with you that we should preserve the history though....

Maybe create two repositories (or sets of them): a historical one which
stays read-only and has all the history, and a current one starting
fresh from the current code, or maybe with the last 2-3 years of changes.
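One way to produce such a "current" repository while the full history stays in the read-only one is to cut the history at a chosen commit with `git replace --graft` and then make the cut permanent with `git filter-branch`, which bakes in replace refs. A minimal Python sketch for choosing the cut point, assuming the input is `git log --first-parent --reverse --format='%H %ct'` (oldest first) and a simple cutoff timestamp. Side branches merged later may still carry older commits, so this is only a starting point:

```python
from typing import Optional


def graft_point(log_lines: list[str], cutoff_ts: int) -> Optional[str]:
    """Return the oldest first-parent commit committed at or after cutoff_ts.

    log_lines are "<sha> <unix-timestamp>" pairs, oldest first.
    """
    for line in log_lines:
        sha, ts = line.split()
        if int(ts) >= cutoff_ts:
            return sha
    return None


def truncation_commands(sha: str) -> list[list[str]]:
    """Commands to run in a scratch clone: make `sha` a root commit, then
    rewrite all refs so the graft becomes permanent."""
    return [
        ["git", "replace", "--graft", sha],  # no parents listed -> root commit
        ["git", "filter-branch", "--", "--all"],
    ]
```

For "the last 2-3 years", `cutoff_ts` could simply be the current Unix time minus three times 365 days' worth of seconds.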

-Bertrand

Re: source grant and repo transfer

Posted by John McDonnell <mc...@gmail.com>.
I like Wade’s idea of splitting the repository into several repositories along logical module boundaries.

This way you can track which modules have been OK’d for licensing, etc. a lot more easily than you can with one big code base.
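With the code split per area, that tracking could be as simple as a module-to-repository table plus the set of modules already cleared. A minimal Python sketch; the module names, repository names and clearance data are made-up placeholders:

```python
from collections import defaultdict


def uncleared_by_repo(module_repo: dict[str, str], cleared: set[str]) -> dict[str, list[str]]:
    """Group the modules that still lack license clearance by target repository."""
    pending: defaultdict[str, list[str]] = defaultdict(list)
    for module, repo in sorted(module_repo.items()):
        if module not in cleared:
            pending[repo].append(module)
    return dict(pending)


# Hypothetical data: which repository each module would move to.
modules = {
    "php.editor": "netbeans-php",
    "php.api": "netbeans-php",
    "groovy.editor": "netbeans-groovy",
}
print(uncleared_by_repo(modules, cleared={"php.api"}))
# prints {'netbeans-groovy': ['groovy.editor'], 'netbeans-php': ['php.editor']}
```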

Also, does anyone really need the entire codebase checked out? Maybe only the CI build machine…

Regards

John



> On 7 Oct 2016, at 11:42, Mark Struberg <st...@yahoo.de.INVALID> wrote:
> 
> [snip]


Re: source grant and repo transfer

Posted by Wade Chandler <co...@wadechandler.com>.
> On Oct 7, 2016, at 11:23, Wade Chandler <co...@wadechandler.com> wrote:
> 
> 
>> On Oct 7, 2016, at 10:23, Jan Lahoda <lahoda@gmail.com> wrote:
>> 
>> I guess the question is what you consider a core part. I think it would
>> be OK to not keep history in the "main" repo for modules that were placed
>> into separate repositories, like:
>> http://hg.netbeans.org/community-visualweb/
>> (as far as I can tell, the history is kept in the split up repositories.)
>> But is e.g. the Java support a core part? C/C++ support? PHP support?
>> 
> 
> The core IDE and even the main build can be split from the other specific feature support, regardless of what is said to be in the core, and even then, some repos may not “stand alone” for the sub-components/modules IMO with regard to the main build as a sibling or parent; this gets into Git submodules as a way to break up the size as well as what one has to check out to work on a specific piece of functionality. So, the basic pieces to just get the platform up and running, and the build going, are one big subset to me. Then, Java-specific support could be another. It could split on SE and EE, though. Next, PHP; that seems independent. HTML/Web/etc. another. Groovy stands out as another. I’m sure that can keep going. Then, if the repos were essentially:
> 
> netbeans-core (however it is decided)
> netbeans-java
> netbeans-javaee
> netbeans-groovy
> netbeans-php
> netbeans-c
> netbeans-nodejs (or whatever the name is)
> …etc
> 
> Then that seems manageable to me. There could then be netbeans-main which, with a little restructuring, has all the repos which make up the NetBeans release as an entire structure with Git submodules. It could be that it breaks down differently. It could be like this:
> netbeans-main (has what we would call core in it…along with current build.xml and nbbuild…this would allow the current build system to keep working without much, if any, rework, I think)
> netbeans-java
> netbeans-javaee
> … etc etc
> 
> and netbeans-main has Git submodules for all of what is the “NetBeans release”. This would undoubtedly take some work. Also, the history can be interesting in these cases.

Obviously this should be examined more, but a list of everything from a recent main-golden clone can be inspected with an "ls -d *" command from inside the project directory. Other than core, which one would have to break apart piecemeal based on dependencies, a good number of modules jump out as things which could be better split IMO: cnd.*, much of bugtracking if not all (depending on how the exception reporter is handled/redone), the library wrapper plugins such as c.google.guava, clearcase, cloud.*, collab.*, cordova.*, db.*, debugger.*, and many others.
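As a first pass over that `ls -d *` listing, the module directories can be grouped by the part of the name before the first dot, which yields candidate repositories such as cnd.* or db.*. A minimal Python sketch; the directory names below are a small illustrative subset, and this is only a naming heuristic, since real boundaries would have to follow module dependencies:

```python
from collections import defaultdict


def group_by_prefix(dirs: list[str], min_size: int = 2) -> dict[str, list[str]]:
    """Group module directory names by the part before the first dot.

    Prefixes with fewer than `min_size` modules land in an "(unassigned)"
    bucket for manual review.
    """
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for name in sorted(dirs):
        groups[name.split(".", 1)[0]].append(name)
    result: dict[str, list[str]] = {}
    unassigned: list[str] = []
    for prefix, members in groups.items():
        if len(members) >= min_size:
            result[prefix] = members
        else:
            unassigned.extend(members)
    if unassigned:
        result["(unassigned)"] = sorted(unassigned)
    return result


# Illustrative input, as `ls -d *` might list part of a checkout.
sample = ["cnd.debugger", "cnd.api", "db.core", "db.mysql", "clearcase", "c.google.guava"]
for prefix, members in group_by_prefix(sample).items():
    print(prefix, members)
```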

Wade

Re: source grant and repo transfer

Posted by Jan Lahoda <la...@gmail.com>.
On Fri, Oct 7, 2016 at 5:23 PM, Wade Chandler <co...@wadechandler.com>
wrote:

>
> > On Oct 7, 2016, at 10:23, Jan Lahoda <la...@gmail.com> wrote:
> >
>

[snip]


> >> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not
> >> even be able to git-clone this over to their own GitHub repos as those
> >> are limited to 2GB.
> >> So how should we get pull requests in that case?
> >>
> >> I agree with you that we should preserve the history though.
> >> Hence the idea of moving the original hg repo to some other place
> >> and switching it into read-only mode,
> >> and having the new Git repo stripped down to the core parts (of course
> >> with their history).
> >>
> >
> > I guess the question is what you consider a core part. I think it would
> > be OK to not keep history in the "main" repo for modules that were placed
> > into separate repositories, like:
> > http://hg.netbeans.org/community-visualweb/
> > (as far as I can tell, the history is kept in the split up repositories.)
> > But is e.g. the Java support a core part? C/C++ support? PHP support?
> >
>
> The core IDE and even the main build can be split from the other specific
> feature support, regardless of what is said to be in the core, and even
> then, some repos may not “stand alone” for the sub-components/modules IMO
> with regard to the main build as a sibling or parent; this gets into
> Git submodules as a way to break up the size as well as what one has to
> check out to work on a specific piece of functionality. So, the basic pieces
> to just get the platform up and running, and the build going, are one big
> subset to me. Then, Java-specific support could be another. It could split
> on SE and EE, though. Next, PHP; that seems independent.
> HTML/Web/etc. another. Groovy stands out as another. I’m sure that can keep
> going. Then, if the repos were essentially:
>
> netbeans-core (however it is decided)
> netbeans-java
> netbeans-javaee
> netbeans-groovy
> netbeans-php
> netbeans-c
> netbeans-nodejs (or whatever the name is)
> …etc
>
> Then that seems manageable to me. There could then be netbeans-main which,
> with a little restructuring, has all the repos which make up the NetBeans
> release as an entire structure with Git submodules. It could be that it breaks
> down differently. It could be like this:
> netbeans-main (has what we would call core in it…along with current
> build.xml and nbbuild…this would allow the current build system to keep
> working without much, if any, rework, I think)
> netbeans-java
> netbeans-javaee
> … etc etc
>
> and netbeans-main has Git submodules for all of what is the “NetBeans
> release”. This would undoubtedly take some work. Also, the history can be
> interesting in these cases.
>

That's doable in principle, of course. One obvious (long-discussed) way to
split the repository is based on clusters (although even that might be
tricky). But someone needs to actually do the work, and adjusting the build
system may or may not be simple.

One possibility would be to use Module Suites for the clusters (at least
for clusters other than platform). The thing that would need to be solved
in that case is test dependencies: I suspect test-to-test dependencies
among Module Suites may not work (and surely cannot work when compiling
against binary clusters, as binary clusters don't currently have tests).
And a lot of tests depend on tests in openide.util.lookup. (Also, I'm not
sure if qa-functional tests are supported in Module Suites; I would need to check.)


> > For me personally, having history is very important - and having it inside
> > my IDE (not in some other repository) is also important. For example, doing
> > a change to CasualDiff without history could be quite painful:
> > http://hg.netbeans.org/jet-main/annotate/437d7ca35923/java.source.base/src/org/netbeans/modules/java/source/save/CasualDiff.java
> >
>
> Given history and regressions I can see that, but how would a change be
> more painful without the history? I’m asking about the specific context we
> are referring to. For instance, I can see keeping a certain depth of history
> as helpful for most use cases, but beyond that, unless some obscure
> regression is hit, the oldest history doesn’t seem to come into play very
> often, other than for historical reasons or to see “who” worked on
> something. The code is what it is at some point in history. As an example,
> take a file with a decade of history: if it has had many changes, its
> history will be deep, and that depth nearly necessarily means that much of
> the file has changed, or that certain parts have changed a lot and been
> completely rewritten while others have not. The older history then becomes
> more of a liability if used for anything other than identifying who did
> what; it is hard to sort through and lacks context. “Older” here is not
> about date per se, but about depth: the number of changes and iterations a
> file has gone through.
>

Well, taking the CasualDiff example: you may be debugging a problem where
the IDE generates too many spaces, and you find an if statement that is
causing it. By looking into the history, one can find the use case for
which the statement was added (so one does not break it), and also the
tests that were introduced to test that behavior (so that in the first phase
of fixing, one can run only a subset of the tests, not all of them).

Some of this may be available by running tests (esp. for something like
CasualDiff, where there are quite a few tests), but that takes time, while
looking at the history is quite fast.
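In a Git repository, the kind of history lookup described above is usually done with git log -S (list the commits that added or removed a given string) or with git blame for individual lines. A minimal sketch, with a hypothetical helper name; it also shows the cost of depth-limited clones, since commits older than the clone depth are simply not there to be found.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list, oldest first, the commits that changed the
# number of occurrences of an exact string in a file; typically the commit
# that introduced the code and any that later removed it.
# $1 = repository, $2 = exact code snippet, $3 = file path in the repository.
why_is_this_here() {
  git -C "$1" log --reverse --format='%h %s' -S "$2" -- "$3"
}

# Then look at the whole change, including any tests added alongside it:
#   git -C <repo> show --stat <hash>
# Or see which commit last touched each of lines 100-120:
#   git -C <repo> blame -L 100,120 -- <file>
```

The -S search matches the literal string (no regex unless --pickaxe-regex is given), which suits pasting a suspicious statement straight from the editor.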

Jan


> Of course, all that said, if the larger git repo were supported by the
> infra, then users cloning the repository from git could use depth, and most
> won’t need all the history. Also, until one gets into the small hundreds of
> GBs, cloning and spanning from a “cloud” perspective isn’t really that big
> of a deal, so I agree this is a GitHub business-decision limitation more
> than anything else.
>
> If git did not support depth cloning, I would very much argue that at some
> point too much history has diminishing returns, as it becomes impractical
> for all of a community project’s members to clone many, many GB of data.
> Data plans and connections are not the same worldwide, and that history is
> generally more useful for seeing whether a regression has been introduced
> and what exactly changed. The gamble is that a certain depth “back” will not
> contain the regressions, and the current code is fairly well vetted, so in
> most cases, taking a depth-based starting point given such deep history
> should be “mostly” safe and manageable.
>
> Thanks,
>
> Wade

Re: source grant and repo transfer

Posted by Wade Chandler <co...@wadechandler.com>.
> On Oct 7, 2016, at 10:23, Jan Lahoda <la...@gmail.com> wrote:
> 
> 
> This may miss some binaries due to file name encoding (and maybe I did
> something wrong in the experiment), but I would not expect gains
> significantly bigger than this.

If one clones main-golden, performs a clean to remove build output, and excludes .hg, the current source comes to ~1GB.

From inside the project directory:
du -c -g -I .hg .

Do the same for .hg and you get ~4GB. All of this is, of course, the expanded size on disk (mine is a Mac OS X file system).
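The -g and -I flags used above are options of the BSD du that ships with macOS. On Linux with GNU coreutils, a roughly equivalent measurement, plus Git's own summary of the converted repository's object store, could look like the following sketch (helper names are hypothetical). The figures will not match the hg numbers exactly, since Git packs and delta-compresses history differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (GNU du, Linux): report the working-tree size with
# VCS metadata excluded, then the size of each metadata directory present.
measure_checkout() {
  local dir="${1:-.}" meta
  du -sh --exclude=.hg --exclude=.git "$dir"
  for meta in .hg .git; do
    if [ -d "$dir/$meta" ]; then
      du -sh "$dir/$meta"
    fi
  done
}

# Git's own summary of a repository's object store; the "size-pack" line
# is the packed (compressed) size of the history.
git_store_size() {
  git -C "${1:-.}" count-objects -vH
}
```

For example, measure_checkout . from the top of a clone prints two lines: the sources alone, then the metadata directory.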

> Also, not sure if exe and dll are
> problematic - I believe currently a build made on Linux can run on Windows,
> and it may be problematic to achieve this without having the exe/dll
> precompiled.
> 

I agree on the precompiled part, but those binaries can technically be packaged and put into a binary repository. As an example, in our local Artifactory I have the Google Chrome Driver for Selenium, which I then add as a dependency in Gradle builds; it works like a charm. So there are ways to handle this in the build that don’t impact the source repository.
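A minimal sketch of that idea, assuming the binaries live in some HTTP-accessible binary repository: only a small manifest of pinned SHA-1 hashes is committed, and the build downloads and verifies the files. The manifest layout (sha1sum's own format), the function name and the URL are hypothetical; this is not a description of how NetBeans' ExternalBinaries mechanism is actually implemented.

```shell
#!/usr/bin/env bash
# Hypothetical build step: keep only a manifest of pinned SHA-1 hashes in
# the source repository and download the binaries themselves at build time.
# Manifest lines use sha1sum's format: "<sha1>  <file name>".
# $1 = manifest, $2 = base URL of the binary repository, $3 = target dir.
fetch_binaries() {
  local manifest="$1" base_url="$2" dest="$3"
  local abs_manifest hash name
  abs_manifest="$(cd "$(dirname "$manifest")" && pwd)/$(basename "$manifest")"
  mkdir -p "$dest"
  while read -r hash name; do
    [ -z "$hash" ] && continue                 # skip blank lines
    if [ ! -f "$dest/$name" ]; then            # download only what is missing
      curl -fsSL -o "$dest/$name" "$base_url/$name" || return 1
    fi
  done < "$abs_manifest"
  # Fail the build if any file does not match its pinned hash.
  (cd "$dest" && sha1sum --quiet -c "$abs_manifest")
}
```

Usage would be something like fetch_binaries external/binaries.sha1 https://binaries.example.org/netbeans external/lib (all paths and the URL hypothetical). Pinning the hash means a tampered or swapped artifact on the server fails the build instead of silently entering it.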

> 
>> Not sure if the ant build already uses ivy. If not then we need to improve
>> this.
>> 
> 
> NetBeans (currently) uses this:
> http://wiki.netbeans.org/ExternalBinaries
> 
> Also note the historical:
> http://wiki.netbeans.org/HgExternalBinaries
> 
> The latter explains the .jar/.zip files which are not actual binaries.
> 
> Also, I believe the official repos have a push hook in place which prevents
> pushing too big binary files with certain extensions into certain
> directories:
> http://hg.netbeans.org/nb-hooks/file/dfd2d386149f/forbid_external.py
> 
>> It also contains temporary build artifacts (well, unfortunately such things
>> happen…)
>> 
>> 
>> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not
>> even be able to git-clone this over to their own github repos as those are
>> limited to 2GB.
>> So how should we get pull requests in that case?
>> 
>> I agree with you that we should preserve the history, though.
>> Hence the idea of moving the original hg repo to some other place
>> and switching it to read-only mode,
>> and having the new GIT repo stripped down to the core parts (of course with
>> their history).
>> 
> 
> I guess the question is what do you consider a core part. I think it would
> be OK to not keep history in the "main" repo for modules that were placed
> into separate repositories, like:
> http://hg.netbeans.org/community-visualweb/
> (as far as I can tell, the history is kept in the split up repositories.)
> But is e.g. the Java support a core part? C/C++ support? PHP support?
> 

The core IDE and even the main build can be split from the support for specific features, regardless of what is deemed core. Even then, IMO some of the sub-component/module repos may not “stand alone” without the main build as a sibling or parent; this is where git sub-modules come in, as a way to break up both the size and what one has to check out to work on a specific piece of functionality. So, the basic pieces needed to get the platform up and running, and the build going, form one big subset to me. Java-specific support could be another, though it could be split into SE and EE. Next, PHP seems independent. HTML/Web/etc. is another. Groovy stands out as another. I’m sure the list can keep going. Then, if the repos were essentially:

netbeans-core (however it is decided)
netbeans-java
netbeans-javaee
netbeans-groovy
netbeans-php
netbeans-c
netbeans-nodejs (or whatever the name is)
…etc

That seems manageable to me. There could then be a netbeans-main which, with a little restructuring, pulls in everything that makes up the NetBeans release as one structure via git sub-modules. It could also be split differently, for example:
netbeans-main (has what we would call core in it…along with current build.xml and nbbuild…this would allow the current build system to keep working without any or much rework I think)
netbeans-java
netbeans-javaee
… etc etc

and netbeans-main has git sub-modules for everything that makes up the “NetBeans release”. This would undoubtedly take some work. Also, the history could get interesting in these cases.
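A minimal sketch of the netbeans-main layout described above, assuming the per-area repositories already exist. The repository names come from the list above; the helper name is hypothetical, the base URL is a placeholder, and whether the existing build.xml/nbbuild would work unchanged against such a checkout has not been tried.

```shell
#!/usr/bin/env bash
# Hypothetical: create an umbrella "netbeans-main" repository that pins each
# area repository as a git submodule at a specific commit.
# $1 = umbrella directory, $2 = base URL of the area repositories,
# remaining arguments = area repository names.
assemble_main() {
  local main="$1" base="$2" area
  shift 2
  git init -q "$main"
  for area in "$@"; do
    # Each submodule lands in a directory named after its repository.
    git -C "$main" submodule --quiet add "$base/$area" "$area" || return 1
  done
  git -C "$main" commit -q -m "Pin area repositories as submodules"
}

# A contributor interested in one area only initializes that submodule:
#   git clone <base>/netbeans-main && cd netbeans-main
#   git submodule update --init netbeans-php
```

The design point is that netbeans-main records an exact commit of each area repository, so a release is reproducible, while a PHP contributor never has to download the Java or C/C++ history.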

> For me personally, having history is very important - and having it inside
> my IDE (not in some other repository) is also important. For example, doing
> a change to CasualDiff without history could be quite painful:
> http://hg.netbeans.org/jet-main/annotate/437d7ca35923/java.source.base/src/org/netbeans/modules/java/source/save/CasualDiff.java
> 

Given history and regressions I can see that, but how would a change be more painful without the history? I’m asking about the specific context we are referring to. For instance, I can see keeping a certain depth of history as helpful for most use cases, but beyond that, unless some obscure regression is hit, the oldest history doesn’t seem to come into play very often, other than for historical reasons or to see “who” worked on something. The code is what it is at some point in history. As an example, take a file with a decade of history: if it has had many changes, its history will be deep, and that depth nearly necessarily means that much of the file has changed, or that certain parts have changed a lot and been completely rewritten while others have not. The older history then becomes more of a liability if used for anything other than identifying who did what; it is hard to sort through and lacks context. “Older” here is not about date per se, but about depth: the number of changes and iterations a file has gone through.

Of course, all that said, if the larger git repo were supported by the infra, then users cloning the repository from git could use depth, and most won’t need all the history. Also, until one gets into the small hundreds of GBs, cloning and spanning from a “cloud” perspective isn’t really that big of a deal, so I agree this is a GitHub business-decision limitation more than anything else.

If git did not support depth cloning, I would very much argue that at some point too much history has diminishing returns, as it becomes impractical for all of a community project’s members to clone many, many GB of data. Data plans and connections are not the same worldwide, and that history is generally more useful for seeing whether a regression has been introduced and what exactly changed. The gamble is that a certain depth “back” will not contain the regressions, and the current code is fairly well vetted, so in most cases, taking a depth-based starting point given such deep history should be “mostly” safe and manageable.
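For reference, the depth-limited clone mentioned above is standard Git. A sketch with a hypothetical helper name; the default depth of 50 is an arbitrary example, not a recommendation from this thread.

```shell
#!/usr/bin/env bash
# Clone only the most recent commits of the default branch.
# $1 = repository URL, $2 = target directory, $3 = depth (default 50).
# Git ignores --depth for plain local paths; use a file:// URL locally.
shallow_clone() {
  git clone --quiet --depth "${3:-50}" "$1" "$2"
}

# Older history can be fetched later if it turns out to be needed:
#   git -C <dir> fetch --deepen=500     # 500 more commits back
#   git -C <dir> fetch --unshallow      # the complete history
```

For example, shallow_clone <repository-url> netbeans 100. Because --deepen and --unshallow exist, a contributor can start small and pay for older history only when a regression hunt actually needs it.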

Thanks,

Wade

Re: source grant and repo transfer

Posted by Jan Lahoda <la...@gmail.com>.
Hi Mark,

On Fri, Oct 7, 2016 at 12:42 PM, Mark Struberg <st...@yahoo.de.invalid>
wrote:

> Hi Emilian!
>
> The problem with 2 is that it won’t work nicely.
>
> There are 2 problems as sketched.
>
> a.) the repo contains binaries which are GPL licensed. That needs to get
> kicked out of the repo anyway.
>
> > What is important is the legal clearance at
> > the moment the code grant happens.
> Yes, but Oracle can only grant stuff under ALv2 where they own the rights
> themselves. They simply don’t own any rights for a hibernate.jar…
>
> Also the pure fact that it contains binaries at all is not really good.
> It’s called source code management for a reason.
>

Inside one of my NetBeans clones, in .hg/store, I did:
---
$ for extension in zip jar class exe dll o; do echo $extension; \
    find . -type f -name "*\.$extension*" -print0 \
    | xargs --null du --apparent-size -sch | grep total; echo; done
zip
38M     total

jar
60M     total

class
190K    total

exe
15M     total

dll
16M     total

o
1,4M    total
1,2M    total
---

This may miss some binaries due to file name encoding (and maybe I did
something wrong in the experiment), but I would not expect significantly
bigger gains than this. Also, please note that not all historical
.jar/.zip files are actual binaries. And I'm not sure whether the exe and
dll files are a problem: I believe a build made on Linux can currently run
on Windows, and that may be hard to achieve without having the exe/dll
files precompiled.
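The measurement above was done on the hg store. On the converted Git repository, a comparable and equally rough estimate could sum the sizes of all historical blobs per file extension, as in this sketch (hypothetical helper name). The sizes are uncompressed, so they overstate what the files cost on disk after Git's compression.

```shell
#!/usr/bin/env bash
# Sum the uncompressed sizes of all blobs reachable from any ref, grouped by
# lower-cased file extension, largest first. Each distinct blob is counted
# once, under the first path rev-list reports for it.
# $1 = repository directory (default: current directory).
blob_bytes_by_extension() {
  local repo="${1:-.}"
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" {
           path = $0
           sub(/^[^ ]+ [^ ]+ ?/, "", path)   # keep the full path, spaces included
           n = split(path, parts, "/")
           file = (n > 0) ? parts[n] : ""
           ext = "(none)"
           if (match(file, /\.[^.]*$/)) ext = tolower(substr(file, RSTART + 1))
           total[ext] += $2
         }
         END { for (e in total) printf "%.0f %s\n", total[e], e }' |
    sort -rn
}
```

Running blob_bytes_by_extension | head in the converted repository would show whether jar/zip/exe/dll really dominate the 3.6 GiB, or whether, as suggested above, they are a small share.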


> Not sure if the ant build already uses ivy. If not then we need to improve
> this.
>

NetBeans (currently) uses this:
http://wiki.netbeans.org/ExternalBinaries

Also note the historical:
http://wiki.netbeans.org/HgExternalBinaries

The latter explains the .jar/.zip files which are not actual binaries.

Also, I believe the official repos have a push hook in place which prevents
pushing too big binary files with certain extensions into certain
directories:
http://hg.netbeans.org/nb-hooks/file/dfd2d386149f/forbid_external.py
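On the Git side, a comparable policy could be enforced with a hook. The sketch below is a client-side pre-commit check; the extension list and the 1 MiB limit are illustrative assumptions, not the rules that forbid_external.py actually implements, and a server-side pre-receive hook would be needed to enforce it for everyone.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit check: refuse staged binaries with certain
# extensions above a size limit (bytes, default 1 MiB). The extension list
# and the default limit are illustrative assumptions.
check_staged_binaries() {
  local limit="${1:-1048576}" rc=0 path size
  # -z keeps unusual file names intact; ACM = added, copied or modified.
  while IFS= read -r -d '' path; do
    case "$path" in
      *.jar|*.zip|*.exe|*.dll)
        size=$(git cat-file -s ":$path")      # size of the staged blob
        if [ "$size" -gt "$limit" ]; then
          echo "refusing to commit $path ($size bytes > $limit)" >&2
          rc=1
        fi
        ;;
    esac
  done < <(git diff --cached --name-only -z --diff-filter=ACM)
  return "$rc"
}
```

To use it as .git/hooks/pre-commit, append a final line calling check_staged_binaries and make the file executable; a non-zero exit aborts the commit.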

> It also contains temporary build artifacts (well, unfortunately such things
> happen…)
>
>
> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not
> even be able to git-clone this over to their own github repos as those are
> limited to 2GB.
> So how should we get pull requests in that case?
>
> I agree with you that we should preserve the history, though.
> Hence the idea of moving the original hg repo to some other place
> and switching it to read-only mode,
> and having the new GIT repo stripped down to the core parts (of course with
> their history).
>

I guess the question is what do you consider a core part. I think it would
be OK to not keep history in the "main" repo for modules that were placed
into separate repositories, like:
http://hg.netbeans.org/community-visualweb/
(as far as I can tell, the history is kept in the split up repositories.)
But is e.g. the Java support a core part? C/C++ support? PHP support?

For me personally, having history is very important - and having it inside
my IDE (not in some other repository) is also important. For example, doing
a change to CasualDiff without history could be quite painful:
http://hg.netbeans.org/jet-main/annotate/437d7ca35923/java.source.base/src/org/netbeans/modules/java/source/save/CasualDiff.java

Jan

PS: if anyone wants to clone Mark's converted repository, here is a mirror
that can be used at the moment:
git clone http://lahoda.info/netbeans-import.git/

(it is not permanent, but should be good for now)

> git-filter-branch is your friend.
>
> LieGrue,
> strub
>
>
> > Am 07.10.2016 um 11:37 schrieb Emilian Bold <em...@gmail.com>:
> >
> > I vote for 2!
> >
> > I see no reason we should get rid of the history.
> >
> > From what I have read before, the ASF does not need legal clearance for
> > every historical code revision. What is important is the legal clearance at
> > the moment the code grant happens.
> >
> > I don't believe the GitHub 2GB limit is any indicator of anything except
> > their capacity and business decision. The Linux kernel is close to 2GB,
> > OpenOffice is 1.5GB, Hadoop is 400MB, Lucene-Solr is 200MB, JMeter is
> > 200MB, etc.
> >
> > NetBeans is a project with over a decade of history and hundreds of
> > people. The first commit I see is from 1999.
> >
> > Of course such a large and old project will have a large repository!
> >
> > And as time passes each repository will only grow. I just read a
> > StackOverflow answer on how to determine the GitHub repository size and
> > their example for git/git mentioned it was 40MB -- it's, I believe, 200MB
> > now.
> >
> > I also don't think 3) will result in much economy. I doubt there are many
> > JARs or temporary build results.
> >
> > If the current repository turns out to be too much for the Apache Infra,
> > we could decide in time how to improve that, but as an Incubation goal I
> > believe just switching to git should be enough.
> >
> >
> >
> > --emi
> >
> > On Fri, Oct 7, 2016 at 12:16 AM, Mark Struberg <struberg@yahoo.de.invalid>
> > wrote:
> >
> >> Hi!
> >>
> >> I’ve migrated the NetBeans hg repo into GIT. Sadly this repo takes about
> >> 3.6 GiB and thus we cannot host it on github or Bitbucket (both have a
> >> 2GB limit).
> >> I am currently hosting the repo on a small private server.
> >> If anyone is interested then send me a private mail with your public key
> >> and I’ll give you access.
> >> Jaroslav, Geertjan and a few others already have a clone.
> >>
> >> There are basically 3 ways we can handle this:
> >>
> >> 1.) import a tarball into a fresh git repo. We would lose the history but
> >> we only have sources which are explicitly cleared by Oracle.
> >>
> >> 2.) import the full hg history. That is pretty thick which means it’s not
> >> that easy to clone. github pull requests also won’t work as we exceed the
> >> 2GB limit…
> >> In addition the hg repo currently also contains lots of GPL libraries like
> >> e.g. hibernate jar, etc. That’s something we don’t host at the ASF.
> >>
> >> 3.) Take the git import from hg and filter it. Remove all (most) jars,
> >> temporary build results etc. We might also get rid of a few old branches
> >> etc. If we keep the original hg repo around in read only mode then we
> >> should be able to lose tons of weight.
> >>
> >> I personally prefer option 3.
> >> But that is also the most labor intensive.
> >>
> >>
> >> LieGrue,
> >> strub
>
>

Re: source grant and repo transfer

Posted by Mark Struberg <st...@yahoo.de.INVALID>.
Hi Emilian!

The problem with 2 is that it won’t work nicely.

There are 2 problems as sketched.

a.) the repo contains binaries which are GPL licensed. That needs to get kicked out of the repo anyway.

> What is important is the legal clearance at
> the moment the code grant happens.
Yes, but Oracle can only grant stuff under ALv2 where they own the rights themselves. They simply don’t own any rights for a hibernate.jar…

Also, the mere fact that it contains binaries at all is not really good. It’s called source code management for a reason.
Not sure if the Ant build already uses Ivy. If not, then we need to improve this.
It also contains temporary build artifacts (well, unfortunately such things happen…)


b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not even be able to git-clone this over to their own github repos as those are limited to 2GB.
So how should we get pull requests in that case?

I agree with you that we should preserve the history, though.
Hence the idea of moving the original hg repo to some other place and switching it to read-only mode,
and having the new GIT repo stripped down to the core parts (of course with their history).
git-filter-branch is your friend.
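For illustration, stripping every .jar from all refs with git filter-branch could look like the sketch below (hypothetical helper name). It rewrites every commit, so it belongs on a throwaway clone. Note that the git filter-branch documentation itself now recommends the separate git filter-repo tool for this kind of job, which is far faster on a history of this size.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every file matching a pattern (default *.jar)
# from all commits on all refs, dropping commits that end up empty.
# Run it in a throwaway clone: it rewrites history.
strip_from_history() {
  local pattern="${1:-*.jar}"
  # FILTER_BRANCH_SQUELCH_WARNING skips the deprecation notice and its pause.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --prune-empty \
    --index-filter "git rm -r --cached --ignore-unmatch --quiet -- '$pattern'" \
    -- --all || return 1
  # filter-branch keeps the old refs under refs/original; delete them so the
  # stripped objects can eventually be garbage-collected.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do git update-ref -d "$ref"; done
}
```

The pattern is a Git pathspec, so the default *.jar also matches jars in subdirectories. The index filter is used instead of a tree filter because it never checks files out, which matters with hundreds of thousands of commits.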

LieGrue,
strub


> Am 07.10.2016 um 11:37 schrieb Emilian Bold <em...@gmail.com>:
> 
> I vote for 2!
> 
> I see no reason we should get rid of the history.
> 
> From what I have read before, the ASF does not need legal clearance for
> every historical code revision. What is important is the legal clearance at
> the moment the code grant happens.
> 
> I don't believe the GitHub 2GB limit is any indicator of anything except
> their capacity and business decision. The Linux kernel is close to 2GB,
> OpenOffice is 1.5GB, Hadoop is 400MB, Lucene-Solr is 200MB, JMeter is
> 200MB, etc.
> 
> NetBeans is a project with over a decade of history and hundreds of people.
> The first commit I see is from 1999.
> 
> Of course such a large and old project will have a large repository!
> 
> And as time passes each repository will only grow. I just read a
> StackOverflow answer on how to determine the GitHub repository size and
> their example for git/git mentioned it was 40MB -- it's, I believe, 200MB
> now.
> 
> I also don't think 3) will result in much economy. I doubt there are many
> JARs or temporary build results.
> 
> If the current repository turns out to be too much for the Apache Infra, we could
> decide in time how to improve that, but as an Incubation goal I believe
> just switching to git should be enough.
> 
> 
> 
> --emi
> 
> On Fri, Oct 7, 2016 at 12:16 AM, Mark Struberg <st...@yahoo.de.invalid>
> wrote:
> 
>> Hi!
>> 
>> I’ve migrated the NetBeans hg repo into GIT. Sadly this repo takes about
>> 3.6 GiB and thus we cannot host it on github or Bitbucket (both have a 2GB
>> limit).
>> I am currently hosting the repo on a small private server.
>> If anyone is interested then send me a private mail with your public key
>> and I’ll give you access.
>> Jaroslav, Geertjan and a few others already have a clone.
>> 
>> There are basically 3 ways we can handle this:
>> 
>> 1.) import a tarball into a fresh git repo. We would lose the history but
>> we only have sources which are explicitly cleared by Oracle.
>> 
>> 2.) import the full hg history. That is pretty thick which means it’s not
>> that easy to clone. github pull requests also won’t work as we exceed the
>> 2GB limit…
>> In addition the hg repo currently also contains lots of GPL libraries like
>> e.g. hibernate jar, etc. That’s something we don’t host at the ASF.
>> 
>> 3.) Take the git import from hg and filter it. Remove all (most) jars,
>> temporary build results etc. We might also get rid of a few old branches
>> etc. If we keep the original hg repo around in read only mode then we
>> should be able to lose tons of weight.
>> should be able to loose tons of weight.
>> 
>> I personally prefer option 3.
>> But that is also the most labor intensive.
>> 
>> 
>> LieGrue,
>> strub


Re: source grant and repo transfer

Posted by Emilian Bold <em...@gmail.com>.
I vote for 2!

I see no reason we should get rid of the history.

From what I have read before, the ASF does not need legal clearance for
every historical code revision. What is important is the legal clearance at
the moment the code grant happens.

I don't believe the GitHub 2GB limit is any indicator of anything except
their capacity and business decision. The Linux kernel is close to 2GB,
OpenOffice is 1.5GB, Hadoop is 400MB, Lucene-Solr is 200MB, JMeter is
200MB, etc.

NetBeans is a project with over a decade of history and hundreds of people.
The first commit I see is from 1999.

Of course such a large and old project will have a large repository!

And as time passes each repository will only grow. I just read a
StackOverflow answer on how to determine the GitHub repository size and
their example for git/git mentioned it was 40MB -- it's, I believe, 200MB
now.

I also don't think 3) will result in much economy. I doubt there are many
JARs or temporary build results.

If the current repository turns out to be too much for the Apache Infra, we could
decide in time how to improve that, but as an Incubation goal I believe
just switching to git should be enough.



--emi

On Fri, Oct 7, 2016 at 12:16 AM, Mark Struberg <st...@yahoo.de.invalid>
wrote:

> Hi!
>
> I’ve migrated the NetBeans hg repo into GIT. Sadly this repo takes about
> 3.6 GiB and thus we cannot host it on github or Bitbucket (both have a 2GB
> limit).
> I am currently hosting the repo on a small private server.
> If anyone is interested then send me a private mail with your public key
> and I’ll give you access.
> Jaroslav, Geertjan and a few others already have a clone.
>
> There are basically 3 ways we can handle this:
>
> 1.) import a tarball into a fresh git repo. We would lose the history but
> we only have sources which are explicitly cleared by Oracle.
>
> 2.) import the full hg history. That is pretty thick which means it’s not
> that easy to clone. github pull requests also won’t work as we exceed the
> 2GB limit…
> In addition the hg repo currently also contains lots of GPL libraries like
> e.g. hibernate jar, etc. That’s something we don’t host at the ASF.
>
> 3.) Take the git import from hg and filter it. Remove all (most) jars,
> temporary build results etc. We might also get rid of a few old branches
> etc. If we keep the original hg repo around in read only mode then we
> should be able to lose tons of weight.
>
> I personally prefer option 3.
> But that is also the most labor intensive.
>
>
> LieGrue,
> strub