Posted to dev@netbeans.apache.org by Gregory Szorc <gr...@gmail.com> on 2016/11/08 18:58:23 UTC

Version control advice

I'm a Mercurial developer who is also responsible for running
https://hg.mozilla.org/ and supporting Mercurial at Mozilla. I understand
NetBeans is contemplating its version control future because the ASF only
supports Subversion and Git. I think I've learned some things that may be
helpful to you.

First, the NetBeans "main" repo is on the same order of magnitude as (but
marginally smaller than) the Firefox repository in terms of file count and
repository data size. So, generally speaking, what I have learned supporting
Firefox can apply to NetBeans.

While I understand Mercurial may not be in your future, I'd like to point
out that hg.netbeans.org is running a very old and very slow version of
Mercurial (likely a release from before July 2010). The high volume of
merge commits in the "main" repo leads to highly sub-optimal storage
utilization in old versions of Mercurial. This makes clones and pulls
significantly slower because there is more data to transfer, and it causes
significant CPU load on the server to read and encode the sub-optimal storage
format. I wouldn't be surprised if you have CPU load issues on the server.

As it is stored today, the "main" repository is almost exactly 3 GB. If you
create a new repository with Mercurial 3.7 or newer (where "generaldelta" is
the default storage format) and configure it to recalculate optimal deltas,
the repository size drops to ~1.1 GB. This can be done as follows:

   $ hg init main-optimal
   $ cd main-optimal
   $ hg --config format.generaldelta=true \
        --config format.aggressivemergedeltas=true \
        pull https://hg.netbeans.org/main
   <wait a long time>

Now, for your VCS future.

I'm a huge proponent of monorepos for productivity reasons. I've seen
discussion on this list about splitting the repo. I would discourage that.
I'd encourage you to read https://danluu.com/monorepo/ and the linked
articles at the bottom for more on the topic.

Unfortunately, one of the practical concerns about monorepos is that they don't
scale with some version control tools, namely Git. This leads many to let
deficiencies in tools drive workflow decisions, which is quite unfortunate
because tools should enhance productivity, not hinder it. If NetBeans uses
Git and maintains the "main" repo as is, I believe you'll experience the
following performance issues now or in the future as the repository keeps
growing:

* You'll constantly be dealing with CPU explosions on the Git server
generated from clients performing clones and large pulls. GitHub uses a
server infrastructure that caches certain operations related to packfiles
to help mitigate this. I'm not sure of the state of ASF's Git server.

* In many cases, shallow clones can require more CPU on the Git server to
process than full clones. This is because the server essentially has to
read objects from packs and repack things instead of doing a fastpath that
effectively streams a packfile to a client.

* Garbage collection could be problematic on both the server and the client.

Now, Git is constantly improving, so these problems may not always
exist. And as well as GitHub scales (better than a vanilla Git install),
it isn't a silver bullet. On a few occasions, processes at Mozilla have
overwhelmed GitHub and resulted in GitHub disabling access to
repositories! That hasn't happened in a while, though (partially through
them scaling better and partially through us learning our lesson and not
pointing hundreds of machines at large Git repos). I'm not sure what, if
anything, ASF's Git server has done to mitigate load from large repositories.

It's worth noting that while some of the server-side CPU issues exist in
default Mercurial installations, there are mitigations. The "clonebundles"
extension allows a server to advertise pre-generated "bundle" files of
repository content. When a client clones, it downloads a large bundle from
a static file server, then goes back to the Mercurial server to get the data
changed since the bundle was created. If you `hg clone
https://hg.mozilla.org/mozilla-unified` with a modern Mercurial client,
your client will grab a 1+ GB file from a CDN and our servers will spend
maybe 5s of total CPU to service the clone. Clones are faster for clients,
and the server can scale clones nearly infinitely. It's a win all around.
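The clonebundles setup described above can be sketched roughly as follows. This assumes a server running Mercurial 3.6 or newer with the bundled `clonebundles` extension enabled in the repository's hgrc; the bundle is produced with `hg bundle --all --type gzip-v2` and advertised through a `.hg/clonebundles.manifest` file that lists the bundle URL and its BUNDLESPEC. The function names, paths, and URLs are hypothetical, and copying the bundle to the static file server is left out.

```shell
#!/bin/sh
# Sketch of publishing a Mercurial clone bundle (hypothetical paths and URLs).
# Assumes the server-side repository's .hg/hgrc enables the extension:
#   [extensions]
#   clonebundles =

# Print one .hg/clonebundles.manifest line: the bundle URL plus its BUNDLESPEC.
manifest_entry() {
  printf '%s BUNDLESPEC=%s\n' "$1" "$2"
}

# Generate a full bundle of the repository and advertise it.
#   $1 = repository path
#   $2 = local file to write the bundle to
#   $3 = public URL the bundle will be served from (copying $2 to the static
#        file server is not part of this sketch)
publish_clone_bundle() {
  hg -R "$1" bundle --all --type gzip-v2 "$2" || return 1
  manifest_entry "$3" gzip-v2 > "$1/.hg/clonebundles.manifest"
}
```

For example, `publish_clone_bundle /srv/hg/main /srv/static/main.hg https://static.example.org/main.hg` (all arguments hypothetical) would write a manifest pointing cloning clients at the static copy. Since clients fetch everything newer than the bundle from the Mercurial server itself, regenerating the bundle periodically keeps that remainder, and the server CPU it costs, small.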

Anyway, Mercurial's ability to scale doesn't help you if your choices are
Subversion or Git :/

Given those choices, I would lean towards Subversion if you want to
maintain the "main" repo as is. If you use the "main" repo as is with Git,
you should really do due diligence with the Git server operator to make
sure they won't be overwhelmed.

If you split the "main" repo, go with Git if your users prefer Git over
Subversion.

A compromise option would be to keep everything in a monorepo in Subversion
and have separate Git repositories for specific subdirectories or "views."
This is often a win-win but requires a bit of tooling to do the syncing.
Speaking of syncing, it should be unidirectional: bi-directional syncing of
anything is a hard problem. Take it from someone who has hacked on
bi-directional VCS syncing: it is not something you want to support.
Instead, I recommend making "pushing to the canonical repo" something a
machine does: it performs the VCS conversion to the canonical repo and does
the actual push. E.g., landing something from Git would have a server fetch
that Git ref and replay the commits as Subversion commits (or squash them
into a single commit to preserve atomicity).

Anyway, I think this wall of text is long enough. Reply if you have any
questions.

Gregory

Re: Version control advice

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Tue, Nov 8, 2016 at 7:58 PM, Gregory Szorc <gr...@gmail.com> wrote:
> ...one of the practical concerns about monorepos is they don't
> scale with some version control tools, namely Git....

Even without considering performance I think it's obvious that the Git
model favors small focused repositories.

-Bertrand

Re: Incremental Re: Switching to Git

Posted by Ivan Soleimanipour <iv...@oracle.com>.
On 11/29/16 08:45, Emilian Bold wrote:
> How long do you estimate this license check / migration will take?
>
> Because I don't see how the git repo would be of much help in read-only
> mode.
>

But it will be a throwaway repo, no?
Or is there a scheme being concocted to keep the two repos in sync?


> I don't have a script per se, I just executed hg-fast-export.sh as
> mentioned here:
> https://git-scm.com/book/en/v2/Git-and-Other-Systems-Migrating-to-Git#Mercurial
>
> --emi
>
> On Tue, Nov 29, 2016 at 5:24 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com> wrote:
>
>> On Thursday, 24 November 2016 21:07:19 CET Emilian Bold wrote:
>>> It's unclear to me how history would be preserved with an incremental
>>> approach.
>>> I would prefer we migrate the whole thing in one piece with history and all.
>>
>> If the git repo is in read-only mode and we just update the patches from the
>> Hg repository, then it shouldn't be that big a problem. Do you have a page/
>> builder/config showing how you do the conversion?
>>
>> Thanks.
>> -jt
>

Re: Incremental Re: Switching to Git

Posted by Emilian Bold <em...@gmail.com>.
How long do you estimate this license check / migration will take?

Because I don't see how the git repo would be of much help in read-only
mode.

I don't have a script per se, I just executed hg-fast-export.sh as
mentioned here:
https://git-scm.com/book/en/v2/Git-and-Other-Systems-Migrating-to-Git#Mercurial

--emi

On Tue, Nov 29, 2016 at 5:24 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com> wrote:

> On Thursday, 24 November 2016 21:07:19 CET Emilian Bold wrote:
> > It's unclear to me how history would be preserved with an incremental
> > approach.
> > I would prefer we migrate the whole thing in one piece with history and all.
>
> If the git repo is in read-only mode and we just update the patches from the
> Hg repository, then it shouldn't be that big a problem. Do you have a page/
> builder/config showing how you do the conversion?
>
> Thanks.
> -jt
>

Incremental Re: Switching to Git

Posted by Jaroslav Tulach <ja...@oracle.com>.
On Thursday, 24 November 2016 21:07:19 CET Emilian Bold wrote:
> It's unclear to me how history would be preserved with an incremental
> approach.
> I would prefer we migrate the whole thing in one piece with history and all.

If the git repo is in read-only mode and we just update the patches from the
Hg repository, then it shouldn't be that big a problem. Do you have a page/
builder/config showing how you do the conversion?

Thanks.
-jt
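Jaroslav's question (whether the conversion can be updated incrementally) maps onto how hg-fast-export is normally run: from inside the target Git repository, with `-r` pointing at a local Mercurial clone. Its README describes incremental conversion, where rerunning it in the same Git repository resumes from state files the previous run left in the Git directory; that behaviour should be verified against the version actually used. The sketch below assumes POSIX sh, absolute paths, and a Mercurial clone refreshed with `hg pull` before each run; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of a one-way Hg -> Git mirror driven by hg-fast-export.sh.
# Assumptions (hypothetical): all paths are absolute, $1 is a local Mercurial
# clone, $2 is the Git mirror, and $3 is the path to hg-fast-export.sh.
# Plain sh has no "local", so these helpers set global variables.

# First run: create the Git repository and import the full history.
initial_convert() {
  hg_repo=$1 git_repo=$2 exporter=$3
  git init "$git_repo" || return 1
  # hg-fast-export.sh is run from inside the target Git repository.
  ( cd "$git_repo" && "$exporter" -r "$hg_repo" && git checkout HEAD )
}

# Later runs: refresh the Mercurial clone, then rerun the exporter in the same
# Git repository so it can resume from the state saved by the previous run.
incremental_update() {
  hg_repo=$1 git_repo=$2 exporter=$3
  hg -R "$hg_repo" pull || return 1
  ( cd "$git_repo" && "$exporter" -r "$hg_repo" )
}
```

Because only the exporter ever writes to the Git repository, this fits the read-only, one-directional mirror Jaroslav describes; commits made directly in Git would not flow back to Hg.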


Re: Switching to Git was: Version control advice

Posted by Geertjan Wielenga <ge...@googlemail.com>.
There's no sadness at all. :-) We're working on the final document at the
moment; yes, these things take time. I currently expect the document to be
finalized next week.

Gj

On Thursday, November 24, 2016, Emilian Bold <em...@gmail.com> wrote:

> At under 1GB the repository size is not an issue anymore.
>
> It's sad to see we will still have migration problems due to legal
> considerations.
>
> Could you provide an estimate of how long it would take to verify and
> whitelist the entire codebase Oracle plans on donating?
>
> It's unclear to me how history would be preserved with an incremental
> approach.
>
> I would prefer we migrate the whole thing in one piece with history and
> all.
>
>
> --emi
>
> On Thu, Nov 24, 2016 at 5:22 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com> wrote:
>
> > Emilian, Jan, Mark, great work.
> >
> > Smooth migration from Hg to Git is essential for successful migration to
> > Apache. Thanks a lot for investigating how to do that.
> >
> > My plan (as described in another email) is to prepare the code donation in
> > Hg and update it incrementally with code integrated into Hg.
> >
> > Are your conversion methods ready for incremental updates or do they only
> > work as a one-time batch conversion?
> >
> > -jt
> >
> > On Thursday, 24 November 2016 10:41:50 CET Jan Lahoda wrote:
> > > Interesting. I tried "git gc --aggressive" on Mark's converted
> > > repository, and the result is:
> > > netbeans-import/.git$ du -hs .
> > > 792M    .
> > >
> > > The original was:
> > > netbeans-import.git $ du -hs .
> > > 3,5G    .
> > >
> > > (IIRC Mark was converting http://hg.netbeans.org/main, not releases, so the
> > > repository is a little bit smaller than the releases one.)
> > >
> > > I tried:
> > > $ git log -p | sha1sum
> > >
> > > on both repositories, and the hashes appear to be the same. I also tried to
> > > clone the gc-ed repository using git clone --bare --no-local, and the
> > > resulting repository is still about the same size. So, this seems good to
> > > me, unless there is some downside I don't know about.
> > >
> > > Jan
> > >
> > >
> > > On Wed, Nov 23, 2016 at 8:26 PM, Emilian Bold <emilian.bold@gmail.com> wrote:
> > > > Actually I don't believe the data loss is that large. (There may also be
> > > > Mercurial commits that are intentionally ignored by the conversion script,
> > > > like commits that only add tags?)
> > > >
> > > > hg log | grep '^changeset:' | wc -l
> > > >
> > > >   313209
> > > >
> > > > git log | grep '^commit ' | wc -l
> > > >
> > > >   301478
> > > >
> > > > So there is a difference of 11731 commits (about 4%), but those couldn't
> > > > have such a large impact on repository size.
> > > >
> > > > I hope somebody else is willing to work with me on this so we document
> > > > everything and do a reproducible repository conversion.
> > > >
> > > >
> > > >
> > > > --emi
> > > >
> > > > On Wed, Nov 23, 2016 at 9:10 PM, Emilian Bold <emilian.bold@gmail.com> wrote:
> > > > > Well, I dunno what black magic `gc --aggressive` does but the repository
> > > > > is 0.85GB now!
> > > > >
> > > > > I also ran `git reflog expire` first but it didn't change the size at all.
> > > > >
> > > > > One thing to keep in mind is that I used --force although I had 6 commits
> > > > > with the warning "repository has at least one unnamed head". These were
> > > > > probably all close-branch commits (hg commit --close-branch).
> > > > >
> > > > > So I might have data loss(!), since I believe I read that hg-fast-export.sh
> > > > > picks only one unnamed head as the migration winner. I wonder if the gc
> > > > > command didn't just purge a lot of valid commits from such an unnamed head
> > > > > and that's why the repository became so small.
> > > > >
> > > > > Could somebody else try a test repository conversion and validate my
> > > > > numbers?
> > > > >
> > > > > git gc --aggressive --prune=now
> > > > > Counting objects: 4085031, done.
> > > > > Delta compression using up to 8 threads.
> > > > > Compressing objects: 100% (2909203/2909203), done.
> > > > > Writing objects: 100% (4085031/4085031), done.
> > > > > Total 4085031 (delta 2150468), reused 1585934 (delta 0)
> > > > > Checking connectivity: 4085031, done.
> > > > >
> > > > >
> > > > >
> > > > > --emi
> > > > >
> > > > > On Wed, Nov 23, 2016 at 7:59 PM, Paul Merlin <paulmerlin@apache.org> wrote:
> > > > >> Hi Emilian,
> > > > >>
> > > > >> > I see hg-fast-export.sh finished at some point.
> > > > >> >
> > > > >> > As expected though, git does not have any of the disk space gains. The
> > > > >> > converted git releases/ repository is 3.6GB.
> > > > >>
> > > > >> Just a thought.
> > > > >> Did you try some git cleanups after the conversion?
> > > > >>
> > > > >> git reflog expire --expire=now --all
> > > > >> git gc --aggressive --prune=now
> > > > >>
> > > > >> Cheers
> > > > >>
> > > > >> > In case these statistics mean something:
> > > > >> >
> > > > >> > git-fast-import statistics:
> > > > >> > ---------------------------------------------------------------------
> > > > >> > Alloc'd objects:    4090000
> > > > >> > Total objects:      4085509 (  40220100 duplicates                  )
> > > > >> >       blobs  :      1036365 (  28386238 duplicates     858087 deltas of     969684 attempts)
> > > > >> >       trees  :      2735935 (  11833862 duplicates    1370606 deltas of    2613480 attempts)
> > > > >> >       commits:       313209 (         0 duplicates          0 deltas of          0 attempts)
> > > > >> >       tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
> > > > >> > Total branches:        1283 (       346 loads     )
> > > > >> >       marks:        1048576 (    313209 unique    )
> > > > >> >       atoms:         124011
> > > > >> > Memory total:        218429 KiB
> > > > >> >        pools:         26711 KiB
> > > > >> >      objects:        191718 KiB
> > > > >> > ---------------------------------------------------------------------
> > > > >> > pack_report: getpagesize()            =       4096
> > > > >> > pack_report: core.packedGitWindowSize = 1073741824
> > > > >> > pack_report: core.packedGitLimit      = 8589934592
> > > > >> > pack_report: pack_used_ctr            =   39000045
> > > > >> > pack_report: pack_mmap_calls          =     733040
> > > > >> > pack_report: pack_open_windows        =          4 /          7
> > > > >> > pack_report: pack_mapped              = 4280730006 / 6950823920
> > > > >> > ---------------------------------------------------------------------
> > > > >> >
> > > > >> >
> > > > >> > --emi
> > > > >> >
> > > > >> > On Fri, Nov 18, 2016 at 1:32 PM, Emilian Bold <emilian.bold@gmail.com> wrote:
> > > > >> >> A releases/ clone which on my system takes 3.8GB is reduced to 1.6GB with
> > > > >> >> the generaldelta and aggressivemergedeltas flags (took about 14 hours).
> > > > >> >> Pretty impressive!
> > > > >> >>
> > > > >> >> Converting to git with hg-fast-export.sh complains that "repository has at
> > > > >> >> least one unnamed head" for about 6 revisions. With --force I'm able to
> > > > >> >> start the conversion but it hasn't finished yet.
> > > > >> >>
> > > > >> >> The git conversion is about 35% done and already using 1.3GB.
> > > > >> >>
> > > > >> >> So... I assume it's going to need about 3.8GB, just like the original
> > > > >> >> repository.
> > > > >> >>
> > > > >> >> I wonder if git has similar space-saving tricks?
> > > > >> >>
> > > > >> >>
> > > > >> >>
> > > > >> >> --emi
> > > > >> >>
> > > > >> >> On Thu, Nov 17, 2016 at 8:46 AM, Emilian Bold <emilian.bold@gmail.com> wrote:
> > > > >> >>> Forgot about this. I've just started the Mercurial repository conversion,
> > > > >> >>> which will take a few hours.
> > > > >> >>>
> > > > >> >>> Will report tomorrow or when it's done.
> > > > >> >>>
> > > > >> >>>
> > > > >> >>> --emi
> > > > >> >>>
> > > > >> >>> On Wed, Nov 16, 2016 at 11:18 PM, cowwoc <cowwoc@bbs.darktech.org> wrote:
> > > > >> >>>> Hi Emilian,
> > > > >> >>>>
> > > > >> >>>> Any update on this?
> > > > >> >>>>
> > > > >> >>>> Thanks,
> > > > >> >>>> Gili
> > > > >> >>>>
> > > > >> >>>> On 2016-11-11 01:33 (-0500), Emilian Bold <e...@gmail.com> wrote:
> > > > >> >>>>> Thank you for following through with this after we talked on IRC.
> > > > >> >>>>>
> > > > >> >>>>> I will check the size reduction for the releases/ repo later.
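Emilian's counts above compare `hg log` with a plain `git log`. One caveat worth checking (not something the thread settled): `git log` with no revision arguments only walks commits reachable from HEAD, while `hg log` lists every changeset on every named branch, and the fast-import statistics quoted above report 313209 imported commits, the same as the Hg count. Counting commits reachable from all Git refs with `git rev-list --all --count`, before and after `git gc --prune=now`, would show whether any commits actually became unreachable and were pruned. The sketch assumes POSIX sh with `hg`, `git` and `awk` on the PATH; the function names are hypothetical.

```shell
#!/bin/sh
# Count every changeset in a Mercurial repository, across all named branches.
hg_commit_count() {
  hg -R "$1" log -T '{rev}\n' | wc -l | tr -d ' '
}

# Count commits reachable from any ref (all branches and tags), not just HEAD.
git_commit_count() {
  git -C "$1" rev-list --all --count
}

# Print the gap between an Hg count ($1) and a Git count ($2), plus the gap
# as a percentage of the Hg count.
commit_gap() {
  awk -v h="$1" -v g="$2" 'BEGIN { d = h - g; printf "%d %.1f%%\n", d, 100 * d / h }'
}
```

For instance, `commit_gap 313209 301478` prints `11731 3.7%`, the figure from the thread. A gap of 0 between `hg_commit_count` and `git_commit_count` would suggest that no commits were lost in the conversion or the gc.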

Re: main-silver similar to releases was: Switching to Git

Posted by "Petr.Gebauer@Oracle.COM" <pe...@oracle.com>.
The job push-to-releases [1] is responsible for propagation of new changesets from the main-silver default branch to the default branch in the releases repo.

Regards,
PetrG

[1] http://deadlock.netbeans.org/view/Push%20To%20Team%20Repository%20builds/job/push-to-releases/
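Petr's description of the push-to-releases job, and Jaroslav's observation quoted below that every main-silver changeset is also in releases, can be checked mechanically before deciding which repository to convert. `hg outgoing` lists the changesets in one repository that are missing from another; it exits 0 when it finds some, 1 when it finds none, and with another status on errors. The sketch below only interprets that exit status; the function name and any paths passed to it are hypothetical.

```shell
#!/bin/sh
# Succeeds when every changeset in repository $1 is also present in $2.
# Relies on the exit status of "hg outgoing": 0 when it finds changesets
# missing from the destination, 1 when it finds none, other values on errors.
hg_is_subset() {
  hg -R "$1" outgoing -q "$2" >/dev/null
  case $? in
    0) return 1 ;;  # some changesets of $1 are missing from $2
    1) return 0 ;;  # nothing outgoing: $1 is contained in $2
    *) return 2 ;;  # hg reported an error (e.g. unreachable repository)
  esac
}
```

For example, `hg_is_subset /path/to/main-silver https://hg.netbeans.org/releases` would return 0 if releases already contains all of main-silver, which is the precondition for migrating "the rest of releases" on top of an existing main-silver conversion.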

On Dec 21, 2016, at 12:06 PM, Emilian Bold <em...@gmail.com> wrote:

> This is a good point.
> 
> I somehow believed that releases/ only has the releases (in the _fcs
> branches, etc.) while the main development is still in main-silver (and the
> team repositories).
> 
> Only yesterday did I notice that releases/ seems to have recent commits in
> the default branch.
> 
> But it didn't occur to me that I could just pull from releases/ into a
> main-silver clone! I will try it out.
> 
> 
> --emi
> 
> On Wed, Dec 21, 2016 at 1:01 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com> wrote:
> 
>> On Tuesday, 20 December 2016 15:47:08 CET Emilian Bold wrote:
>>> You guys weren't very clear about which repository you are reviewing for
>>> the donation.
>>>
>>> Initially I converted releases/ but then I realized I have some commits of
>>> mine in there, so I couldn't push it to GitHub.
>>>
>>> So I did a fresh conversion of main-silver/, which I assumed is the
>>> repository Oracle will donate.
>>>
>>> I will re-do releases/ but it will take a bit.
>>
>> Hello Emilian,
>> I want to point out that main-silver and releases aren't that different. In
>> fact, every commit inside of main-silver (like
>> http://hg.netbeans.org/main-silver/rev/f966be3cb73a) is also available in
>> releases (at http://hg.netbeans.org/releases/rev/f966be3cb73a).
>>
>> However, releases is more important, as it contains more, especially the
>> actual sources used for individual NetBeans releases (like
>> http://hg.netbeans.org/releases/rev/release82_fcs), which aren't in
>> main-silver.
>>
>> As these repos are 90% the same, I thought it would be enough to keep the
>> repository you already have (https://github.com/emilianbold/main-silver) and
>> just migrate the rest of releases on top of it.
>> -jt
>> 
>>> On Tue, Dec 20, 2016 at 3:31 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com> wrote:
>>>
>>>> On Friday, 9 December 2016 19:05:48 CET Emilian Bold wrote:
>>>>> Martin, I have just pushed https://github.com/emilianbold/main-silver
>>>>> You may experiment with that.
>>>>
>>>> Hello Emilian,
>>>> I managed to fork & use your repository and everything seems great. I have
>>>> a functional job that executes
>>>>
>>>> $ ant build-platform
>>>>
>>>> for each pull request. I plan to add a call to "ant test-platform" once it
>>>> is stable enough[1]. Great work! I believe we shall use your Git repository
>>>> as a base (somehow) when donating code to Apache once my Oracle peers finish
>>>> review of the code to donate.
>>>>
>>>>> To https://github.com/emilianbold/main-silver.git
>>>>>
>>>>> * [new branch]      master -> master
>>>>>
>>>>> Branch master set up to track remote branch master from origin.
>>>>
>>>> Could you synchronize https://hg.netbeans.org/releases/ instead? It contains
>>>> the history of all the NetBeans releases (in branches like release82, etc.)
>>>> and it is the repository that is currently under review. Btw., the releases
>>>> repository contains everything that is available in main-silver, just more.
>>>>
>>>> It would be fantastic if you could create a complete mirror of the releases
>>>> repository. Thanks again for your great work!
>>>>
>>>> -jt
>>>>
>>>> [1] http://deadlock.netbeans.org/job/prototypes-MavenDownload269264/ - still
>>>> 12 test failures remaining
>>>>
>>>>> git push -u origin master
>>>>> Counting objects: 3951610, done.
>>>>> Delta compression using up to 8 threads.
>>>>> Compressing objects: 100% (732965/732965), done.
>>>>> Writing objects: 100% (3951610/3951610), 674.94 MiB | 717.00 KiB/s, done.
>>>>> Total 3951610 (delta 2068729), reused 3951610 (delta 2068729)
>>>>> remote: Resolving deltas: 100% (2068729/2068729), done.
>>>>> remote: Checking connectivity: 3951610, done.
>>>>> remote: warning: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
>>>>> remote: warning: See http://git.io/iEPt8g for more information.
>>>>> remote: warning: File dlight.util/test/manual/DLight_Simple_Tests/core is 51.88 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
>>>>> To https://github.com/emilianbold/main-silver.git
>>>>>
>>>>> * [new branch]      master -> master
>>>>>
>>>>> Branch master set up to track remote branch master from origin.
>>>>>
>>>>>
>>>>> --emi
>>>>>
>>>>> On Wed, Dec 7, 2016 at 4:42 PM, Martin Balin <Martin.Balin@oracle.com> wrote:
>>>>>
>>>>>> Hello Emilian,
>>>>>> I'm working at Oracle on NetBeans development and we would like to start
>>>>>> fixing build scripts to use Git instead of HG.
>>>>>> This could be done earlier on your Git repo, if you agree, as it will
>>>>>> take time. It does not need to wait for the final official donation of
>>>>>> sources.
>>>>>> Can you please send me the URL,...
>>>>>> Thank you Martin Balin
>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>> Thank you for following through with this after we talked on
>>>>>>>>> 
>>>>>>>>>>>>>>>>> IRC.>
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I will check later the size reduction for the releases/
>>>> 
>>>> repo.>
>> 
>> 
>> 
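The "repository has at least one unnamed head" warning quoted above can be investigated before converting, instead of reaching for --force straight away. A minimal sketch, assuming (from my reading of hg-fast-export's head check, not from its documentation) that the exporter rejects any named branch with more than one topological head, including heads closed with `hg commit --close-branch`. The helper name `branches_with_extra_heads` is hypothetical; it only uses the standard `heads(all())` revset and `hg log --template`.

```shell
# Hypothetical helper: print each named branch that has more than one
# topological head. heads(all()) selects changesets with no children at all,
# so closed branch heads are included, matching what a per-branch
# "one head only" check would see.
branches_with_extra_heads() {
    repo=${1:-.}
    hg -R "$repo" log -r 'heads(all())' --template '{branch}\n' \
        | sort | uniq -d
}
```

For example, `branches_with_extra_heads releases` run against a local clone prints nothing when every branch has a single head; any branch it does print is worth inspecting (e.g. with `hg heads --closed BRANCH`) before deciding whether forcing the export could drop history.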


Re: main-silver similar to releases was: Switching to Git

Posted by Emilian Bold <em...@gmail.com>.
This is a good point.

I somehow believed that releases/ only has the releases (in the _fcs
branches, etc.) while the main development is still in main-silver (and the
team repositories).

Only yesterday did I notice that releases/ seems to have recent commits in the
default branch.

But it hadn't occurred to me that I could just pull from releases/ into a
main-silver clone! I will try it out.


--emi

On Wed, Dec 21, 2016 at 1:01 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com> wrote:

> On Tuesday, 20 December 2016 15:47:08 CET Emilian Bold wrote:
> > You guys weren't very clear about which repository you are reviewing for
> > the donation.
> >
> > Initially I converted releases/ but then I realized I have some commits of
> > mine in there, so I couldn't push it to GitHub.
> >
> > So I did a fresh conversion of main-silver/, which I assumed is the
> > repository Oracle will donate.
> >
> > I will re-do releases/ but it will take a bit.
>
> Hello Emilian,
> I want to point out that main-silver and releases aren't that different. In
> fact every commit inside of main-silver (like
> http://hg.netbeans.org/main-silver/rev/f966be3cb73a) is also available in
> releases (at http://hg.netbeans.org/releases/rev/f966be3cb73a).
>
> However, releases is more important, as it contains more, especially the
> actual sources used for individual NetBeans releases (like
> http://hg.netbeans.org/releases/rev/release82_fcs), which aren't in
> main-silver.
>
> As these repos are 90% the same, I thought it would be enough to keep the
> repository you already have (https://github.com/emilianbold/main-silver)
> and just migrate the rest of releases on top of it.
> -jt
>
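Pulling releases/ into an existing main-silver clone, rather than re-converting from scratch, could look roughly like the following sketch. It first checks Jaroslav's claim that releases/ contains everything in main-silver: `hg outgoing` exits 0 when the local repository has changesets the target lacks, 1 when it has none, and another status on errors. The function name `check_and_pull` and the local paths are hypothetical; it assumes an existing local Mercurial clone of main-silver.

```shell
# Hypothetical helper: verify that $repo has nothing the upstream lacks,
# then pull the upstream's extra history into it.
check_and_pull() {
    repo=$1
    upstream=$2
    status=0
    hg -R "$repo" outgoing -q "$upstream" >/dev/null || status=$?
    case $status in
        0)  # $repo has changesets that $upstream does not: stop here
            echo "$repo has changesets missing from $upstream" >&2
            return 1 ;;
        1)  ;;  # nothing outgoing: $repo is a subset of $upstream
        *)  echo "hg outgoing failed with status $status" >&2
            return "$status" ;;
    esac
    # Only adds history (e.g. release branches such as release82).
    hg -R "$repo" pull "$upstream"
}
```

For example, `check_and_pull main-silver https://hg.netbeans.org/releases`. Afterwards, re-running hg-fast-export.sh inside the existing Git conversion should export only the newly pulled changesets, since hg-fast-export supports incremental imports; that is an assumption worth trying on a copy of the Git repository first.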

main-silver similar to releases was: Switching to Git

Posted by Jaroslav Tulach <ja...@oracle.com>.
On Tuesday, 20 December 2016 15:47:08 CET Emilian Bold wrote:
> You guys weren't very clear about which repository you are reviewing for
> the donation.
>
> Initially I converted releases/ but then I realized I have some commits of
> mine in there, so I couldn't push it to GitHub.
>
> So I did a fresh conversion of main-silver/, which I assumed is the
> repository Oracle will donate.
>
> I will re-do releases/ but it will take a bit.

Hello Emilian,
I want to point out that main-silver and releases aren't that different. In
fact every commit inside of main-silver (like
http://hg.netbeans.org/main-silver/rev/f966be3cb73a) is also available in
releases (at http://hg.netbeans.org/releases/rev/f966be3cb73a).

However, releases is more important, as it contains more, especially the
actual sources used for individual NetBeans releases (like
http://hg.netbeans.org/releases/rev/release82_fcs), which aren't in
main-silver.

As these repos are 90% the same, I thought it would be enough to keep the
repository you already have (https://github.com/emilianbold/main-silver)
and just migrate the rest of releases on top of it.
-jt

> On Tue, Dec 20, 2016 at 3:31 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com> wrote:
> >
> > On Friday, 9 December 2016 19:05:48 CET Emilian Bold wrote:
> > > Martin, I have just pushed https://github.com/emilianbold/main-silver
> > > You may experiment with that.
> >
> > Hello Emilian,
> > I managed to fork & use your repository and everything seems great. I have
> > a functional job that executes
> >
> > $ ant build-platform
> >
> > for each pull request. I plan to add a call to "ant test-platform" once it
> > is stable enough[1]. Great work! I believe we shall use your Git repository
> > as a base (somehow) when donating code to Apache once my Oracle peers
> > finish their review of the code to donate.
> >
> > > To https://github.com/emilianbold/main-silver.git
> > >  * [new branch]      master -> master
> > > Branch master set up to track remote branch master from origin.
> >
> > Could you synchronize https://hg.netbeans.org/releases/ instead? It
> > contains the history of all the NetBeans releases (in branches like
> > release82, etc.) and it is the repository that is currently under review.
> > Btw., the releases repository contains everything that is available in
> > main-silver, just more.
> >
> > It would be fantastic if you could create a complete mirror of the
> > releases repository. Thanks again for your great work!
> >
> > -jt
> >
> > [1] http://deadlock.netbeans.org/job/prototypes-MavenDownload269264/ -
> > still 12 test failures remaining
> >
> > > git push -u origin master
> > > Counting objects: 3951610, done.
> > > Delta compression using up to 8 threads.
> > > Compressing objects: 100% (732965/732965), done.
> > > Writing objects: 100% (3951610/3951610), 674.94 MiB | 717.00 KiB/s, done.
> > > Total 3951610 (delta 2068729), reused 3951610 (delta 2068729)
> > > remote: Resolving deltas: 100% (2068729/2068729), done.
> > > remote: Checking connectivity: 3951610, done.
> > > remote: warning: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
> > > remote: warning: See http://git.io/iEPt8g for more information.
> > > remote: warning: File dlight.util/test/manual/DLight_Simple_Tests/core is 51.88 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
> > > To https://github.com/emilianbold/main-silver.git
> > >  * [new branch]      master -> master
> > > Branch master set up to track remote branch master from origin.
> > >
> > > --emi
> > >
> > > On Wed, Dec 7, 2016 at 4:42 PM, Martin Balin <Ma...@oracle.com> wrote:
> > > > Hello Emilian,
> > > > I'm working at Oracle on NetBeans development and we would like to
> > > > start fixing build scripts to use Git instead of HG.
> > > > This could be done earlier on your Git repo, if you agree, as it will
> > > > take time. It does not need to wait for the final official donation of
> > > > sources.
> > > > Can you please send me the URL,...
> > > > Thank you Martin Balin
> > > >
> > > > On 24.11.2016 20:07, Emilian Bold wrote:
> > > >> At under 1GB the repository size is not an issue anymore.
> > > >>
> > > >> It's sad to see we will still have migration problems due to legal
> > > >> considerations.
> > > >>
> > > >> Could you provide an estimate of how long it would take to verify and
> > > >> whitelist the entire codebase Oracle plans on donating?
> > > >>
> > > >> It's unclear to me how history would be preserved with an incremental
> > > >> approach.
> > > >>
> > > >> I would prefer we migrate the whole thing in one piece, with history
> > > >> and all.
> > > >>
> > > >> --emi
> > > >>
> > > >> On Thu, Nov 24, 2016 at 5:22 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com> wrote:
> > > >>> Emilian, Jan, Mark, great work.
> > > >>>
> > > >>> Smooth migration from Hg to Git is essential for a successful
> > > >>> migration to Apache. Thanks a lot for investigating how to do that.
> > > >>>
> > > >>> My plan (as described in another email) is to prepare the code
> > > >>> donation in Hg and update it incrementally with code integrated into
> > > >>> Hg.
> > > >>>
> > > >>> Are your conversion methods ready for incremental updates or do they
> > > >>> only work as a one-time batch conversion?
> > > >>>
> > > >>> -jt
> > > >>>
> > > >>> On Thursday, 24 November 2016 10:41:50 CET Jan Lahoda wrote:
> > > >>>> Interesting. I tried "git gc --aggressive" on Mark's converted
> > > >>>> repository, and the result is:
> > > >>>> netbeans-import/.git$ du -hs .
> > > >>>> 792M    .
> > > >>>>
> > > >>>> The original was:
> > > >>>> netbeans-import.git $ du -hs .
> > > >>>> 3,5G    .
> > > >>>>
> > > >>>> (IIRC Mark was converting http://hg.netbeans.org/main, not releases,
> > > >>>> so the repository is a little bit smaller than the releases one.)
> > > >>>>
> > > >>>> I tried:
> > > >>>> $ git log -p | sha1sum
> > > >>>>
> > > >>>> on both repositories, and the hashes appear to be the same. I also
> > > >>>> tried to clone the gc-ed repository using git clone --bare --no-local,
> > > >>>> and the resulting repository is still about the same size. So, this
> > > >>>> seems good to me, unless there is some downside I don't know about.
> > > >>>>
> > > >>>> Jan
> > > >>>>
> > > >>>> On Wed, Nov 23, 2016 at 8:26 PM, Emilian Bold <emilian.bold@gmail.com> wrote:
> > > >>>>> Actually I don't believe the data loss is that large. (There may
> > > >>>>> also be Mercurial commits that are intentionally ignored by the
> > > >>>>> conversion script, like commits that only add tags?)
> > > >>>>>
> > > >>>>> hg log | grep '^changeset:' | wc -l
> > > >>>>>
> > > >>>>>    313209
> > > >>>>>
> > > >>>>> git log | grep '^commit ' | wc -l
> > > >>>>>
> > > >>>>>    301478
> > > >>>>>
> > > >>>>> So there is a difference of 11731 commits (about 4%), but those
> > > >>>>> couldn't have such a large impact on repository size.
> > > >>>>>
> > > >>>>> I hope somebody else is willing to work with me on this so we
> > > >>>>> document everything and do a reproducible repository conversion.
> > > >>>>>
> > > >>>>> --emi
> > > >>>>>
> > > >>>>> On Wed, Nov 23, 2016 at 9:10 PM, Emilian Bold <emilian.bold@gmail.com> wrote:
> > > >>>>>> Well, I dunno what black magic `gc --aggressive` does, but the
> > > >>>>>> repository is 0.85GB now!
> > > >>>>>>
> > > >>>>>> I also ran `git reflog expire` first, but it didn't change the size
> > > >>>>>> at all.
> > > >>>>>>
> > > >>>>>> One thing to keep in mind is that I used --force although I had 6
> > > >>>>>> commits with the warning "repository has at least one unnamed head",
> > > >>>>>> which were probably all close-branch commits (hg commit
> > > >>>>>> --close-branch).
> > > >>>>>>
> > > >>>>>> So I might have data loss(!), since I believe I read that
> > > >>>>>> hg-fast-export.sh picks only one unnamed head as the migration
> > > >>>>>> winner. I wonder if the gc command didn't just purge a lot of valid
> > > >>>>>> commits from such an unnamed head and that's why the repository
> > > >>>>>> became so small.
> > > >>>>>>
> > > >>>>>> Could somebody else try a test repository conversion and validate
> > > >>>>>> my numbers?
> > > >>>>>>
> > > >>>>>> git gc --aggressive --prune=now
> > > >>>>>> Counting objects: 4085031, done.
> > > >>>>>> Delta compression using up to 8 threads.
> > > >>>>>> Compressing objects: 100% (2909203/2909203), done.
> > > >>>>>> Writing objects: 100% (4085031/4085031), done.
> > > >>>>>> Total 4085031 (delta 2150468), reused 1585934 (delta 0)
> > > >>>>>> Checking connectivity: 4085031, done.
> > > >>>>>>
> > > >>>>>> --emi
> > > >>>>>>
> > > >>>>>> On Wed, Nov 23, 2016 at 7:59 PM, Paul Merlin <paulmerlin@apache.org> wrote:
> > > >>>>>>> Hi Emilian,
> > > >>>>>>>
> > > >>>>>>> I see hg-fast-export.sh finished at some point.
> > > >>>>>>>
> > > >>>>>>>> As expected though, git does not have any of the disk space gains.
> > > >>>>>>>> The converted git releases/ repository is 3.6GB.
> > > >>>>>>>
> > > >>>>>>> Just a thought.
> > > >>>>>>> Did you try some git cleanups after the conversion?
> > > >>>>>>>
> > > >>>>>>> git reflog expire --expire=now --all
> > > >>>>>>> git gc --aggressive --prune=now
> > > >>>>>>>
> > > >>>>>>> Cheers
> > > >>>>>>>
> > > >>>>>>> In case these statistics mean something:
> > > >>>>>>>> git-fast-import statistics:
> > > >>>>>>>> ---------------------------------------------------------------------
> > > >>>>>>>> Alloc'd objects:    4090000
> > > >>>>>>>> Total objects:      4085509 (  40220100 duplicates                  )
> > > >>>>>>>>       blobs  :      1036365 (  28386238 duplicates     858087 deltas of     969684 attempts)
> > > >>>>>>>>       trees  :      2735935 (  11833862 duplicates    1370606 deltas of    2613480 attempts)
> > > >>>>>>>>       commits:       313209 (         0 duplicates          0 deltas of          0 attempts)
> > > >>>>>>>>       tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
> > > >>>>>>>> Total branches:        1283 (       346 loads     )
> > > >>>>>>>>       marks:        1048576 (    313209 unique    )
> > > >>>>>>>>       atoms:         124011
> > > >>>>>>>> Memory total:        218429 KiB
> > > >>>>>>>>        pools:         26711 KiB
> > > >>>>>>>>      objects:        191718 KiB
> > > >>>>>>>> ---------------------------------------------------------------------
> > > >>>>>>>> pack_report: getpagesize()            =       4096
> > > >>>>>>>> pack_report: core.packedGitWindowSize = 1073741824
> > > >>>>>>>> pack_report: core.packedGitLimit      = 8589934592
> > > >>>>>>>> pack_report: pack_used_ctr            =   39000045
> > > >>>>>>>> pack_report: pack_mmap_calls          =     733040
> > > >>>>>>>> pack_report: pack_open_windows        =          4 /          7
> > > >>>>>>>> pack_report: pack_mapped              = 4280730006 / 6950823920
> > > >>>>>>>> ---------------------------------------------------------------------
> > > >>>>>>>
> > > >>>>>>>> --emi
> > > >>>>>>>>
> > > >>>>>>>> On Fri, Nov 18, 2016 at 1:32 PM, Emilian Bold <emilian.bold@gmail.com> wrote:
> > > >>>>>>>>> A releases/ clone which on my system takes 3.8GB is reduced to 1.6GB
> > > >>>>>>>>> with the generaldelta and aggressivemergedeltas flags (took about 14
> > > >>>>>>>>> hours).
> > > >>>>>
> > > >>>>> Pretty impressive!
> > > >>>>>
> > > >>>>>>>>> Converting to git with hg-fast-export.sh complains that "repository
> > > >>>>>>>>> has at least one unnamed head" for about 6 revisions. With --force
> > > >>>>>>>>> I'm able to start the conversion but it hasn't finished yet.
> > > >>>>>>>>>
> > > >>>>>>>>> The git conversion is about 35% done and already using 1.3GB.
> > > >>>>>>>>>
> > > >>>>>>>>> So... I assume it's going to need about 3.8GB, just like the
> > > >>>>>>>>> original repository.
> > > >>>>>>>>>
> > > >>>>>>>>> I wonder if git has similar space-saving tricks?
> > > >>>>>>>>>
> > > >>>>>>>>> --emi
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Nov 17, 2016 at 8:46 AM, Emilian Bold <emilian.bold@gmail.com> wrote:
> > > >>>>>>>>>> Forgot about this. I've just started the Mercurial repository
> > > >>>>>>>>>> conversion, which will take a few hours.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Will report tomorrow or when it's done.
> > > >>>>>>>>>>
> > > >>>>>>>>>> --emi
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, Nov 16, 2016 at 11:18 PM, cowwoc <cowwoc@bbs.darktech.org> wrote:
> > > >>>>>>>>>>> Hi Emilian,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Any update on this?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Thanks,
> > > >>>>>>>>>>> Gili
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On 2016-11-11 01:33 (-0500), Emilian Bold <e....@gmail.com> wrote:
> > > >>>>>>>>>>>> Thank you for following through with this after we talked on
> > > >>>>>>>>>>>> IRC.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I will check the size reduction for the releases/ repo later.

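An observation on the commit counts quoted in this thread: `git log` without `--all` only walks commits reachable from HEAD, while `hg log` lists every changeset on every named branch. The git-fast-import statistics quoted here report 313209 commits imported (and 1283 branches), the same figure as the `hg log` count, so the 11731-commit gap may be history on branches that are not ancestors of the checked-out branch rather than data loss. That is an inference, not something verified in the thread. A minimal sketch of counts that can be compared directly, assuming hg and git are on the PATH; the helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: commit counts that can be compared between a Mercurial repository
# and its Git conversion. Helper names are hypothetical.

# Every changeset in the Mercurial repository, on all named branches.
# Run inside the hg clone; prints one '.' per changeset and counts them.
hg_changeset_count() {
    hg log -r 'all()' --template '.' | wc -c | tr -d ' '
}

# Commits reachable from HEAD only; this is what
# `git log | grep '^commit ' | wc -l` counts.
git_head_commit_count() {
    git rev-list --count HEAD
}

# Commits reachable from any ref (every branch and tag); this is the
# number to compare with hg_changeset_count.
git_all_commit_count() {
    git rev-list --count --all
}
```

Run `hg_changeset_count` inside the Mercurial clone and `git_all_commit_count` inside the converted repository; `git_head_commit_count` reproduces the narrower number that `git log` gives.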


Re: Switching to Git

Posted by Emilian Bold <em...@gmail.com>.
You guys weren't very clear about which repository you are reviewing for the
donation.

Initially I converted releases/, but then I realized I have some commits of
mine in there, so I couldn't push it to GitHub.

So I did a fresh conversion of main-silver/, which I assumed is the
repository Oracle will donate.

I will redo releases/, but it will take a bit.



--emi

On Tue, Dec 20, 2016 at 3:31 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com>
wrote:

> On Friday, 9 December 2016 19:05:48 CET Emilian Bold wrote:
> > Martin, I have just pushed https://github.com/emilianbold/main-silver You
> > may experiment with that.
>
> Hello Emilian,
> I managed to fork & use your repository and everything seems great. I have a
> functional job that executes
>
> $ ant build-platform
>
> for each pull request. I plan to add a call to "ant test-platform" once it is
> stable enough[1]. Great work! I believe we shall use your Git repository as a
> base (somehow) when donating code to Apache once my Oracle peers finish review
> of the code to donate.
>
> > To https://github.com/emilianbold/main-silver.git
> >  * [new branch]      master -> master
> > Branch master set up to track remote branch master from origin.
>
> Could you synchronize https://hg.netbeans.org/releases/ instead? It contains
> history of all the NetBeans releases (in branches like release82, etc.) and it
> is the repository that is currently under the review. Btw. The releases
> repository contains everything that is available in the main-silver - just
> more.
>
> It would be fantastic, if you could create a complete mirror of the releases
> repository. Thanks again for your great work!
>
> -jt
>
> [1] http://deadlock.netbeans.org/job/prototypes-MavenDownload269264/ - still
> 12 test failures remaining

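The cleanup Paul suggested in this thread (`git reflog expire --expire=now --all` followed by `git gc --aggressive --prune=now`) can be wrapped so that the size is measured the same way before and after. A minimal sketch, assuming it is run inside a throwaway copy of the converted repository, since together these commands delete history that only the reflogs still referenced. `git count-objects -v` reports loose objects under `size` and packed objects under `size-pack`, both in KiB; it counts object storage only, so it will read somewhat lower than `du -hs` on the whole .git directory. The helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: compact a converted repository and report object storage before
# and after. Run inside a throwaway copy; helper names are hypothetical.

# Loose ("size") plus packed ("size-pack") object storage, in KiB.
git_object_kib() {
    git count-objects -v |
        awk '/^size:/ || /^size-pack:/ { total += $2 } END { print total + 0 }'
}

compact_repo() {
    before=$(git_object_kib)
    # Expire every reflog entry now, so objects kept alive only by the
    # reflog become unreachable.
    git reflog expire --expire=now --all
    # Recompute deltas with a wider search (slow) and delete unreachable
    # objects immediately instead of after the default grace period.
    git gc --quiet --aggressive --prune=now
    after=$(git_object_kib)
    echo "before: ${before} KiB, after: ${after} KiB"
}
```

After `compact_repo` finishes, `git count-objects -v` should report `count: 0`, because every remaining object has been moved into a pack.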
Re: Switching to Git

Posted by Jaroslav Tulach <ja...@oracle.com>.
On Friday, 9 December 2016 19:05:48 CET Emilian Bold wrote:
> Martin, I have just pushed https://github.com/emilianbold/main-silver You
> may experiment with that.

Hello Emilian,
I managed to fork & use your repository and everything seems great. I have a 
functional job that executes 

$ ant build-platform

for each pull request. I plan to add a call to "ant test-platform" once it is 
stable enough[1]. Great work! I believe we shall use your Git repository as a 
base (somehow) when donating code to Apache once my Oracle peers finish review 
of the code to donate.

> To https://github.com/emilianbold/main-silver.git
>  * [new branch]      master -> master
> Branch master set up to track remote branch master from origin.

Could you synchronize https://hg.netbeans.org/releases/ instead? It contains 
history of all the NetBeans releases (in branches like release82, etc.) and it 
is the repository that is currently under the review. Btw. The releases 
repository contains everything that is available in the main-silver - just 
more.

It would be fantastic, if you could create a complete mirror of the releases 
repository. Thanks again for your great work!

-jt
 
[1] http://deadlock.netbeans.org/job/prototypes-MavenDownload269264/ - still 
12 test failures remaining
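The push quoted in this thread (`git push -u origin master`) publishes a single branch, while a complete mirror of releases/ needs every release branch (release82 and so on) plus the tags. A minimal sketch of pushing all branches and tags of a converted repository and then comparing branch counts on both sides; the helper names are hypothetical, and the remote can be any Git URL or path you can write to:

```shell
#!/bin/sh
# Sketch: publish every branch and tag of a converted repository, then
# compare branch counts. Helper names are hypothetical; $1 is any Git URL
# or path you can push to.

push_all_refs() {
    # --all pushes every local branch (e.g. release82); --tags every tag.
    git push --quiet --all "$1" &&
        git push --quiet --tags "$1"
}

# Number of branches in the local repository.
local_branch_count() {
    git for-each-ref --format='%(refname)' refs/heads/ | wc -l | tr -d ' '
}

# Number of branches advertised by the remote.
remote_branch_count() {
    git ls-remote --heads "$1" | wc -l | tr -d ' '
}
```

`git push --mirror` is an alternative, but it also pushes every other ref under refs/ (remote-tracking branches included) and deletes remote refs that do not exist locally, which is more than a first publication needs.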


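Jan's check in this thread compares `git log -p | sha1sum` before and after `git gc --aggressive`, which covers the history of HEAD only. Because a Git object ID is a hash of the object's content, and a commit's content includes its tree and parent IDs, two repositories whose branches and tags point at the same IDs hold the same history; gc only changes how the objects are stored. A minimal sketch of that comparison across all refs, followed by `git fsck` to confirm the repacked objects are intact; the helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: check that two repositories (e.g. before and after gc) hold the
# same history. Helper names are hypothetical; arguments are repo paths.

# One line per ref: "<object id> <ref name>", sorted for a stable comparison.
ref_fingerprint() {
    git -C "$1" for-each-ref --format='%(objectname) %(refname)' | sort
}

# Succeeds when both repositories have identical refs pointing at identical
# object IDs and the second repository passes a full fsck.
same_history() {
    if [ "$(ref_fingerprint "$1")" != "$(ref_fingerprint "$2")" ]; then
        return 1
    fi
    git -C "$2" fsck --full --no-progress >/dev/null 2>&1
}
```

Compare like with like: a bare clone (as in `git clone --bare --no-local`) does not copy remote-tracking branches, so a source repository that has them will show different fingerprints even when the branch history is identical.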


Re: Switching to Git was: Version control advice

Posted by Emilian Bold <em...@gmail.com>.
Martin, I have just pushed https://github.com/emilianbold/main-silver You
may experiment with that.

git push -u origin master
Counting objects: 3951610, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (732965/732965), done.
Writing objects: 100% (3951610/3951610), 674.94 MiB | 717.00 KiB/s, done.
Total 3951610 (delta 2068729), reused 3951610 (delta 2068729)
remote: Resolving deltas: 100% (2068729/2068729), done.
remote: Checking connectivity: 3951610, done.
remote: warning: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: warning: See http://git.io/iEPt8g for more information.
remote: warning: File dlight.util/test/manual/DLight_Simple_Tests/core is 51.88 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
To https://github.com/emilianbold/main-silver.git
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.
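GitHub's GH001 warning above names one file over its recommended 50.00 MB. A minimal sketch for listing every blob in the converted history above a size threshold before pushing; the helper name is hypothetical, and the default threshold assumes GitHub's "MB" means MiB (52428800 bytes), which the warning does not state:

```shell
#!/bin/sh
# Sketch: list blobs anywhere in history that are larger than a threshold
# in bytes. Helper name is hypothetical; default assumes 50 MiB.

large_blobs() {
    limit=${1:-52428800}
    # rev-list --objects prints "<id> <path>" for each blob and tree reachable
    # from any ref; cat-file --batch-check adds each object's type and size.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v limit="$limit" '$1 == "blob" && $2 + 0 > limit + 0 { sub(/^blob /, ""); print }' |
        sort -rn
}
```

Running `large_blobs` inside the converted repository prints one `<size> <path>` line per oversized blob, largest first; pass a smaller byte count to see files that are close to the limit.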


--emi

On Wed, Dec 7, 2016 at 4:42 PM, Martin Balin <Ma...@oracle.com>
wrote:

> Hello Emilian,
> I'm working at Oracle on NetBeans development and we would like to start
> fixing build scripts to use Git instead of HG.
> This could be done earlier on your Git repo if you agree to as it will
> take time. Does not need to wait for final official donation of sources.
> Can you please send me the URL,...
> Thank you Martin Balin
>
>
>
> On 24.11.2016 20:07, Emilian Bold wrote:
>
>> At under 1GB the repository size is not an issue anymore.
>>
>> It's sad to see we will still have migration problems due to legal
>> considerations.
>>
>> Could you provide an estimate how long it would take to verify and
>> whitelist the entire codebase Oracle plans on donating?
>>
>> It's unclear to me how history would be preserved with an incremental
>> approach.
>>
>> I would prefer we migrate the whole thing in one piece with history and
>> all.
>>
>>
>> --emi
>>
>> On Thu, Nov 24, 2016 at 5:22 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com>
>> wrote:
>>> Emilian, Jan, Mark, great work.
>>>
>>> Smooth migration from Hg to Git is essential for successful migration to
>>> Apache. Thanks a lot for investigating how to do that.
>>>
>>> My plan (as described in another email) is to prepare the code donation
>>> in Hg and update it incrementally with code integrated into Hg.
>>>
>>> Are your conversion methods ready for incremental updates or do they
>>> only work as a one-time batch conversion?
>>>
>>> -jt
>>>
>>> On Thursday, 24 November 2016 10:41:50 CET Jan Lahoda wrote:
>>>
>>>> Interesting. I tried "git gc --aggressive" on the Mark's converted
>>>> repository, and the result is:
>>>> netbeans-import/.git$ du -hs .
>>>> 792M    .
>>>>
>>>> The original was:
>>>> netbeans-import.git $ du -hs .
>>>> 3,5G    .
>>>>
>>>> (IIRC Mark was converting http://hg.netbeans.org/main, not releases, so
>>>> the repository is a little bit smaller than the releases one.)
>>>>
>>>> I tried:
>>>> $ git log -p | sha1sum
>>>>
>>>> on both repositories, and the hashes appear to be the same. I also tried
>>>> to clone the gc-ed repository using git clone --bare --no-local, and the
>>>> resulting repository is still about the same size. So, this seems good
>>>> to me, unless there is some downside I don't know about.
>>>>
>>>> Jan
>>>>
>>>>
>>>> On Wed, Nov 23, 2016 at 8:26 PM, Emilian Bold <em...@gmail.com>
>>>> wrote:
>>>>
>>>>> Actually I don't believe the data loss is that large. (There may also
>>>>> be mercurial commits that are intentionally ignored by the conversion
>>>>> script, like commits that only add tags?)
>>>>>
>>>>> hg log | grep '^changeset:' | wc -l
>>>>>
>>>>>    313209
>>>>>
>>>>> git log | grep '^commit ' | wc -l
>>>>>
>>>>>    301478
>>>>>
>>>>> So there is a difference of 11731 commits (about 4%) but those couldn't
>>>>> have such a large impact on repository size.
>>>>>
>>>>> I hope somebody else is willing to work with me on this so we document
>>>>> everything and do a reproducible repository conversion.
>>>>>
>>>>>
>>>>>
>>>>> --emi
>>>>>
>>>>> On Wed, Nov 23, 2016 at 9:10 PM, Emilian Bold <em...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Well, I dunno what black magic `gc --aggressive` does but the
>>>>>> repository is 0.85GB now!
>>>>>>
>>>>>> I also ran `git reflog expire` first but it didn't change the size at
>>>>>> all.
>>>>>>
>>>>>> One thing to keep in mind is that I used --force although I had 6
>>>>>> commits with the warning "repository has at least one unnamed head",
>>>>>> which were probably all close branch commits (hg commit --close-branch).
>>>>>>
>>>>>> So I might have data loss(!) since I believe I read hg-fast-export.sh
>>>>>> picks only one unnamed head as the migration winner. I wonder if the gc
>>>>>> command didn't just purge a lot of valid commits from such an unnamed
>>>>>> head and that's why the repository became so small.
>>>>>>
>>>>>> Could somebody else try a test repository conversion and validate my
>>>>>> numbers?
>>>>>>
>>>>>> git gc --aggressive --prune=now
>>>>>> Counting objects: 4085031, done.
>>>>>> Delta compression using up to 8 threads.
>>>>>> Compressing objects: 100% (2909203/2909203), done.
>>>>>> Writing objects: 100% (4085031/4085031), done.
>>>>>> Total 4085031 (delta 2150468), reused 1585934 (delta 0)
>>>>>> Checking connectivity: 4085031, done.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --emi
>>>>>>
>>>>>> On Wed, Nov 23, 2016 at 7:59 PM, Paul Merlin <pa...@apache.org>
>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Emilian,
>>>>>>>
>>>>>>> I see hg-fast-export.sh finished at some point.
>>>>>>>>
>>>>>>>> As expected though, git does not have any of the disk space gains.
>>>>>>>> The
>>>>>>>> converted git releases/ repository is 3.6GB.
>>>>>>>>
>>>>>>> Just a thought.
>>>>>>> Did you try some git cleanups after the conversion?
>>>>>>>
>>>>>>> git reflog expire --expire=now --all
>>>>>>> git gc --aggressive --prune=now
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> In case these statistics mean something:
>>>>>>>>
>>>>>>>> git-fast-import statistics:
>>>>>>>> ------------------------------------------------------------
>>>>>>>>
>>>>>>> ---------
>>>
>>>> Alloc'd objects:    4090000
>>>>>>>> Total objects:      4085509 (  40220100 duplicates
>>>>>>>>
>>>>>>>    )
>>>
>>>>        blobs  :      1036365 (  28386238 duplicates     858087
>>>>>>>>
>>>>>>> deltas
>>>
>>>> of
>>>>>
>>>>> 969684 attempts)
>>>>>>>>
>>>>>>>>        trees  :      2735935 (  11833862 duplicates    1370606
>>>>>>>>
>>>>>>> deltas
>>>
>>>> of
>>>>>
>>>>>   2613480 attempts)
>>>>>>>>
>>>>>>>>        commits:       313209 (         0 duplicates          0
>>>>>>>>
>>>>>>> deltas
>>>
>>>> of
>>>>>
>>>>>       0 attempts)
>>>>>>>>
>>>>>>>>        tags   :            0 (         0 duplicates          0
>>>>>>>>
>>>>>>> deltas
>>>
>>>> of
>>>>>
>>>>>       0 attempts)
>>>>>>>>
>>>>>>>> Total branches:        1283 (       346 loads     )
>>>>>>>>
>>>>>>>>        marks:        1048576 (    313209 unique    )
>>>>>>>>        atoms:         124011
>>>>>>>>
>>>>>>>> Memory total:        218429 KiB
>>>>>>>>
>>>>>>>>         pools:         26711 KiB
>>>>>>>>
>>>>>>>>       objects:        191718 KiB
>>>>>>>>
>>>>>>>> ------------------------------------------------------------
>>>>>>>>
>>>>>>> ---------
>>>
>>>> pack_report: getpagesize()            =       4096
>>>>>>>> pack_report: core.packedGitWindowSize = 1073741824
>>>>>>>> pack_report: core.packedGitLimit      = 8589934592
>>>>>>>> pack_report: pack_used_ctr            =   39000045
>>>>>>>> pack_report: pack_mmap_calls          =     733040
>>>>>>>> pack_report: pack_open_windows        =          4 /          7
>>>>>>>> pack_report: pack_mapped              = 4280730006 / 6950823920
>>>>>>>> ------------------------------------------------------------
>>>>>>>>
>>>>>>> ---------
>>>
>>>>
>>>>>>>> --emi
>>>>>>>>
>>>>>>>> On Fri, Nov 18, 2016 at 1:32 PM, Emilian Bold <
>>>>>>>>
>>>>>>> emilian.bold@gmail.com
>>>
>>>> wrote:
>>>>>>>>
>>>>>>>>> A releases/ clone which on my system takes 3.8GB is reduced to
>>>>>>>>>
>>>>>>>> 1.6GB
>>>
>>>> with
>>>>>>>
>>>>>>> the generaldelta and aggressivemergedeltas flags (took about 14
>>>>>>>>>
>>>>>>>> hours).
>>>>>
>>>>> Pretty impressive!
>>>>>>>>>
>>>>>>>>> Converting to git with hg-fast-export.sh complains that
>>>>>>>>>
>>>>>>>> "repository
>>>
>>>> has at
>>>>>>>
>>>>>>> least one unnamed head" for about 6 revisions. With --force I'm
>>>>>>>>>
>>>>>>>> able
>>>
>>>> to
>>>>>
>>>>> start the conversion but it hasn't finished yet.
>>>>>>>>>
>>>>>>>>> The git conversion is about 35% done and already using 1.3GB.
>>>>>>>>>
>>>>>>>>> So... I assume it's going to need just like the original
>>>>>>>>>
>>>>>>>> repository
>>>
>>>> about
>>>>>>>
>>>>>>> 3.8GB.
>>>>>>>>>
>>>>>>>>> I wonder if git has similar space-saving tricks?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --emi
>>>>>>>>>
>>>>>>>>> On Thu, Nov 17, 2016 at 8:46 AM, Emilian Bold <
>>>>>>>>>
>>>>>>>> emilian.bold@gmail.com>
>>>>>
>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Forgot about this. I've just started the Mercurial repository
>>>>>>>>>>
>>>>>>>>> conversion
>>>>>>>
>>>>>>> which will take a few hours.
>>>>>>>>>>
>>>>>>>>>> Will report tomorrow or when it's done.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --emi
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 16, 2016 at 11:18 PM, cowwoc <
>>>>>>>>>>
>>>>>>>>> cowwoc@bbs.darktech.org>
>>>
>>>> wrote:
>>>>>>>
>>>>>>>> Hi Emilian,
>>>>>>>>>>>
>>>>>>>>>>> Any update on this?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Gili
>>>>>>>>>>>
>>>>>>>>>>> On 2016-11-11 01:33 (-0500), Emilian Bold <e....@gmail.com>
>>>>>>>>>>>
>>>>>>>>>> wrote:
>>>
>>>> Thank you for following through with this after we talked on
>>>>>>>>>>>> IRC.>
>>>>>>>>>>>>
>>>>>>>>>>>> I will check later the size reduction for the releases/ repo.>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>
>>>
>

Re: Switching to Git was: Version control advice

Posted by Martin Balin <Ma...@Oracle.COM>.
Hello Emilian,
I'm working at Oracle on NetBeans development, and we would like to start
fixing the build scripts to use Git instead of Hg.
This could be done earlier on your Git repo, if you agree, as it will
take time. It does not need to wait for the final official donation of the sources.
Can you please send me the URL,...
Thank you Martin Balin


On 24.11.2016 20:07, Emilian Bold wrote:
> At under 1GB the repository size is not an issue anymore.


Re: Switching to Git was: Version control advice

Posted by Emilian Bold <em...@gmail.com>.
At under 1GB the repository size is not an issue anymore.

It's sad to see we will still have migration problems due to legal
considerations.

Could you provide an estimate of how long it would take to verify and
whitelist the entire codebase Oracle plans on donating?

It's unclear to me how history would be preserved with an incremental
approach.

I would prefer we migrate the whole thing in one piece with history and all.


--emi
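A size figure like "under 1GB" can also be checked with Git itself instead of `du`. The sketch below is a hypothetical Python helper, not something used in this thread. It runs `git count-objects -v`, which reports loose-object and pack sizes in KiB when `-H` is not given, and it computes the percentage reduction between two sizes. Unlike `du -hs .git`, it ignores non-object files such as the index and hooks, so its numbers will differ slightly from the ones quoted here. All function names are illustrative.

```python
import subprocess
from typing import Dict


def parse_count_objects(text: str) -> Dict[str, int]:
    """Parse the "key: value" lines printed by `git count-objects -v`."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value.strip())
    return stats


def count_objects(repo: str) -> Dict[str, int]:
    """Run `git count-objects -v` in `repo`; sizes are in KiB because -H is not used."""
    out = subprocess.run(
        ["git", "-C", repo, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_count_objects(out)


def object_storage_kib(stats: Dict[str, int]) -> int:
    """Loose objects + packs + garbage, all reported in KiB."""
    return stats.get("size", 0) + stats.get("size-pack", 0) + stats.get("size-garbage", 0)


def reduction_percent(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before` (same units for both)."""
    return 100.0 * (before - after) / before
```

For example, Jan's before/after figures for Mark's conversion (3.5G and 792M from `du -h`, which uses powers of 1024) give `reduction_percent(3.5 * 1024, 792)`, a reduction of roughly 78%.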

On Thu, Nov 24, 2016 at 5:22 PM, Jaroslav Tulach <jaroslav.tulach@oracle.com> wrote:

> Emilian, Jan, Mark, great work.

Switching to Git was: Version control advice

Posted by Jaroslav Tulach <ja...@oracle.com>.
Emilian, Jan, Mark, great work.

Smooth migration from Hg to Git is essential for successful migration to 
Apache. Thanks a lot for investigating how to do that.

My plan (as described in another email) is to prepare the code donation in Hg 
and update it incrementally with code integrated into Hg.

Are your conversion methods ready for incremental updates, or do they only
work as a one-time batch conversion?

-jt
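Whether the conversion is done in one batch or updated incrementally, the equality check Jan describes in the quoted message below (`git log -p | sha1sum` on both repositories) could be repeated after every update. Here is a minimal Python sketch of that check, assuming `git` is on the PATH; the function names are hypothetical. It streams the log output into SHA-1 instead of buffering it, and it passes `--no-color` so that a user's color settings cannot change the bytes being hashed. Both repositories should be checked on the same machine, since other configuration (rename detection, for instance) can also change `git log -p` output.

```python
import hashlib
import subprocess
from typing import Iterable


def digest_chunks(chunks: Iterable[bytes]) -> str:
    """SHA-1 hex digest of a stream of byte chunks (what `sha1sum` prints)."""
    h = hashlib.sha1()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def log_digest(repo: str, rev: str = "HEAD") -> str:
    """Hash `git log -p --no-color <rev>` in `repo`, like `git log -p | sha1sum`."""
    cmd = ["git", "-C", repo, "log", "-p", "--no-color", rev]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        # Read 1 MiB at a time until EOF (read() returns b"").
        digest = digest_chunks(iter(lambda: proc.stdout.read(1 << 20), b""))
    # Leaving the `with` block waits for git, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return digest


def same_history(repo_a: str, repo_b: str, rev: str = "HEAD") -> bool:
    """True if both repositories print byte-identical `git log -p` output for `rev`."""
    return log_digest(repo_a, rev) == log_digest(repo_b, rev)
```

Like Jan's command, this only covers history reachable from `rev` (HEAD by default), so commits that exist only on other branches are not compared.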

On Thursday, 24 November 2016 10:41:50 CET Jan Lahoda wrote:
> Interesting. I tried "git gc --aggressive" on Mark's converted
> repository, and the result is:
> netbeans-import/.git$ du -hs .
> 792M    .
> 
> The original was:
> netbeans-import.git $ du -hs .
> 3,5G    .
> 
> (IIRC Mark was converting http://hg.netbeans.org/main, not releases, so the
> repository is a little bit smaller than the releases one.)
> 
> I tried:
> $ git log -p | sha1sum
> 
> on both repositories, and the hashes appear to be the same. I also tried to
> clone the gc-ed repository using git clone --bare --no-local, and the
> resulting repository is still about the same size. So, this seems good to
> me, unless there is some downside I don't know about.
> 
> Jan
> 
> 
> On Wed, Nov 23, 2016 at 8:26 PM, Emilian Bold <em...@gmail.com> wrote:
> > Actually I don't believe the data loss is that large. (There may also be
> > Mercurial commits that are intentionally ignored by the conversion script,
> > like commits that only add tags?)
> > 
> > hg log | grep '^changeset:' | wc -l
> > 
> >   313209
> > 
> > git log | grep '^commit ' | wc -l
> > 
> >   301478
> > 
> > So there is a difference of 11731 commits (about 4%) but those couldn't
> > have such a large impact on repository size.
> > 
> > I hope somebody else is willing to work with me on this so we document
> > everything and do a reproducible repository conversion.
> > 
> > 
> > 
> > --emi
> > 
> > On Wed, Nov 23, 2016 at 9:10 PM, Emilian Bold <em...@gmail.com> wrote:
> > > Well, I dunno what black magic `gc --aggressive` does but the repository
> > > is 0.85GB now!
> > > 
> > > I also ran `git reflog expire` first but it didn't change the size at
> > > all.
> > > 
> > > One thing to keep in mind is that I used --force although I had 6
> > > commits with the warning "repository has at least one unnamed head",
> > > which were probably all close branch commits (hg commit --close-branch).
> > > 
> > > So I might have data loss(!) since I believe I read hg-fast-export.sh
> > > picks only one unnamed head as the migration winner. I wonder if the gc
> > > command didn't just purge a lot of valid commits from such an unnamed
> > > head and that's why the repository became so small.
> > > 
> > > Could somebody else try a test repository conversion and validate my
> > > numbers?
> > > 
> > > git gc --aggressive --prune=now
> > > Counting objects: 4085031, done.
> > > Delta compression using up to 8 threads.
> > > Compressing objects: 100% (2909203/2909203), done.
> > > Writing objects: 100% (4085031/4085031), done.
> > > Total 4085031 (delta 2150468), reused 1585934 (delta 0)
> > > Checking connectivity: 4085031, done.
> > > 
> > > 
> > > 
> > > --emi
> > > 
> > > On Wed, Nov 23, 2016 at 7:59 PM, Paul Merlin <pa...@apache.org> wrote:
> > >> Hi Emilian,
> > >> 
> > >> > I see hg-fast-export.sh finished at some point.
> > >> > 
> > >> > As expected though, git does not have any of the disk space gains. The
> > >> > converted git releases/ repository is 3.6GB.
> > >> 
> > >> Just a thought.
> > >> Did you try some git cleanups after the conversion?
> > >> 
> > >> git reflog expire --expire=now --all
> > >> git gc --aggressive --prune=now
> > >> 
> > >> Cheers
> > >> 
> > >> > In case these statistics mean something:
> > >> > 
> > >> > git-fast-import statistics:
> > >> > ---------------------------------------------------------------------
> > >> > Alloc'd objects:    4090000
> > >> > Total objects:      4085509 (  40220100 duplicates                  )
> > >> >       blobs  :      1036365 (  28386238 duplicates     858087 deltas of     969684 attempts)
> > >> >       trees  :      2735935 (  11833862 duplicates    1370606 deltas of    2613480 attempts)
> > >> >       commits:       313209 (         0 duplicates          0 deltas of          0 attempts)
> > >> >       tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
> > >> > Total branches:        1283 (       346 loads     )
> > >> >       marks:        1048576 (    313209 unique    )
> > >> >       atoms:         124011
> > >> > Memory total:        218429 KiB
> > >> >        pools:         26711 KiB
> > >> >      objects:        191718 KiB
> > >> > ---------------------------------------------------------------------
> > >> > pack_report: getpagesize()            =       4096
> > >> > pack_report: core.packedGitWindowSize = 1073741824
> > >> > pack_report: core.packedGitLimit      = 8589934592
> > >> > pack_report: pack_used_ctr            =   39000045
> > >> > pack_report: pack_mmap_calls          =     733040
> > >> > pack_report: pack_open_windows        =          4 /          7
> > >> > pack_report: pack_mapped              = 4280730006 / 6950823920
> > >> > ---------------------------------------------------------------------
> > >> > 
> > >> > 
> > >> > --emi
> > >> > 
> > >> > On Fri, Nov 18, 2016 at 1:32 PM, Emilian Bold <emilian.bold@gmail.com> wrote:
> > >> >> A releases/ clone which on my system takes 3.8GB is reduced to 1.6GB
> > >> >> with the generaldelta and aggressivemergedeltas flags (took about 14
> > >> >> hours). Pretty impressive!
> > >> >> 
> > >> >> Converting to git with hg-fast-export.sh complains that "repository
> > >> >> has at least one unnamed head" for about 6 revisions. With --force I'm
> > >> >> able to start the conversion but it hasn't finished yet.
> > >> >> 
> > >> >> The git conversion is about 35% done and already using 1.3GB.
> > >> >> 
> > >> >> So... I assume it's going to need about 3.8GB, just like the original
> > >> >> repository.
> > >> >> 
> > >> >> I wonder if git has similar space-saving tricks?
> > >> >> 
> > >> >> 
> > >> >> 
> > >> >> --emi
> > >> >> 
> > >> >> On Thu, Nov 17, 2016 at 8:46 AM, Emilian Bold <emilian.bold@gmail.com> wrote:
> > >> >>> Forgot about this. I've just started the Mercurial repository
> > >> >>> conversion, which will take a few hours.
> > >> >>> which will take a few hours.
> > >> >>> 
> > >> >>> Will report tomorrow or when it's done.
> > >> >>> 
> > >> >>> 
> > >> >>> --emi
> > >> >>> 
> > >> >>> On Wed, Nov 16, 2016 at 11:18 PM, cowwoc <co...@bbs.darktech.org> wrote:
> > >> >>>> Hi Emilian,
> > >> >>>> 
> > >> >>>> Any update on this?
> > >> >>>> 
> > >> >>>> Thanks,
> > >> >>>> Gili
> > >> >>>> 
> > >> >>>> On 2016-11-11 01:33 (-0500), Emilian Bold <e....@gmail.com> wrote:
> > >> >>>>> Thank you for following through with this after we talked on
> > >> >>>>> IRC.
> > >> >>>>> 
> > >> >>>>> I will check later the size reduction for the releases/ repo.
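The git-fast-import statistics quoted above report `commits: 313209`, which is exactly the `hg log` changeset count, while `git log | grep '^commit '` found 301478. One possible explanation, not verified in this thread, is that plain `git log` only walks history reachable from HEAD, whereas the import created 1283 branches; `git rev-list --all --count` counts commits reachable from any ref. The Python sketch below uses hypothetical helper names and assumes the standard single-line layout of the fast-import report. It extracts the per-type lines, computes how often a delta attempt succeeded, and returns the all-refs commit count.

```python
import re
import subprocess
from typing import Dict

# One per-type line of the git-fast-import report, e.g.
#   "      blobs  :      1036365 (  28386238 duplicates     858087 deltas of     969684 attempts)"
TYPE_LINE = re.compile(
    r"^\s*(blobs|trees|commits|tags)\s*:\s*(\d+)\s*\(\s*(\d+) duplicates"
    r"\s+(\d+) deltas of\s+(\d+) attempts\)"
)


def parse_fast_import_stats(report: str) -> Dict[str, Dict[str, int]]:
    """Map object type -> counts taken from a git-fast-import statistics report."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in report.splitlines():
        m = TYPE_LINE.match(line)
        if m:
            kind, total, dups, deltas, attempts = m.groups()
            stats[kind] = {
                "objects": int(total),
                "duplicates": int(dups),
                "deltas": int(deltas),
                "attempts": int(attempts),
            }
    return stats


def delta_hit_rate(entry: Dict[str, int]) -> float:
    """Fraction of delta attempts that produced a delta (0.0 if none were tried)."""
    return entry["deltas"] / entry["attempts"] if entry["attempts"] else 0.0


def commits_on_all_refs(repo: str) -> int:
    """Count commits reachable from any ref (branches, tags), not just HEAD."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--all", "--count"],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())
```

Applied to the blob and tree lines above, this gives delta hit rates of roughly 88% and 52%. If `commits_on_all_refs` on the converted repository returned 313209, the 11731-commit gap would be a counting artifact rather than missing history.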



Re: Version control advice

Posted by Emilian Bold <em...@gmail.com>.
You could push it to GitHub and see if it's the same size there.

On Thu, 24 Nov 2016 at 11:42 Jan Lahoda <la...@gmail.com> wrote:

> Interesting. I tried "git gc --aggressive" on Mark's converted
> repository, and the result is:
> netbeans-import/.git$ du -hs .
> 792M    .
>
> The original was:
> netbeans-import.git $ du -hs .
> 3,5G    .
>
> (IIRC Mark was converting http://hg.netbeans.org/main, not releases, so the
> repository is a little bit smaller than the releases one.)
>
> I tried:
> $ git log -p | sha1sum
>
> on both repositories, and the hashes appear to be the same. I also tried to
> clone the gc-ed repository using git clone --bare --no-local, and the
> resulting repository is still about the same size. So, this seems good to
> me, unless there is some downside I don't know about.
>
> Jan
>
>
> On Wed, Nov 23, 2016 at 8:26 PM, Emilian Bold <em...@gmail.com>
> wrote:
>
> > Actually I don't believe the data loss is that large. (There may also be
> > Mercurial commits that are intentionally ignored by the conversion script,
> > like commits that only add tags?)
> >
> > hg log | grep '^changeset:' | wc -l
> >   313209
> >
> > git log | grep '^commit ' | wc -l
> >   301478
> >
> > So there is a difference of 11731 commits (about 4%) but those couldn't
> > have such a large impact on repository size.
> >
> > I hope somebody else is willing to work with me on this so we document
> > everything and do a reproducible repository conversion.
> >
> >
> >
> > --emi
> >
> > On Wed, Nov 23, 2016 at 9:10 PM, Emilian Bold <em...@gmail.com>
> > wrote:
> >
> > > Well, I dunno what black magic `gc --aggressive` does but the repository
> > > is 0.85GB now!
> > >
> > > I also ran `git reflog expire` first but it didn't change the size at
> > > all.
> > >
> > > One thing to keep in mind is that I used --force although I had 6
> > > commits with the warning "repository has at least one unnamed head",
> > > which were probably all close branch commits (hg commit --close-branch).
> > >
> > > So I might have data loss(!) since I believe I read hg-fast-export.sh
> > > picks only one unnamed head as the migration winner. I wonder if the gc
> > > command didn't just purge a lot of valid commits from such an unnamed
> > > head and that's why the repository became so small.
> > >
> > > Could somebody else try a test repository conversion and validate my
> > > numbers?
> > >
> > > git gc --aggressive --prune=now
> > > Counting objects: 4085031, done.
> > > Delta compression using up to 8 threads.
> > > Compressing objects: 100% (2909203/2909203), done.
> > > Writing objects: 100% (4085031/4085031), done.
> > > Total 4085031 (delta 2150468), reused 1585934 (delta 0)
> > > Checking connectivity: 4085031, done.
> > >
> > >
> > >
> > > --emi
> > >
> > > On Wed, Nov 23, 2016 at 7:59 PM, Paul Merlin <pa...@apache.org>
> > > wrote:
> > >
> > >> Hi Emilian,
> > >>
> > >> > I see hg-fast-export.sh finished at some point.
> > >> >
> > >> > As expected though, git does not have any of the disk space gains.
> The
> > >> > converted git releases/ repository is 3.6GB.
> > >>
> > >> Just a thought.
> > >> Did you try some git cleanups after the conversion?
> > >>
> > >> git reflog expire --expire=now --all
> > >> git gc --aggressive --prune=now
> > >>
> > >> Cheers
> > >>
> > >>
> > >> > In case these statistics mean something:
> > >> >
> > >> > git-fast-import statistics:
> > >> >
> ---------------------------------------------------------------------
> > >> > Alloc'd objects:    4090000
> > >> > Total objects:      4085509 (  40220100 duplicates
> )
> > >> >       blobs  :      1036365 (  28386238 duplicates     858087 deltas
> > of
> > >> > 969684 attempts)
> > >> >       trees  :      2735935 (  11833862 duplicates    1370606 deltas
> > of
> > >> >  2613480 attempts)
> > >> >       commits:       313209 (         0 duplicates          0 deltas
> > of
> > >> >      0 attempts)
> > >> >       tags   :            0 (         0 duplicates          0 deltas
> > of
> > >> >      0 attempts)
> > >> > Total branches:        1283 (       346 loads     )
> > >> >       marks:        1048576 (    313209 unique    )
> > >> >       atoms:         124011
> > >> > Memory total:        218429 KiB
> > >> >        pools:         26711 KiB
> > >> >      objects:        191718 KiB
> > >> >
> ---------------------------------------------------------------------
> > >> > pack_report: getpagesize()            =       4096
> > >> > pack_report: core.packedGitWindowSize = 1073741824
> > >> > pack_report: core.packedGitLimit      = 8589934592
> > >> > pack_report: pack_used_ctr            =   39000045
> > >> > pack_report: pack_mmap_calls          =     733040
> > >> > pack_report: pack_open_windows        =          4 /          7
> > >> > pack_report: pack_mapped              = 4280730006 / 6950823920
> > >> >
> ---------------------------------------------------------------------
> > >> >
> > >> >
> > >> > --emi
> > >> >
> > >> > On Fri, Nov 18, 2016 at 1:32 PM, Emilian Bold <
> emilian.bold@gmail.com
> > >
> > >> > wrote:
> > >> >
> > >> >> A releases/ clone which on my system takes 3.8GB is reduced to
> 1.6GB
> > >> with
> > >> >> the generaldelta and aggressivemergedeltas flags (took about 14
> > hours).
> > >> >>
> > >> >> Pretty impressive!
> > >> >>
> > >> >> Converting to git with hg-fast-export.sh complains that "repository
> > >> has at
> > >> >> least one unnamed head" for about 6 revisions. With --force I'm
> able
> > to
> > >> >> start the conversion but it hasn't finished yet.
> > >> >>
> > >> >> The git conversion is about 35% done and already using 1.3GB.
> > >> >>
> > >> >> So... I assume it's going to need just like the original repository
> > >> about
> > >> >> 3.8GB.
> > >> >>
> > >> >> I wonder if git has similar space-saving tricks?
> > >> >>
> > >> >>
> > >> >>
> > >> >> --emi
> > >> >>
> > >> >> On Thu, Nov 17, 2016 at 8:46 AM, Emilian Bold <
> > emilian.bold@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >>> Forgot about this. I've just started the Mercurial repository
> > >> conversion
> > >> >>> which will take a few hours.
> > >> >>>
> > >> >>> Will report tomorrow or when it's done.
> > >> >>>
> > >> >>>
> > >> >>> --emi
> > >> >>>
> > >> >>> On Wed, Nov 16, 2016 at 11:18 PM, cowwoc <cowwoc@bbs.darktech.org
> >
> > >> wrote:
> > >> >>>
> > >> >>>> Hi Emilian,
> > >> >>>>
> > >> >>>> Any update on this?
> > >> >>>>
> > >> >>>> Thanks,
> > >> >>>> Gili
> > >> >>>>
> > >> >>>>
> > >> >>>> On 2016-11-11 01:33 (-0500), Emilian Bold <e....@gmail.com>
> wrote:
> > >> >>>>> Thank you for following through with this after we talked on
> IRC.>
> > >> >>>>>
> > >> >>>>> I will check later the size reduction for the releases/ repo.>
> > >> >
> > >>
> > >
> > >
> >
>

Re: Version control advice

Posted by Jan Lahoda <la...@gmail.com>.
Interesting. I tried "git gc --aggressive" on Mark's converted
repository, and the result is:
netbeans-import/.git$ du -hs .
792M    .

The original was:
netbeans-import.git $ du -hs .
3,5G    .

(IIRC Mark was converting http://hg.netbeans.org/main, not releases, so the
repository is a little bit smaller than the releases one.)
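The `du -hs` figures above use different units, and "3,5G" uses a locale-specific decimal comma. Below is a minimal Python sketch for converting such strings to bytes and a percentage saving. It assumes GNU `du -h` semantics (powers of 1024, not `--si`). The function names are illustrative and not part of any tool:

```python
# Hypothetical helpers for comparing `du -h` style sizes; not part of git or du.
_UNIT_FACTORS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_du_size(text: str) -> float:
    """Convert a human-readable `du -h` size such as '792M' or '3,5G' to bytes.

    A comma is accepted as the decimal separator, as some locales print it.
    Sizes are assumed to be binary multiples (GNU `du -h`, not `du --si`).
    """
    text = text.strip()
    unit = text[-1].upper() if text and text[-1].isalpha() else ""
    number = text[:-1] if unit else text
    return float(number.replace(",", ".")) * _UNIT_FACTORS[unit]


def size_reduction(before: str, after: str) -> float:
    """Fraction of space saved going from `before` to `after`."""
    return 1 - parse_du_size(after) / parse_du_size(before)


if __name__ == "__main__":
    saved = size_reduction("3,5G", "792M")
    print(f"saved {saved:.0%}")  # prints "saved 78%"
```

By this arithmetic, the gc run on the "main" conversion saved roughly 78% of the on-disk size.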

I tried:
$ git log -p | sha1sum

on both repositories, and the hashes appear to be the same. I also tried to
clone the gc-ed repository using git clone --bare --no-local, and the
resulting repository is still about the same size. So, this seems good to
me, unless there is some downside I don't know about.
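The `git log -p | sha1sum` comparison can be scripted so it can be repeated on any pair of repositories. Below is a minimal Python sketch that assumes `git` is on the PATH. Like the original command, plain `git log -p` only walks history reachable from HEAD. Passing `--all` through `extra_args` would presumably cover every branch as well. This is an assumption about what is worth comparing, not something tested in this thread:

```python
import hashlib
import subprocess
from typing import Iterable, Sequence


def sha1_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex SHA-1 of a byte stream, equivalent to piping it into `sha1sum`."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def history_fingerprint(repo: str, extra_args: Sequence[str] = ()) -> str:
    """Hash the `git log -p` output of `repo` without buffering all of it.

    The patch log of a large repository can be very big, so it is read in
    1 MiB chunks and fed straight into the digest.
    """
    cmd = ["git", "-C", repo, "log", "-p", *extra_args]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        assert proc.stdout is not None
        fingerprint = sha1_of_chunks(iter(lambda: proc.stdout.read(1 << 20), b""))
    # Leaving the `with` block waits for git, so returncode is now set.
    if proc.returncode != 0:
        raise RuntimeError(f"{' '.join(cmd)} exited with {proc.returncode}")
    return fingerprint
```

For example, `history_fingerprint("netbeans-import") == history_fingerprint("netbeans-import-gc")` (placeholder paths) should be `True` if gc only changed how objects are packed.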

Jan



Re: Version control advice

Posted by Emilian Bold <em...@gmail.com>.
Actually I don't believe the data loss is that large. (There may also be
Mercurial commits that are intentionally ignored by the conversion script,
like commits that only add tags?)

hg log | grep '^changeset:' | wc -l
  313209

git log | grep '^commit ' | wc -l
  301478

So there is a difference of 11731 commits (about 4%) but those couldn't
have such a large impact on repository size.
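The two counts can be reproduced with a short Python sketch. One caveat is worth checking, and it is an assumption rather than a finding. `hg log` lists every changeset in the repository, but `git log` without `--all` only walks commits reachable from HEAD. In a repository with many branches, that alone could produce a gap. The git-fast-import statistics in this thread report `commits: 313209`, the same as the hg count. `git rev-list --all --count` counts commits reachable from any ref:

```python
import subprocess


def count_prefixed_lines(log_text: str, prefix: str) -> int:
    """Same as `grep '^<prefix>' | wc -l` on captured log output."""
    return sum(1 for line in log_text.splitlines() if line.startswith(prefix))


def missing_fraction(hg_changesets: int, git_commits: int) -> float:
    """Share of Mercurial changesets with no counterpart in the git count."""
    return (hg_changesets - git_commits) / hg_changesets


def git_commits_all_refs(repo: str) -> int:
    """Count commits reachable from any ref (branches, tags), not just HEAD."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--all", "--count"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Figures from the message above.
    print(313209 - 301478)                            # 11731
    print(f"{missing_fraction(313209, 301478):.1%}")  # 3.7%
```

If `git_commits_all_refs` on the converted repository returned 313209, the gap would come from how the commits were counted, not from lost commits.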

I hope somebody else is willing to work with me on this so we document
everything and do a reproducible repository conversion.



--emi


Re: Version control advice

Posted by Emilian Bold <em...@gmail.com>.
Well, I dunno what black magic `gc --aggressive` does but the repository is
0.85GB now!

I also ran `git reflog expire` first but it didn't change the size at all.

One thing to keep in mind is that I used --force although I had 6 commits
with the warning "repository has at least one unnamed head", which were
probably all close-branch commits (hg commit --close-branch).

So I might have data loss(!), since I believe I read that hg-fast-export.sh
picks only one unnamed head as the migration winner. I wonder if the gc
command just purged a lot of valid commits from such an unnamed head
and that's why the repository became so small.

Could somebody else try a test repository conversion and validate my
numbers?

git gc --aggressive --prune=now
Counting objects: 4085031, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2909203/2909203), done.
Writing objects: 100% (4085031/4085031), done.
Total 4085031 (delta 2150468), reused 1585934 (delta 0)
Checking connectivity: 4085031, done.
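About the "black magic": the git-gc documentation says that `--aggressive` makes git-repack run with `-f`. That flag throws away existing deltas and recomputes them, with a much larger search window (`gc.aggressiveWindow`, 250 by default), at the cost of far more CPU time. The "reused ... (delta 0)" in the summary above fits this, since no old delta was kept. Below is a small Python sketch that extracts the numbers from that summary line. The parser and its names are illustrative:

```python
import re
from typing import NamedTuple


class PackSummary(NamedTuple):
    objects: int        # objects written to the new pack
    deltas: int         # of those, how many are stored as deltas
    reused: int         # objects whose existing packed data was reused
    reused_deltas: int  # existing deltas that were kept as-is


# Matches e.g. "Total 4085031 (delta 2150468), reused 1585934 (delta 0)";
# newer git versions may append ", pack-reused N", which is ignored here.
_SUMMARY_RE = re.compile(r"Total (\d+) \(delta (\d+)\), reused (\d+) \(delta (\d+)\)")


def parse_pack_summary(line: str) -> PackSummary:
    """Parse git's pack-objects summary line into its four counters."""
    match = _SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a pack summary line: {line!r}")
    return PackSummary(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    s = parse_pack_summary("Total 4085031 (delta 2150468), reused 1585934 (delta 0)")
    print(f"{s.deltas / s.objects:.1%} of objects stored as deltas")  # 52.6% ...
```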



--emi


Re: Version control advice

Posted by Paul Merlin <pa...@apache.org>.
Hi Emilian,

> I see hg-fast-export.sh finished at some point.
>
> As expected though, git does not have any of the disk space gains. The
> converted git releases/ repository is 3.6GB.

Just a thought.
Did you try some git cleanups after the conversion?

git reflog expire --expire=now --all
git gc --aggressive --prune=now
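These two commands can be wrapped so that the size before and after cleanup is recorded automatically. That could also help with the reproducible conversion Emilian asks for. Below is a minimal Python sketch that assumes `git` is on the PATH. Summing file sizes gives the apparent size, which can differ somewhat from the disk blocks reported by plain `du -s`. The function names are hypothetical:

```python
import os
import subprocess
from typing import Tuple

# The two cleanup steps suggested above. Per the git-gc documentation,
# --prune=now raises the risk of corruption if another process writes
# to the repository at the same time.
CLEANUP_STEPS = (
    ["reflog", "expire", "--expire=now", "--all"],
    ["gc", "--aggressive", "--prune=now"],
)


def dir_size_bytes(path: str) -> int:
    """Sum of apparent file sizes under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def git_dir_of(repo: str) -> str:
    """Locate the repository's git directory (".git" or a bare repo itself)."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--git-dir"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # --git-dir may be relative to `repo`; os.path.join keeps absolute paths as-is.
    return os.path.join(repo, out)


def cleanup_and_measure(repo: str) -> Tuple[int, int]:
    """Run the cleanup steps and return (bytes_before, bytes_after)."""
    git_dir = git_dir_of(repo)
    before = dir_size_bytes(git_dir)
    for step in CLEANUP_STEPS:
        subprocess.run(["git", "-C", repo, *step], check=True)
    after = dir_size_bytes(git_dir)
    return before, after
```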

Cheers



Re: Version control advice

Posted by Emilian Bold <em...@gmail.com>.
I see hg-fast-export.sh finished at some point.

As expected though, git does not have any of the disk space gains. The
converted git releases/ repository is 3.6GB.

In case these statistics mean something:

git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:    4090000
Total objects:      4085509 (  40220100 duplicates                  )
      blobs  :      1036365 (  28386238 duplicates     858087 deltas of     969684 attempts)
      trees  :      2735935 (  11833862 duplicates    1370606 deltas of    2613480 attempts)
      commits:       313209 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:        1283 (       346 loads     )
      marks:        1048576 (    313209 unique    )
      atoms:         124011
Memory total:        218429 KiB
       pools:         26711 KiB
     objects:        191718 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =   39000045
pack_report: pack_mmap_calls          =     733040
pack_report: pack_open_windows        =          4 /          7
pack_report: pack_mapped              = 4280730006 / 6950823920
---------------------------------------------------------------------
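The per-type lines in these statistics can be turned into delta ratios. For context, the packfile optimization notes in the git-fast-import documentation say that fast-import only tries to delta a blob against the last blob written. That blob is usually not a previous version of the same file, so the resulting pack is not optimal, and the documentation suggests repacking afterwards. This may help explain the large post-import size. Below is a Python sketch that expects each per-type entry on a single line, the way git-fast-import prints it. The names are illustrative:

```python
import re
from typing import Dict, NamedTuple


class ObjectStats(NamedTuple):
    total: int
    duplicates: int
    deltas: int
    attempts: int

    @property
    def delta_share(self) -> float:
        """Fraction of stored objects written as deltas (0.0 if none stored)."""
        return self.deltas / self.total if self.total else 0.0


# e.g. "blobs  :      1036365 (  28386238 duplicates     858087 deltas of     969684 attempts)"
_LINE_RE = re.compile(
    r"^\s*(\w+)\s*:\s*(\d+)\s*\(\s*(\d+) duplicates\s+(\d+) deltas of\s+(\d+) attempts\)",
    re.MULTILINE,
)


def parse_fast_import_stats(text: str) -> Dict[str, ObjectStats]:
    """Map object type (blobs, trees, commits, tags) to its counters."""
    return {
        m.group(1): ObjectStats(*(int(g) for g in m.groups()[1:]))
        for m in _LINE_RE.finditer(text)
    }


if __name__ == "__main__":
    sample = (
        "      blobs  :      1036365 (  28386238 duplicates     858087 deltas of     969684 attempts)\n"
        "      trees  :      2735935 (  11833862 duplicates    1370606 deltas of    2613480 attempts)\n"
    )
    for kind, st in parse_fast_import_stats(sample).items():
        print(f"{kind}: {st.delta_share:.1%} stored as deltas")
    # blobs: 82.8% stored as deltas
    # trees: 50.1% stored as deltas
```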


--emi


Re: Version control advice

Posted by Emilian Bold <em...@gmail.com>.
A releases/ clone which on my system takes 3.8GB is reduced to 1.6GB with
the generaldelta and aggressivemergedeltas flags (took about 14 hours).

Pretty impressive!

Converting to git with hg-fast-export.sh complains that "repository has at
least one unnamed head" for about 6 revisions. With --force I'm able to
start the conversion but it hasn't finished yet.
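To see in advance which named branches might trigger that warning, one could list the branch of every head and look for branches that appear more than once. This Python sketch assumes that an "unnamed head" means an extra head on a named branch that already has one. That is our reading, not confirmed against the hg-fast-export source. Note that `hg heads` shows only open heads unless `--closed` is given, so running it both ways would show whether closed branches are involved. The function names are hypothetical:

```python
import subprocess
from collections import Counter
from typing import Dict, Iterable, List


def hg_head_branches(repo: str, include_closed: bool = False) -> List[str]:
    """Return the branch name of each head in `repo`, one entry per head."""
    cmd = ["hg", "-R", repo, "heads", "--template", "{branch}\n"]
    if include_closed:
        cmd.append("--closed")
    result = subprocess.run(cmd, capture_output=True, text=True)
    # `hg heads` exits with status 1 when no heads match; treat that as empty.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr)
    return result.stdout.splitlines()


def branches_with_multiple_heads(head_branches: Iterable[str]) -> Dict[str, int]:
    """Map each named branch with more than one head to its head count."""
    counts = Counter(name.strip() for name in head_branches if name.strip())
    return {name: n for name, n in counts.items() if n > 1}
```

Comparing `branches_with_multiple_heads(hg_head_branches(repo))` against the same call with `include_closed=True` would indicate whether the six problem revisions are closed heads.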

The git conversion is about 35% done and already using 1.3GB.

So... I assume it's going to need about 3.8GB, just like the original
repository.

I wonder if git has similar space-saving tricks?



--emi


Re: Version control advice

Posted by Emilian Bold <em...@gmail.com>.
Forgot about this. I've just started the Mercurial repository conversion
which will take a few hours.

Will report tomorrow or when it's done.


--emi


Re: Version control advice

Posted by cowwoc <co...@bbs.darktech.org>.
Hi Emilian,

Any update on this?

Thanks,
Gili

On 2016-11-11 01:33 (-0500), Emilian Bold <e....@gmail.com> wrote:
> Thank you for following through with this after we talked on IRC.
>
> I will check later the size reduction for the releases/ repo.

Re: Version control advice

Posted by Emilian Bold <em...@gmail.com>.
Thank you for following through with this after we talked on IRC.

I will check the size reduction for the releases/ repo later.

On Fri, Nov 11, 2016 at 07:45 Gregory Szorc <gr...@gmail.com>
wrote:

> I'm a Mercurial developer who is also responsible for running
> https://hg.mozilla.org/ and supporting Mercurial at Mozilla. I understand
> NetBeans is contemplating its version control future because the ASF only
> supports Subversion and Git. I think I've learned some things that may be
> helpful to you.
>
> First, the NetBeans "main" repo is on the same order of magnitude (but
> marginally smaller than) the Firefox repository in terms of file count and
> repository data size. So generally speaking, what I have learned supporting
> Firefox can apply to NetBeans.
>
> While I understand Mercurial may not be in your future, I'd like to point
> out that hg.netbeans.org is running a very old and very slow version of
> Mercurial (likely a release from before July 2010). The high volume of
> merge commits in the "main" repo contributes to highly sub-optimal storage
> utilization in old versions of Mercurial. This makes clones and pulls
> significantly slower due to more data to transfer and contributes to
> significant CPU load on the server to read/encode the sub-optimal storage
> encoding. I wouldn't be surprised if you have CPU load issues on the
> server.
>
> As it is stored today, the "main" repository is almost exactly 3 GB. If you
> create a new repository with optimal storage encoding using Mercurial 3.7
> or newer so "generaldelta" is the default storage format and configuring
> the repository to recalculate optimal deltas, the repository size drops to
> ~1.1 GB. This can be done as such:
>
>    $ hg init main-optimal
>    $ cd main-optimal
>    $ hg --config format.generaldelta=true --config
> format.aggressivemergedeltas=true pull https://hg.netbeans.org/main
>    <wait a long time>
>
> Now, for your VCS future.
>
> I'm a huge proponent of monorepos for productivity reasons. I've seen
> discussion on this list about splitting the repo. I would discourage that.
> I'd encourage you to read https://danluu.com/monorepo/ and the linked
> articles at the bottom for more on the topic.
>
> Unfortunately, one of the practical concerns about monorepos is they don't
> scale with some version control tools, namely Git. This leads many to let
> deficiencies in tools drive workflow decisions, which is quite unfortunate
> because tools should enhance productivity, not hinder it. If NetBeans uses
> Git and maintains the "main" repo as is, I believe you'll experience the
> following performance issues now or in the future as the repository keeps
> growing:
>
> * You'll constantly be dealing with CPU explosions on the Git server
> generated from clients performing clones and large pulls. GitHub uses a
> server infrastructure that caches certain operations related to packfiles
> to help mitigate this. I'm not sure of the state of ASF's Git server.
>
> * In many cases, shallow clones can require more CPU on the Git server to
> process than full clones. This is because the server essentially has to
> read objects from packs and repack things instead of doing a fastpath that
> effectively streams a packfile to a client.
>
> * Garbage collection could be problematic on the server and client
>
> Now, Git is constantly improving, so these problems may not always
> exist. And as much as GitHub does well at scaling - better than a vanilla
> Git install - it isn't a silver bullet. On a few occasions, processes at
> Mozilla have overwhelmed GitHub and resulted in GitHub disabling access to
> repositories! That hasn't happened in a while though (partially through
> them scaling better and partially through us learning our lesson and not
> pointing hundreds of machines at large Git repos). I'm not sure what, if
> anything, ASF's Git server has done to mitigate load from large
> repositories.
>
> It's worth noting that while some of the server-side CPU issues exist in
> default Mercurial installations, there are mitigations. The "clonebundles"
> extension allows a server to advertise pre-generated "bundle" files of
> repository content. When a client clones, it downloads a large bundle from
> a static file server, then goes back to the Mercurial server to get the
> data changed since the bundle was created. If you
> `hg clone https://hg.mozilla.org/mozilla-unified` with a modern Mercurial
> client, your client will grab a 1+ GB file from a CDN and our servers will
> spend maybe 5s of total CPU to service the clone. Clones are faster for
> clients, and the server can scale to a nearly unlimited number of clones.
> It's a win all around.
>
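A rough sketch of the server side, assuming a Mercurial server with the `clonebundles` extension enabled in its hgrc: generate a full bundle with `hg bundle`, copy it to a static file server, and list its URL in `.hg/clonebundles.manifest`. Each manifest line is a URL followed by optional `KEY=VALUE` attributes; `BUNDLESPEC` tells clients which bundle format they would be downloading, so they can skip bundles they cannot read. The function names, repository path, and bundle URL below are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only. Assumes the server's hgrc enables the extension:
#   [extensions]
#   clonebundles =

# Write a full bundle of the repository at $1 to the file $2.
# "gzip-v2" is a bundle specification: <compression>-<bundle version>.
make_clone_bundle() {
    hg -R "$1" bundle --all --type gzip-v2 "$2"
}

# Advertise a bundle for the repository at $1: write a single manifest line
# holding the bundle's public URL ($2) and its bundle specification ($3).
advertise_clone_bundle() {
    printf '%s BUNDLESPEC=%s\n' "$2" "$3" > "$1/.hg/clonebundles.manifest"
}

# Hypothetical usage; /srv/hg/main and bundles.example.org are placeholders:
#   make_clone_bundle /srv/hg/main /var/www/bundles/main.hg
#   advertise_clone_bundle /srv/hg/main \
#       https://bundles.example.org/main.hg gzip-v2
```

Regenerating the bundle periodically keeps the amount of data a new clone still has to pull from the Mercurial server itself small.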
> Anyway, Mercurial's ability to scale doesn't help you if your choices are
> Subversion or Git :/
>
> Given those choices, I would lean towards Subversion if you want to
> maintain the "main" repo as is. If you use the "main" repo as is with Git,
> you should really do due diligence with the Git server operator to make
> sure they won't be overwhelmed.
>
> If you split the "main" repo, go with Git if your users prefer Git over
> Subversion.
>
> A compromise option would be to keep everything in a monorepo in Subversion
> and have separate Git repositories for specific subdirectories or "views."
> This is often a win-win but requires a bit of tooling to do the syncing.
> Speaking of syncing, it should be unidirectional. Bi-directional syncing
> of anything is a hard problem. Take it from someone who has hacked on
> bi-directional VCS syncing: it is not something you want to support.
> Instead, I recommend turning "pushing to the canonical repo" into
> something a machine does: it performs the VCS conversion to the canonical
> repo and does the actual push. For example, landing something from Git
> would have a server fetch that Git ref and replay its commits as
> Subversion commits (or squash them into one commit to preserve
> atomicity).
>
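The ordering and failure handling of such a landing step can be sketched in POSIX shell, assuming a local Git clone that already contains both the canonical base revision and the contributor's ref (the fetch itself is omitted). The Subversion side is deliberately left to a caller-supplied command, the hypothetical `apply_cmd` argument, which would apply one commit's diff to an up-to-date trunk checkout and run `svn commit` with the Git message. `commits_to_land` and `land_ref` are illustrative names, not an existing tool.

```shell
#!/bin/sh
# Print the commits reachable from REF ($3) but not from BASE ($2) in the
# Git repository at $1, oldest first: the order they must be replayed in.
commits_to_land() {
    git -C "$1" rev-list --reverse "$2..$3"
}

# Replay each new commit by calling the caller-supplied command ($4) as
#   apply_cmd <gitdir> <commit>
# Resolving the full list first means a bad ref aborts before anything
# lands; stopping at the first failure means later commits are never
# applied on top of one that failed.
land_ref() {
    gitdir=$1 base=$2 ref=$3 apply_cmd=$4
    commits=$(commits_to_land "$gitdir" "$base" "$ref") || return 1
    for c in $commits; do
        "$apply_cmd" "$gitdir" "$c" || return 1
    done
}
```

For the squash variant, the service would instead apply `git diff BASE REF` once and make a single Subversion commit, trading per-commit history for atomicity.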
> Anyway, I think this wall of text is long enough. Reply if you have any
> questions.
>
> Gregory
>