Posted to dev@brooklyn.apache.org by Svetoslav Neykov <sv...@cloudsoftcorp.com> on 2015/01/22 15:36:28 UTC
Big files in repo
Having cloned the brooklyn repo a number of times over the last few days, I was curious why it is so big. As expected, the history is polluted by a number of binary files that were deleted a long time ago and were probably never modified after their first commit.
Here's a summary of the largest offenders:
All sizes are in kB. The pack column is the compressed size of the object inside the pack file.
 size   pack  SHA                                       location
57722  57718  01760133fa9dedc811762019a2aaa691bbe61988  monterey-example/src/main/resources/booking-mvc.war
23917  23905  d0b52371d4ba447ee9b0f882854b4b5d4d39147f  monterey-example/src/main/resources/jboss-booking.war
23915  23903  4f6c7f01e0f49c5e3574d648bcab991f7cab5236  monterey-example/src/main/resources/monterey-booking-as7.war
21737  20984  ab9740889e11e06df060595684c2bd803c1baac1  examples/simple-nosql-cluster/src/main/resources/cumulusrdf-0.6.1-pre.jar
20988  20993  04a0a6fd94873543cfec769b5a272bbe36f0a914  examples/simple-nosql-cluster/src/main/resources/cumulusrdf.war
12124  12119  7ccd0ef45879c78941575d47a8521eef49b0b704  sandbox/examples/src/main/resources/swf-booking-mvc.war
 7642   7063  0249fd11430b461f918e2b95adb52425caf81230  gemfire/lib/gemfire-6.5.1.4.jar
 7217   7219  9d384383e5542b04ca2b6332aedba92bc92bc5bb  examples/simple-nosql-cluster/src/main/resources/cumulusrdf.war
 2431   2428  5d7d1216a4276d82c3fabdfa25852e527e1bf78e  monterey-example/src/main/resources/booking-mvc.war
 2430   2427  9fc9f74223ed9704e91810d5dd9fe79d41ff4b84  com.cloudsoftcorp.monterey.brooklyn/src/main/resources/booking-mvc.war
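For anyone who wants to reproduce a listing like this, here is a minimal sketch of one way it could be generated. It is not necessarily how the numbers above were produced. It assumes the repository is fully packed (as a fresh clone is) and joins the output of `git verify-pack -v` (object sizes) with `git rev-list --objects --all` (object paths). The function names are made up for illustration.

```python
import glob
import os
import subprocess
from typing import Dict, List, Tuple


def parse_verify_pack(text: str) -> Dict[str, Tuple[int, int]]:
    """Map blob SHA -> (size, size-in-pack), in bytes, from `git verify-pack -v`."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Object lines look like: <sha> <type> <size> <size-in-pack> <offset> [...]
        # Summary lines ("non delta: ...", "chain length ...") are skipped.
        if len(fields) >= 5 and fields[1] == "blob":
            sizes[fields[0]] = (int(fields[2]), int(fields[3]))
    return sizes


def parse_rev_list(text: str) -> Dict[str, str]:
    """Map object SHA -> path from `git rev-list --objects --all`."""
    paths = {}
    for line in text.splitlines():
        sha, _, path = line.partition(" ")
        if path:  # commits have no path and are ignored
            paths[sha] = path
    return paths


def largest_blobs(verify_pack_out: str, rev_list_out: str,
                  top: int = 10) -> List[Tuple[int, int, str, str]]:
    """Return (size_kB, pack_kB, sha, path) rows, largest packed size first."""
    sizes = parse_verify_pack(verify_pack_out)
    paths = parse_rev_list(rev_list_out)
    rows = [(size // 1024, packed // 1024, sha, paths.get(sha, "?"))
            for sha, (size, packed) in sizes.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


def scan_repo(repo_dir: str, top: int = 10) -> List[Tuple[int, int, str, str]]:
    """Run both git commands in repo_dir; loose (unpacked) objects are not seen."""
    repo_dir = os.path.abspath(repo_dir)
    idx_files = glob.glob(
        os.path.join(repo_dir, ".git", "objects", "pack", "pack-*.idx"))
    verify = subprocess.run(["git", "verify-pack", "-v", *idx_files],
                            cwd=repo_dir, capture_output=True, text=True,
                            check=True).stdout
    objects = subprocess.run(["git", "rev-list", "--objects", "--all"],
                             cwd=repo_dir, capture_output=True, text=True,
                             check=True).stdout
    return largest_blobs(verify, objects, top)
```

Calling `scan_repo(".")` from inside a clone should give rows in the same size / pack / SHA / location order as the table.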
The only sane way to delete them without losing the history is to rewrite it with the big-file-commits excluded. This has the downside that any PRs or cloned repositories will need manual intervention after the changes.
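To make the rewrite concrete, here is a rough sketch of one possible approach, not a worked-out plan. It uses `git filter-branch` with an index filter, which strips the listed paths out of every commit rather than dropping whole commits; commits left empty by that are pruned. The helper name is hypothetical, and it only builds the command so it can be reviewed before anyone runs it.

```python
import shlex
from typing import List


def filter_branch_command(paths: List[str]) -> List[str]:
    """Build a `git filter-branch` invocation that removes `paths` from all history."""
    quoted = " ".join(shlex.quote(p) for p in paths)
    # The index filter runs once per commit; --ignore-unmatch keeps it from
    # failing on commits that never contained the file.
    index_filter = f"git rm --cached --ignore-unmatch --quiet -- {quoted}"
    return [
        "git", "filter-branch",
        "--index-filter", index_filter,
        "--prune-empty",               # drop commits that become empty
        "--tag-name-filter", "cat",    # rewrite tags to point at new commits
        "--", "--all",                 # apply to every ref
    ]


if __name__ == "__main__":
    cmd = filter_branch_command([
        "monterey-example/src/main/resources/booking-mvc.war",
        "gemfire/lib/gemfire-6.5.1.4.jar",
    ])
    print(" ".join(shlex.quote(part) for part in cmd))
```

Every rewritten commit gets a new SHA, which is why a forced push and manual fix-ups of clones and PRs would follow. The local repository also only shrinks once the backup refs under `refs/original` and the reflog entries are gone and `git gc` has pruned the old objects.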
I think it's worth the effort of removing them, bearing in mind the benefit of shrinking the repo to a few MB in the long run. There's never a "right" moment to make such a change, but the longer it is delayed, the worse it becomes. So what are the thoughts of the community?
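As a rough sanity check on the potential saving, the pack column of the ten entries listed above can simply be added up. This assumes the packed sizes are additive and takes 1 MB as 1024 kB; it says nothing about what else is in the repository.

```python
# Packed sizes in kB, copied from the "pack" column of the listing.
pack_kb = [57718, 23905, 23903, 20984, 20993, 12119, 7063, 7219, 2428, 2427]

total_kb = sum(pack_kb)
total_mb = total_kb / 1024  # assumption: 1 MB = 1024 kB

print(f"{total_kb} kB is about {total_mb:.0f} MB")
# prints "178759 kB is about 175 MB"
```

So these ten objects alone account for roughly 175 MB of every clone.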
Svet.
Changing git history -- question for mentors
Posted by Alex Heneveld <al...@cloudsoftcorp.com>.
Brooklyn Mentors-
We have a few early (years ago, pre-incubation) commits which included
some large binaries which were then removed. A consequence of this is
that a `git clone` is much larger than it should be, sometimes taking hours
instead of minutes to download, which discourages participation.
Is there a way we can excise these dumb and costly commits in the history?
For good reason committers shouldn't and can't do `git push -f` (as
discussed below), but in this case it seems warranted to have an
appropriate admin person make the change.
Best
Alex
On 22/01/2015 17:23, aled sage wrote:
> Hi Svet,
>
> I'm not sure if it's possible to do a `git push -f` on an apache repo. We
> should check with mentors and/or infrastructure whether that is impossible
> (or strongly discouraged).
>
> We'd want there to be almost no PRs, and lots of notice to users, before we
> did this if we go down that road.
>
> I agree it would be good if those files were not in the repo history!
> Whether it's worth the price, I'm not sure.
>
> Aled
Re: Big files in repo
Posted by aled sage <al...@gmail.com>.
Hi Svet,
I'm not sure if it's possible to do a `git push -f` on an apache repo. We
should check with mentors and/or infrastructure whether that is impossible
(or strongly discouraged).
We'd want there to be almost no PRs, and lots of notice to users, before we
did this if we go down that road.
I agree it would be good if those files were not in the repo history!
Whether it's worth the price, I'm not sure.
Aled