Posted to dev@brooklyn.apache.org by Svetoslav Neykov <sv...@cloudsoftcorp.com> on 2015/01/22 15:36:28 UTC

Big files in repo

Having cloned the Brooklyn repo a number of times over the last few days, I was curious why it's so big. As expected, there are a number of binary files, probably unmodified since their first commit and deleted a long time ago, that are still polluting the history.
Here's a summary of the largest offenders:

All sizes are in kB. The pack column is the compressed size of the object inside the pack file.
size   pack   SHA                                       location
57722  57718  01760133fa9dedc811762019a2aaa691bbe61988  monterey-example/src/main/resources/booking-mvc.war
23917  23905  d0b52371d4ba447ee9b0f882854b4b5d4d39147f  monterey-example/src/main/resources/jboss-booking.war
23915  23903  4f6c7f01e0f49c5e3574d648bcab991f7cab5236  monterey-example/src/main/resources/monterey-booking-as7.war
21737  20984  ab9740889e11e06df060595684c2bd803c1baac1  examples/simple-nosql-cluster/src/main/resources/cumulusrdf-0.6.1-pre.jar
20988  20993  04a0a6fd94873543cfec769b5a272bbe36f0a914  examples/simple-nosql-cluster/src/main/resources/cumulusrdf.war
12124  12119  7ccd0ef45879c78941575d47a8521eef49b0b704  sandbox/examples/src/main/resources/swf-booking-mvc.war
7642   7063   0249fd11430b461f918e2b95adb52425caf81230  gemfire/lib/gemfire-6.5.1.4.jar
7217   7219   9d384383e5542b04ca2b6332aedba92bc92bc5bb  examples/simple-nosql-cluster/src/main/resources/cumulusrdf.war
2431   2428   5d7d1216a4276d82c3fabdfa25852e527e1bf78e  monterey-example/src/main/resources/booking-mvc.war
2430   2427   9fc9f74223ed9704e91810d5dd9fe79d41ff4b84  com.cloudsoftcorp.monterey.brooklyn/src/main/resources/booking-mvc.war
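
A listing like the one above can be reproduced by joining the output of `git verify-pack -v` (object sizes, in bytes) with the output of `git rev-list --objects --all` (object paths). The following is a minimal Python sketch of that join. It is not necessarily how the table above was produced, and the function names are made up for illustration.

```python
import re

SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def parse_verify_pack(text):
    """Return {sha: (size_bytes, packed_bytes)} for every blob listed."""
    blobs = {}
    for line in text.splitlines():
        fields = line.split()
        # Object lines look like:
        #   <sha> <type> <size> <size-in-pack> <offset> [<depth> <base-sha>]
        # Summary lines ("non delta: ...", "...pack: ok") are skipped.
        if len(fields) >= 5 and SHA_RE.match(fields[0]) and fields[1] == "blob":
            blobs[fields[0]] = (int(fields[2]), int(fields[3]))
    return blobs


def parse_rev_list(text):
    """Return {sha: path} for every object that has a path."""
    paths = {}
    for line in text.splitlines():
        sha, _, path = line.partition(" ")
        if path:  # commit lines carry no path
            paths[sha] = path
    return paths


def largest_blobs(verify_pack_text, rev_list_text, top=10):
    """Return (size_kB, pack_kB, sha, path) rows, largest first."""
    blobs = parse_verify_pack(verify_pack_text)
    paths = parse_rev_list(rev_list_text)
    rows = [
        (size // 1024, packed // 1024, sha, paths.get(sha, "?"))
        for sha, (size, packed) in blobs.items()
    ]
    rows.sort(reverse=True)
    return rows[:top]
```

To try it, run `git gc` first so that the objects are packed, then pass in the output of `git verify-pack -v .git/objects/pack/*.idx` and `git rev-list --objects --all`.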

The only sane way to delete them without losing the history is to rewrite it with the big-file commits excluded. This has the downside that any open PRs or cloned repositories will need manual intervention after the change.
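
One common way to do such a rewrite is `git filter-branch` with an index filter that removes the unwanted paths from every commit. Note that this strips the files rather than dropping whole commits; `--prune-empty` then discards any commit left with no changes. The Python sketch below only builds the command line so it can be reviewed before running. The helper name is hypothetical and the thread does not say which tool would actually be used.

```python
import shlex


def filter_branch_command(paths):
    """Build a `git filter-branch` call that drops `paths` from all history."""
    if not paths:
        raise ValueError("no paths given")
    quoted = " ".join(shlex.quote(p) for p in paths)
    # --cached touches only the index, so no checkout is needed per commit;
    # --ignore-unmatch keeps `git rm` from failing on commits without the file.
    index_filter = "git rm -q --cached --ignore-unmatch -- " + quoted
    return [
        "git", "filter-branch",
        "--index-filter", index_filter,
        "--prune-empty",              # drop commits that become empty
        "--tag-name-filter", "cat",   # keep tag names, repoint them
        "--", "--all",                # rewrite every ref
    ]
```

In a fresh clone this could be executed with `subprocess.run(filter_branch_command([...]), check=True)`. Every rewritten commit gets a new SHA, which is why the result can only be published with a forced push.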

I think it's worth the effort of removing them, bearing in mind the benefit of shrinking the repo to a few MB in the long run. There's never a "right" moment to make such a change, but the longer it is delayed, the worse it becomes. So what are the thoughts of the community?

Svet.

Changing git history -- question for mentors

Posted by Alex Heneveld <al...@cloudsoftcorp.com>.
Brooklyn Mentors-

We have a few early commits (from years ago, pre-incubation) that included
some large binaries which were later removed. As a consequence, a
`git clone` is much larger than it should be, sometimes taking hours
instead of minutes to download, which discourages participation.

Is there a way we can excise these dumb and costly commits in the history?

For good reason committers shouldn't and can't do `git push -f` (as 
discussed below), but in this case it seems warranted to have an 
appropriate admin person make the change.

Best
Alex


On 22/01/2015 17:23, aled sage wrote:
> Hi Svet,
>
> I'm not sure if it's possible to do a `git push -f` on an Apache repo. We
> should check with mentors and/or infrastructure whether that is impossible
> (or strongly discouraged).
>
> We'd want there to be almost no PRs, and lots of notice to users, before we
> did this if we go down that road.
>
> I agree it would be good if those files were not in the repo history!
> Whether it's worth the price, I'm not sure.
>
> Aled


Re: Big files in repo

Posted by aled sage <al...@gmail.com>.
Hi Svet,

I'm not sure if it's possible to do a `git push -f` on an Apache repo. We
should check with mentors and/or infrastructure whether that is impossible
(or strongly discouraged).

We'd want there to be almost no PRs, and lots of notice to users, before we
did this if we go down that road.

I agree it would be good if those files were not in the repo history!
Whether it's worth the price, I'm not sure.

Aled

