Posted to dev@pulsar.apache.org by Guangning E <gu...@apache.org> on 2021/11/01 09:57:20 UTC

Re: Size of Asf-site branch in Pulsar repo

+1

Enrico Olivelli <eo...@gmail.com> wrote on Sat, Oct 30, 2021 at 6:16 PM:

> On Fri, Oct 29, 2021, 21:27 Dave Fisher <wa...@apache.org> wrote:
>
> > Hi -
> >
> > > On Oct 29, 2021, at 12:02 PM, Matteo Merli <mm...@apache.org> wrote:
> > >
> > > The Pulsar website is getting published through a CI job that updates
> > > the generated HTML files and commits them in the Pulsar repo, in a
> > > separate branch ('asf-site'). From there the site is immediately
> > > visible on the web.
> > >
> > > One of the issues with this process is that we have a lot of updates
> > > of generated HTML files that are growing the size of the Pulsar Git
> > > repo. Each time we clone, the entire repo has to be fetched by
> > > developers and users.
> > >
> > > This is somewhat made worse by having daily updates in many HTML files
> > > to update timestamps. I just merged a fix for that
> > > https://github.com/apache/pulsar/pull/12538 .
> > >
> > > The size of the clone git repo is already at 1.4 GB. 90% of this size
> > > is due to the 'asf-site' branch.
> > >
> > > Ideally, we should try to find a solution to use an ad-hoc repo for
> > > the website deployment, outside the main Pulsar repo.
> >
> > We can have as many apache/pulsar-* repos as the PMC wants
> >
> > If we create a pulsar-site repos we can publish from multiple branches.
> >
>
> +1
>
>
>
> > See GitHub.com/apache/openjpa-site
> >
> > The main branch could contain website sources.
> > The asf-site branch would have the built website.
> > .asf.yaml
> > publish:
> >   profile: ~
> >   whoami: asf-site
> > A builds branch could have api docs that seldom change. OpenJPA keeps
> > every release…
> > .asf.yaml
> > publish:
> >   profile: ~
> >   subdir: output/builds
> >   whoami: builds
> >
> >
> >
> > >
> > > In the meantime, I propose to truncate the history of the "asf-site"
> > > branch and squash all commits into a single one, in order to reduce
> > > the repo size.
> >
> > +1
> >
>
> +1
>
> Enrico
>
>
> > >
> > > Let me know what you think.
> > >
> > > Matteo
> > >
> > > --
> > > Matteo Merli
> > > <mm...@apache.org>
> >
> >
>

Re: Size of Asf-site branch in Pulsar repo

Posted by Matteo Merli <ma...@gmail.com>.
A small issue here is that the 'asf-site' branch is currently marked as
"protected", so it's not possible to delete/recreate it or force-push
to it. We need to control the protected branches in the .asf.yaml file.

Another item here is that there are many updates to the swagger
generated files (e.g.:
https://github.com/apache/pulsar/commit/c10eb563c5e60148edddbd1155caf21bed863bc5#diff-c7cea09cf40d2bfce495fc010bf034925619bf62976d3192af71d4763c089313R19608
) even though there are no changes.

It looks like the fields keep getting reordered in the generated JSON
files. If someone has cycles to check through it, it would be very
helpful to avoid these unneeded updates to the site.
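
One possible approach (a sketch only, not something that exists in the
Pulsar build today) would be to rewrite each generated JSON file with
sorted keys before committing it, so that a pure reordering of fields
produces no diff. The function name `canonicalize_json` and the 2-space
indent are assumptions made for illustration:

```python
import json


def canonicalize_json(text: str) -> str:
    """Return the same JSON document with object keys sorted at every level.

    Array order is left untouched, since it can be meaningful.
    """
    data = json.loads(text)
    # sort_keys=True sorts keys of nested objects too, so two documents
    # that differ only in field order serialize to identical text.
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"
```

Running the generated swagger files through something like this in the
site build, and committing only when the normalized text differs from
what is already in the branch, should keep reordering-only changes out
of the history.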




--
Matteo Merli
<ma...@gmail.com>

On Wed, Nov 3, 2021 at 5:07 PM PengHui Li <pe...@apache.org> wrote:
>
> +1
>
> -Penghui
>
> On Mon, Nov 1, 2021 at 5:53 PM Guangning E <gu...@apache.org> wrote:
>>
>> +1
>>
>> Enrico Olivelli <eo...@gmail.com> wrote on Sat, Oct 30, 2021 at 6:16 PM:
>>
>> > On Fri, Oct 29, 2021, 21:27 Dave Fisher <wa...@apache.org> wrote:
>> >
>> > > Hi -
>> > >
>> > > > On Oct 29, 2021, at 12:02 PM, Matteo Merli <mm...@apache.org> wrote:
>> > > >
>> > > > The Pulsar website is getting published through a CI job that updates
>> > > > the generated HTML files and commits them in the Pulsar repo, in a
>> > > > separate branch ('asf-site'). From there the site is immediately
>> > > > visible on the web.
>> > > >
>> > > > One of the issues with this process is that we have a lot of updates
>> > > > of generated HTML files that are growing the size of the Pulsar Git
>> > > > repo. Each time we clone, the entire repo has to be fetched by
>> > > > developers and users.
>> > > >
>> > > > This is somewhat made worse by having daily updates in many HTML files
>> > > > to update timestamps. I just merged a fix for that
>> > > > https://github.com/apache/pulsar/pull/12538 .
>> > > >
>> > > > The size of the clone git repo is already at 1.4 GB. 90% of this size
>> > > > is due to the 'asf-site' branch.
>> > > >
>> > > > Ideally, we should try to find a solution to use an ad-hoc repo for
>> > > > the website deployment, outside the main Pulsar repo.
>> > >
>> > > We can have as many apache/pulsar-* repos as the PMC wants
>> > >
>> > > If we create a pulsar-site repos we can publish from multiple branches.
>> > >
>> >
>> > +1
>> >
>> >
>> >
>> > > See GitHub.com/apache/openjpa-site
>> > >
>> > > The main branch could contain website sources.
>> > > The asf-site branch would have the built website.
>> > > .asf.yaml
>> > > publish:
>> > >   profile: ~
>> > >   whoami: asf-site
>> > > A builds branch could have api docs that seldom change. OpenJPA keeps
>> > > every release…
>> > > .asf.yaml
>> > > publish:
>> > >   profile: ~
>> > >   subdir: output/builds
>> > >   whoami: builds
>> > >
>> > >
>> > >
>> > > >
>> > > > In the meantime, I propose to truncate the history of the "asf-site"
>> > > > branch and squash all commits into a single one, in order to reduce
>> > > > the repo size.
>> > >
>> > > +1
>> > >
>> >
>> > +1
>> >
>> > Enrico
>> >
>> >
>> > > >
>> > > > Let me know what you think.
>> > > >
>> > > > Matteo
>> > > >
>> > > > --
>> > > > Matteo Merli
>> > > > <mm...@apache.org>
>> > >
>> > >
>> >

Re: Size of Asf-site branch in Pulsar repo

Posted by PengHui Li <pe...@apache.org>.
+1

-Penghui

On Mon, Nov 1, 2021 at 5:53 PM Guangning E <gu...@apache.org> wrote:

> +1
>
> Enrico Olivelli <eo...@gmail.com> wrote on Sat, Oct 30, 2021 at 6:16 PM:
>
> > On Fri, Oct 29, 2021, 21:27 Dave Fisher <wa...@apache.org> wrote:
> >
> > > Hi -
> > >
> > > > > On Oct 29, 2021, at 12:02 PM, Matteo Merli <mm...@apache.org> wrote:
> > > >
> > > > The Pulsar website is getting published through a CI job that updates
> > > > the generated HTML files and commits them in the Pulsar repo, in a
> > > > separate branch ('asf-site'). From there the site is immediately
> > > > visible on the web.
> > > >
> > > > One of the issues with this process is that we have a lot of updates
> > > > of generated HTML files that are growing the size of the Pulsar Git
> > > > repo. Each time we clone, the entire repo has to be fetched by
> > > > developers and users.
> > > >
> > > > > This is somewhat made worse by having daily updates in many HTML files
> > > > to update timestamps. I just merged a fix for that
> > > > https://github.com/apache/pulsar/pull/12538 .
> > > >
> > > > The size of the clone git repo is already at 1.4 GB. 90% of this size
> > > > is due to the 'asf-site' branch.
> > > >
> > > > Ideally, we should try to find a solution to use an ad-hoc repo for
> > > > the website deployment, outside the main Pulsar repo.
> > >
> > > We can have as many apache/pulsar-* repos as the PMC wants
> > >
> > > If we create a pulsar-site repos we can publish from multiple branches.
> > >
> >
> > +1
> >
> >
> >
> > > See GitHub.com/apache/openjpa-site
> > >
> > > The main branch could contain website sources.
> > > The asf-site branch would have the built website.
> > > .asf.yaml
> > > publish:
> > >   profile: ~
> > >   whoami: asf-site
> > > A builds branch could have api docs that seldom change. OpenJPA keeps
> > > every release…
> > > .asf.yaml
> > > publish:
> > >   profile: ~
> > >   subdir: output/builds
> > >   whoami: builds
> > >
> > >
> > >
> > > >
> > > > In the meantime, I propose to truncate the history of the "asf-site"
> > > > branch and squash all commits into a single one, in order to reduce
> > > > the repo size.
> > >
> > > +1
> > >
> >
> > +1
> >
> > Enrico
> >
> >
> > > >
> > > > Let me know what you think.
> > > >
> > > > Matteo
> > > >
> > > > --
> > > > Matteo Merli
> > > > <mm...@apache.org>
> > >
> > >
> >
>
