Posted to dev@camel.apache.org by David Jencks <da...@gmail.com> on 2021/11/13 18:46:55 UTC

What can we do to make the documentation and website build more resilient?

The Antora build part of the website is getting better at detecting problems and failing the build, and the website build seems to me to be failing more often.  Perhaps we can find ways to improve our process so there are fewer problematic commits and it’s easier to detect and fix problems earlier.

There are a few problems caused by interactions between near-in-time commits, and by commits that bring in stuff that is now obsolete due to recent website build changes.  Let’s ignore those :-)… especially since the second kind will iron itself out over time.

So, people keep merging PRs that change the documentation without checking, either locally or via a CI check on the PR, that they don’t break the website build.

They theoretically could do a local website build that incorporates their changes, but right now it’s far too hard and time-consuming. (I’ll discuss the problems with the projects that attempt to do a partial local build later.)
So one good step would be to make local website builds to check doc changes easy and quick.  I’ve made some progress on this.

Another step would be for CI to check the website build on each PR, either the whole site or a partial build.  I think GH actions can trigger each other, but I’ve never set it up.  Do we have enough GH action time to do a full website build on every PR to any camel subproject?  Is it practical to trigger the website build only when something documentation-related changes? (this detection would need to be carefully set up in each subproject)  If these are possible I think we should just do this.  It’s probably possible to set up quicker partial builds, but it’s decidedly more complicated.
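
For the “only when something documentation-related changes” detection, here is a minimal sketch of the kind of check a PR workflow could run before deciding to trigger the website build; the path patterns and the base branch are assumptions and would need tuning per subproject:

    # Hedged sketch: decide in CI whether a PR touches documentation, and
    # only then trigger the (separate) website build. Patterns are assumptions.
    changed=$(git diff --name-only origin/main...HEAD)
    if echo "$changed" | grep -qE '\.adoc$|(^|/)antora\.yml$|(^|/)docs/'; then
      echo "documentation changed: trigger the website build"
    else
      echo "no documentation changes: skip the website build"
    fi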

Another step would be to make it extremely visible when the Jenkins website build fails.  I try to follow the dev list pretty closely, and see a lot of GH PR CI build failures reported, but apparently the Jenkins build has been failing for several days and I had no idea.

In principle, what other steps could we take?

——

Comments on the existing attempts to have subproject-specific partial builds:

Dan Allen (of Antora) has repeatedly said that subsidiary builds such as local or partial builds should be done from (clones) of the repo containing the playbook for the actual site.  For a long time I disagreed and thought approaches like that of camel-quarkus to have a local build in the subproject were workable but I’m now convinced that they  are totally unmaintainable.  They rely on updating each such subproject every time the main playbook changes, and in a way that requires deep understanding of the entire site build.  It just isn’t going to work, ever.


——
Maybe there’s hope…

If we’re going to encourage or require local builds of the website, there needs to be a defined file system relationship between  the camel-website clone and the subproject(s) clone(s).  I have a “global” directory (named camel) into which I’ve cloned all the subprojects next to one another (together with some extra git work trees).  I think this is the simplest arrangement and I think we could require it.
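
For concreteness, here is a sketch of that layout; camel-quarkus is just an example, you only need the subprojects you actually work on:

    # One "global" directory named camel, with camel-website and the
    # subproject clones as siblings.
    mkdir camel && cd camel
    git clone https://github.com/apache/camel-website.git
    git clone https://github.com/apache/camel-quarkus.git
    # Result: camel/camel-website and camel/camel-quarkus sit next to each
    # other, so a playbook in camel-website can refer to ./../camel-quarkus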

Next, there needs to be an easy way (preferably automated) to modify the playbook to take into account building against one (or possibly more) local clones.  E.g., if I’m working on camel-quarkus, I should only need to have camel-quarkus cloned, and still be able to do a build. Doing this is much more plausible if we can assume that every branch participating in the website is present and up to date locally.  Does anyone know if it’s possible to write a git script that can update branches without switching to them?  If we can assume this, then the local build just involves changing the playbook source url from GitHub….<project>.git to `./../<project>` and adjusting the checked-out branch name.
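
On the git question: yes, a plain fetch can update a local branch that is not checked out. A sketch (branch names below are only examples):

    # "git fetch <remote> <src>:<dst>" fast-forwards the local branch <dst>
    # without switching to it (it refuses to touch the currently
    # checked-out branch).
    cd camel/camel-quarkus
    git fetch origin main:main     # repeat <branch>:<branch> for every
                                   # branch that participates in the website
    # With the local clones up to date, the playbook source url can then be
    # switched to ./../camel-quarkus and its branches adjusted for a local build.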

Then there’s the problem that the full Antora build takes something like 6 minutes now, which is too long for anyone to wait for. So, we need an effective way of doing quick partial builds.  I’ve been working on this with some progress.  Dan has an idea he calls a site manifest, which means that the site build writes out the content catalog with information about the Antora coordinates and the site location of every page. Then a partial build can read this in to populate the partial build content catalog, so that xrefs can be properly resolved. This was originally developed to enable a “subsidiary site” to have xrefs to a “main site”.  I’ve adapted this to be an Antora pipeline extension, and it can be used in a couple of ways.

- A site manifest could be published as part of the actual site.  In this case the partial build would fetch it, and only pages actually present locally would get local links.  You’d find out whether there are any problems, but it might be hard to locate the local pages through navigation.

- If you do a full build locally to generate a local site manifest, a partial build using that site manifest will only overwrite the rebuilt local files, leaving you with a functional local site.

- Possibly the full Jenkins build could also package the Antora site as a zip archive, and local builds could fetch and unpack it rather than doing a full local build.

With  the site manifest, there’s still the problem of modifying the playbook to only build a little bit.  I’ve written another extension that you configure with  the part you want to build, and it applies appropriate filters. You can configure it down to one page.  It also watches for changes and rebuilds when it detects a change: I think I’ll need to make that configurable since it’s great to see your changes quickly but not what you want for a build step.
I have not yet tried to make it easy to select which subproject you want to build: so far it requires knowing how to configure the extensions. I’ve started having some ideas on how this might be done. 

What I’m envisioning and hoping for is a pre-PR process that involves running, in a local camel-website clone, something like `yarn partial-build-camel-quarkus` that will, in less than a minute, detect any errors and produce a local site you can look at with the local changes.
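
In other words, roughly this (the script name is the hypothetical one from above, it doesn’t exist yet):

    # Envisioned pre-PR check (hypothetical script, not implemented yet):
    cd camel/camel-website
    yarn partial-build-camel-quarkus
    # - resolves xrefs against a site manifest from the published site
    # - rebuilds only the camel-quarkus pages from ../camel-quarkus
    # - fails on broken xrefs and leaves a browsable local site, in under a minute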

Thoughts?

David Jencks

Re: What can we do to make the documentation and website build more resilient?

Posted by Zoran Regvart <zo...@regvart.com>.
Hi Andrea,

On Thu, Dec 2, 2021 at 10:25 AM Andrea <an...@tarocch.it> wrote:
> I have one practical question though: right now, what is the recommended practice to check whether a new release of a sub-project breaks the website build?

I'm deliberately not going to consider a release event, as it is not a
release that breaks the website but a change made in the documentation
or in the Antora configuration, and such a change can be made
independently of a release.

In general I have seen these two cases most often:

A. A change was made, and existing documentation is affected. Examples
of this are: Camel Quarkus documentation pointing to a Camel Component
Reference page that doesn't exist or no longer exists; a page was
removed from the Camel User Manual that is referenced from Camel K; or
a version of the Camel Component Reference was unpublished from the
website and the Camel Kafka Connector documentation is still
referencing it. This shows up as a broken xref when building the
documentation; the error message says which document on which branch
uses an xref that can no longer be resolved.

B. A new branch of documentation was configured in the Antora playbook,
but the descriptors on that branch haven't been updated to match the
version. This shows up as a "duplicate nav" error, explaining that
there are multiple sources for a document with the same page
coordinates (component name, module name, version, path).

Both of these cases can be detected by running the build locally with
the proposed changes; those changes don't need to be committed or
pushed. There is an example of such a workflow in the website README[1].
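
Roughly, that workflow looks like the sketch below; the exact commands
and playbook keys are documented in the README, so treat the names
here as assumptions rather than the canonical steps:

    # Hedged sketch of the local check described above; see the README[1]
    # for the authoritative steps.
    git clone https://github.com/apache/camel-website.git
    cd camel-website
    # Point the relevant content source in the Antora playbook at the local
    # clone/branch that contains the proposed changes, e.g.
    #   url: ./../camel-quarkus
    #   branches: HEAD
    yarn install
    yarn build   # assumed script name; fails on broken xrefs, duplicate nav, etc.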

In some (sub-)projects there are xref checks that run in the local
context as part of the build; those can detect only a subset of
issues, i.e. they typically can't detect an issue outside of the
branch being built.

We could, and David is proposing that we do, invest more effort to
build better xref checks; see the other thread on incremental builds
using site manifests.

zoran

[1] https://github.com/apache/camel-website/#working-on-documentation-asciidoc-content

--
Zoran Regvart

Re: What can we do to make the documentation and website build more resilient?

Posted by Andrea <an...@tarocch.it>.
Hello,

catching up a little bit on this.

I have one practical question though: right now, what is the recommended practice to check whether a new release of a sub-project breaks the website build?

On Sat, Nov 13, 2021, at 22:55, David Jencks wrote:
> Inline...
> 
> > On Nov 13, 2021, at 12:40 PM, Zoran Regvart <zo...@regvart.com> wrote:
> > 
> > Hi David,
> > lots of great stuff here, I'll try to keep my replies short though...
> > 
> > On Sat, Nov 13, 2021 at 7:47 PM David Jencks <david.a.jencks@gmail.com> wrote:
> >> 
> >> The Antora build part of the website is getting better at detecting problems and failing the build, and the website build seems to me to be failing more often.  Perhaps we can find ways to improve our process so there are fewer problematic commits and it’s easier to detect and fix problems earlier.
> > 
> > My intent with the website was that we should fail to publish as often
> > as necessary to do our best in not publishing a broken website. I
> > think we can tolerate website not being up to date with the latest for
> > a day or two.
> 
> Absolutely! But there have been lots of times recently when I’ve broken the website and had no idea for days.
> 
> > 
> >> There are a few problems caused by interactions between near-in-time commits and commits that bring in stuff that is obsolete due to recent website build changes.  Let’s ignore those :-)… especially the second kind will iron themselves out over time.
> > 
> > +1
> > 
> >> So, people keep merging PRs that change the documentation without checking that it doesn’t break the website build, either locally or as a CI check on the PR.
> > 
> > I think the xref syntax makes it difficult for folk to wrap their
> > head, I think everyone should be familiar with the documentation we
> > have here:
> > 
> > https://github.com/apache/camel-website/#links-between-pages-in-antora-content
> 
> I think  that’s a really inaccessible location for the information…. I’d like to move it to a page in the manual, near the release guidelines.  Something like “how to contribute to the docs”.
> > 
> >> They theoretically could do a local website build that incorporates their changes, but right now it’s way too hard and time consuming. (I’ll discuss the problems with the projects that attempt to do a partial local build later)
> >> So one good step would be to make local website builds to check doc changes easy and quick.  I’ve made some progress on this.
> > 
> > +1 I think so too, local build and then the CI build against the git
> > repository should be our first lines of defense. We should make those
> > take seconds not minutes and then mandatory.
> > 
> >> Another step would be for CI to check the website build on each PR, either the whole site or a partial build.  I think GH actions can trigger each other, but I’ve never set it up.  Do we have enough GH action time to do a full website build on every PR to any camel subproject?  Is it practical to trigger the website build only when something documentation-related changes? (this detection would need to be carefully set up in each subproject)  If these are possible I think we should just do this.  It’s probably possible to set up quicker partial builds, but it’s decidedly more complicated.
> > 
> > Most issues I've seen have been with xref linking, I think we should
> > focus on that first. Other checks I think fail less often. Perhaps not
> > building at all could be a good solution. For example in the Camel
> > main, Guillaume built a Maven plugin that checks for broken xrefs in
> > seconds. But that (currently) works only within the main Camel
> > repository.
> > 
> > What if we could expand that or build comparable tooling that checks
> > xrefs in the adocs of the git repository the developer is working on
> > but also takes into account xrefs to other subprojects? Two ideas I
> > have here would be to clone other subprojects and build an index of
> > xrefs against them as well; or to use the sitemap XMLs (could be
> > fairly quick!) from the live website and reverse them back to xrefs
> > for checking.
> 
> Well, the site manifest is like the sitemap with  antora-compatible information.  I think what I’m proposing with  the sitemap and partial builds will be quick enough without another tool we struggle to keep up to date.
> > 
> >> Another step would be to make it extremely visible when the jenkins website build fails.  I try to follow the dev list pretty closely, and see a lot of GH PR CI build failures reported, but apparently the jenkins build has been failing for several days and I had no idea.
> > 
> > +1 I think this is an area I can focus on next. I think we're in
> > agreement that we don't want to send emails to the dev@ mailing list,
> > one idea is to create GitHub issues; I just thought of a "status"
> > channel on Zulip. Or perhaps both.
> > 
> >> In principle, what other steps could we take?
> >> 
> >> ——
> >> 
> >> Comments on the existing attempts to have subproject-specific partial builds:
> >> 
> >> Dan Allen (of Antora) has repeatedly said that subsidiary builds such as local or partial builds should be done from (clones) of the repo containing the playbook for the actual site.  For a long time I disagreed and thought approaches like that of camel-quarkus to have a local build in the subproject were workable but I’m now convinced that they  are totally unmaintainable.  They rely on updating each such subproject every time the main playbook changes, and in a way that requires deep understanding of the entire site build.  It just isn’t going to work, ever.
> > 
> > I wonder if we could have the approach of Camel Quarkus and solve the
> > issue of outdated playbooks by having a git submodule of the website
> > in every project. Be warned though, the website git repository is very
> > large (3.6GB).
> 
> That’s another issue…. Isn’t most of that the built site branch? I think the site should be published to a separate repo from the sources, something like camel-site-pub.  Then we won’t need to delete earlier versions, for one thing.  I set this up for Aries and Felix, it’s not hard to do.
> > 
> >> ——
> >> Maybe there’s hope…
> >> 
> >> If we’re going to encourage or require local builds of the website, there needs to be a defined file system relationship between  the camel-website clone and the subproject(s) clone(s).  I have a “global” directory (named camel) into which I’ve cloned all the subprojects next to one another (together with some extra git work trees).  I think this is the simplest arrangement and I think we could require it.
> >> 
> >> Next, there needs to be an easy way (preferably automated) to modify the playbook to take account of building against one (or possibly more) local clones.  E.g, if I’m working on camel-quarkus, I should only need to have camel-quarkus cloned, and still be able to do a build. Doing this is much more plausible if we can assume that every branch participating in the website is present and up to date locally.  Does anyone know if it’s possible to write a git script that can update branches without switching to them?  If we can assume this, then the local build just involves changing the playbook source url from GitHub….<project>.git to `./../<project>` and adjusting the checked out branch name.
> > 
> > I think this could help, but I'm a bit skeptical that if this is not
> > automated folk will skip over this. My workflow is somewhat similar, I
> > have all subprojects checked out in the same (parent) directory, so I
> > just change the playbook to use HEAD branch and ../camel-$subproject
> > to build.
> > 
> >> Then there’s the problem that the full Antora build takes something like 6 minutes now, which is too long for anyone to wait for. So, we need an effective way of doing quick partial builds.  I’ve been working on this with some progress.  Dan has an idea he calls a site manifest, which means that the site build writes out the content catalog with information about the Antora coordinates and the site location of every page. Then a partial build can read this in to populate the partial build content catalog, so that xrefs can be properly resolved. This was originally developed to enable a “subsidiary site” to have xrefs to a “main site”.  I’ve adapted this to be an Antora pipeline extension, and it can be used in a couple of ways.
> > 
> > Here's where it dawned on me that we already have the manifest of
> > sorts in the sitemap XML files. But the idea of each subproject
> > building it's bit of the website is also interesting to me.
> > 
> >> - A site manifest could be published as part of the actual site.  In this case the partial build would fetch it, and only pages actually present locally would get local links.  You’d find out whether there are any problems, but it might be hard to locate the local pages through navigation.
> >> 
> >> - If you do a full build locally to generate a local site manifest, a partial build using that site manifest will only overwrite the rebuilt local files, leaving you with a functional local site.
> >> 
> >> - Possibly the full Jenkins build could also package the Antora site as a zip archive, and local builds could fetch and unpack it rather than doing a full local build.
> > 
> > I think INFRA might not look too keenly on us taking up too much disk
> > space on ci-builds.a.o. We _could/perhaps_ push to repository.a.o. as
> > a -SNAPSHOT.
> 
> I thought we’d have Antora also package as a zip or tar.gz and just include it in the website, like the site manifest.
> > 
> >> With  the site manifest, there’s still the problem of modifying the playbook to only build a little bit.  I’ve written another extension that you configure with  the part you want to build, and it applies appropriate filters. You can configure it down to one page.  It also watches for changes and rebuilds when it detects a change: I think I’ll need to make that configurable since it’s great to see your changes quickly but not what you want for a build step.
> >> I have not yet tried to make it easy to select which subproject you want to build: so far it requires knowing how to configure the extensions. I’ve started having some ideas on how this might be done.
> > 
> > This would bring super fast previews, could be part of the preview
> > functionality we already have for the website...
> > 
> >> What I’m envisioning and hoping for is a pre-PR process that involves running, in a local  camel-website clone, something like `yarn partial-build-camel-quarkus` that will in less than a minute detect any errors and produce a local site you can look at with the local changes.
> > 
> > This would be really cool.
> > 
> >> Thoughts?
> > 
> > If we agree that most issues are broken xrefs (that's how it seems to
> > me) perhaps focusing on not building the Antora bits at all, but doing
> > something along the lines what Guillaume built with information (say
> > from XML sitemaps) about other Antora components in the mix, feels
> > like it would bring some quick wins.
> 
> I really think that as soon as we try to cross component boundaries we’ll be reinventing Antora for no good reason.  People should preview their doc changes locally IMO, so let’s make that quick and easy and also so it will detect xref problems.
> > 
> > Sorry I don't think that was short...
> 
> It could have been much longer!!
> 
> David Jencks

Re: What can we do to make the documentation and website build more resilient?

Posted by David Jencks <da...@gmail.com>.
Inline...

> On Nov 13, 2021, at 12:40 PM, Zoran Regvart <zo...@regvart.com> wrote:
> 
> Hi David,
> lots of great stuff here, I'll try to keep my replies short though...
> 
> On Sat, Nov 13, 2021 at 7:47 PM David Jencks <david.a.jencks@gmail.com> wrote:
>> 
>> The Antora build part of the website is getting better at detecting problems and failing the build, and the website build seems to me to be failing more often.  Perhaps we can find ways to improve our process so there are fewer problematic commits and it’s easier to detect and fix problems earlier.
> 
> My intent with the website was that we should fail to publish as often
> as necessary to do our best in not publishing a broken website. I
> think we can tolerate website not being up to date with the latest for
> a day or two.

Absolutely! But there have been lots of times recently when I’ve broken the website and had no idea for days.

> 
>> There are a few problems caused by interactions between near-in-time commits and commits that bring in stuff that is obsolete due to recent website build changes.  Let’s ignore those :-)… especially the second kind will iron themselves out over time.
> 
> +1
> 
>> So, people keep merging PRs that change the documentation without checking that it doesn’t break the website build, either locally or as a CI check on the PR.
> 
> I think the xref syntax makes it difficult for folk to wrap their
> head, I think everyone should be familiar with the documentation we
> have here:
> 
> https://github.com/apache/camel-website/#links-between-pages-in-antora-content

I think  that’s a really inaccessible location for the information…. I’d like to move it to a page in the manual, near the release guidelines.  Something like “how to contribute to the docs”.
> 
>> They theoretically could do a local website build that incorporates their changes, but right now it’s way too hard and time consuming. (I’ll discuss the problems with the projects that attempt to do a partial local build later)
>> So one good step would be to make local website builds to check doc changes easy and quick.  I’ve made some progress on this.
> 
> +1 I think so too, local build and then the CI build against the git
> repository should be our first lines of defense. We should make those
> take seconds not minutes and then mandatory.
> 
>> Another step would be for CI to check the website build on each PR, either the whole site or a partial build.  I think GH actions can trigger each other, but I’ve never set it up.  Do we have enough GH action time to do a full website build on every PR to any camel subproject?  Is it practical to trigger the website build only when something documentation-related changes? (this detection would need to be carefully set up in each subproject)  If these are possible I think we should just do this.  It’s probably possible to set up quicker partial builds, but it’s decidedly more complicated.
> 
> Most issues I've seen have been with xref linking, I think we should
> focus on that first. Other checks I think fail less often. Perhaps not
> building at all could be a good solution. For example in the Camel
> main, Guillaume built a Maven plugin that checks for broken xrefs in
> seconds. But that (currently) works only within the main Camel
> repository.
> 
> What if we could expand that or build comparable tooling that checks
> xrefs in the adocs of the git repository the developer is working on
> but also takes into account xrefs to other subprojects? Two ideas I
> have here would be to clone other subprojects and build an index of
> xrefs against them as well; or to use the sitemap XMLs (could be
> fairly quick!) from the live website and reverse them back to xrefs
> for checking.

Well, the site manifest is like the sitemap but with Antora-compatible information.  I think what I’m proposing with the site manifest and partial builds will be quick enough without another tool we struggle to keep up to date.
> 
>> Another step would be to make it extremely visible when the jenkins website build fails.  I try to follow the dev list pretty closely, and see a lot of GH PR CI build failures reported, but apparently the jenkins build has been failing for several days and I had no idea.
> 
> +1 I think this is an area I can focus on next. I think we're in
> agreement that we don't want to send emails to the dev@ mailing list,
> one idea is to create GitHub issues; I just thought of a "status"
> channel on Zulip. Or perhaps both.
> 
>> In principle, what other steps could we take?
>> 
>> ——
>> 
>> Comments on the existing attempts to have subproject-specific partial builds:
>> 
>> Dan Allen (of Antora) has repeatedly said that subsidiary builds such as local or partial builds should be done from (clones) of the repo containing the playbook for the actual site.  For a long time I disagreed and thought approaches like that of camel-quarkus to have a local build in the subproject were workable but I’m now convinced that they  are totally unmaintainable.  They rely on updating each such subproject every time the main playbook changes, and in a way that requires deep understanding of the entire site build.  It just isn’t going to work, ever.
> 
> I wonder if we could have the approach of Camel Quarkus and solve the
> issue of outdated playbooks by having a git submodule of the website
> in every project. Be warned though, the website git repository is very
> large (3.6GB).

That’s another issue…. Isn’t most of that the built site branch? I think the site should be published to a separate repo from the sources, something like camel-site-pub.  Then we won’t need to delete earlier versions, for one thing.  I set this up for Aries and Felix, it’s not hard to do.
> 
>> ——
>> Maybe there’s hope…
>> 
>> If we’re going to encourage or require local builds of the website, there needs to be a defined file system relationship between  the camel-website clone and the subproject(s) clone(s).  I have a “global” directory (named camel) into which I’ve cloned all the subprojects next to one another (together with some extra git work trees).  I think this is the simplest arrangement and I think we could require it.
>> 
>> Next, there needs to be an easy way (preferably automated) to modify the playbook to take account of building against one (or possibly more) local clones.  E.g, if I’m working on camel-quarkus, I should only need to have camel-quarkus cloned, and still be able to do a build. Doing this is much more plausible if we can assume that every branch participating in the website is present and up to date locally.  Does anyone know if it’s possible to write a git script that can update branches without switching to them?  If we can assume this, then the local build just involves changing the playbook source url from GitHub….<project>.git to `./../<project>` and adjusting the checked out branch name.
> 
> I think this could help, but I'm a bit skeptical that if this is not
> automated folk will skip over this. My workflow is somewhat similar, I
> have all subprojects checked out in the same (parent) directory, so I
> just change the playbook to use HEAD branch and ../camel-$subproject
> to build.
> 
>> Then there’s the problem that the full Antora build takes something like 6 minutes now, which is too long for anyone to wait for. So, we need an effective way of doing quick partial builds.  I’ve been working on this with some progress.  Dan has an idea he calls a site manifest, which means that the site build writes out the content catalog with information about the Antora coordinates and the site location of every page. Then a partial build can read this in to populate the partial build content catalog, so that xrefs can be properly resolved. This was originally developed to enable a “subsidiary site” to have xrefs to a “main site”.  I’ve adapted this to be an Antora pipeline extension, and it can be used in a couple of ways.
> 
> Here's where it dawned on me that we already have the manifest of
> sorts in the sitemap XML files. But the idea of each subproject
> building it's bit of the website is also interesting to me.
> 
>> - A site manifest could be published as part of the actual site.  In this case the partial build would fetch it, and only pages actually present locally would get local links.  You’d find out whether there are any problems, but it might be hard to locate the local pages through navigation.
>> 
>> - If you do a full build locally to generate a local site manifest, a partial build using that site manifest will only overwrite the rebuilt local files, leaving you with a functional local site.
>> 
>> - Possibly the full Jenkins build could also package the Antora site as a zip archive, and local builds could fetch and unpack it rather than doing a full local build.
> 
> I think INFRA might not look too keenly on us taking up too much disk
> space on ci-builds.a.o. We _could/perhaps_ push to repository.a.o. as
> a -SNAPSHOT.

I thought we’d have Antora also package the built site as a zip or tar.gz and just include it in the website, like the site manifest.
> 
>> With  the site manifest, there’s still the problem of modifying the playbook to only build a little bit.  I’ve written another extension that you configure with  the part you want to build, and it applies appropriate filters. You can configure it down to one page.  It also watches for changes and rebuilds when it detects a change: I think I’ll need to make that configurable since it’s great to see your changes quickly but not what you want for a build step.
>> I have not yet tried to make it easy to select which subproject you want to build: so far it requires knowing how to configure the extensions. I’ve started having some ideas on how this might be done.
> 
> This would bring super fast previews, could be part of the preview
> functionality we already have for the website...
> 
>> What I’m envisioning and hoping for is a pre-PR process that involves running, in a local  camel-website clone, something like `yarn partial-build-camel-quarkus` that will in less than a minute detect any errors and produce a local site you can look at with the local changes.
> 
> This would be really cool.
> 
>> Thoughts?
> 
> If we agree that most issues are broken xrefs (that's how it seems to
> me) perhaps focusing on not building the Antora bits at all, but doing
> something along the lines what Guillaume built with information (say
> from XML sitemaps) about other Antora components in the mix, feels
> like it would bring some quick wins.

I really think that as soon as we try to cross component boundaries we’ll be reinventing Antora for no good reason.  People should preview their doc changes locally IMO, so let’s make that quick and easy and also so it will detect xref problems.
> 
> Sorry I don't think that was short...

It could have been much longer!!

David Jencks

Re: What can we do to make the documentation and website build more resilient?

Posted by Zoran Regvart <zo...@regvart.com>.
Hi David,
lots of great stuff here, I'll try to keep my replies short though...

On Sat, Nov 13, 2021 at 7:47 PM David Jencks <da...@gmail.com> wrote:
>
> The Antora build part of the website is getting better at detecting problems and failing the build, and the website build seems to me to be failing more often.  Perhaps we can find ways to improve our process so there are fewer problematic commits and it’s easier to detect and fix problems earlier.

My intent with the website was that we should fail to publish as often
as necessary to do our best in not publishing a broken website. I
think we can tolerate the website not being up to date with the latest
for a day or two.

> There are a few problems caused by interactions between near-in-time commits and commits that bring in stuff that is obsolete due to recent website build changes.  Let’s ignore those :-)… especially the second kind will iron themselves out over time.

+1

> So, people keep merging PRs that change the documentation without checking that it doesn’t break the website build, either locally or as a CI check on the PR.

I think the xref syntax makes it difficult for folk to wrap their
heads around, so I think everyone should be familiar with the
documentation we have here:

https://github.com/apache/camel-website/#links-between-pages-in-antora-content

> They theoretically could do a local website build that incorporates their changes, but right now it’s way too hard and time consuming. (I’ll discuss the problems with the projects that attempt to do a partial local build later)
> So one good step would be to make local website builds to check doc changes easy and quick.  I’ve made some progress on this.

+1 I think so too, a local build and then the CI build against the git
repository should be our first lines of defense. We should make those
take seconds, not minutes, and then make them mandatory.

> Another step would be for CI to check the website build on each PR, either the whole site or a partial build.  I think GH actions can trigger each other, but I’ve never set it up.  Do we have enough GH action time to do a full website build on every PR to any camel subproject?  Is it practical to trigger the website build only when something documentation-related changes? (this detection would need to be carefully set up in each subproject)  If these are possible I think we should just do this.  It’s probably possible to set up quicker partial builds, but it’s decidedly more complicated.

Most issues I've seen have been with xref linking, so I think we should
focus on that first. Other checks fail less often, I think. Perhaps not
building at all could be a good solution. For example, in the main Camel
repository Guillaume built a Maven plugin that checks for broken xrefs
in seconds, but that (currently) works only within the main Camel
repository.

What if we could expand that or build comparable tooling that checks
xrefs in the adocs of the git repository the developer is working on
but also takes into account xrefs to other subprojects? Two ideas I
have here would be to clone other subprojects and build an index of
xrefs against them as well; or to use the sitemap XMLs (could be
fairly quick!) from the live website and reverse them back to xrefs
for checking.
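
For the second idea, here is a rough sketch of pulling published page
URLs out of a sitemap (the sitemap URL is a guess at the site layout,
not a verified path):

    # Harvest published page URLs from the live sitemap, to be mapped back
    # to Antora coordinates for xref checking. The URL is an assumption.
    curl -s https://camel.apache.org/sitemap.xml \
      | grep -oE '<loc>[^<]+</loc>' \
      | sed -E 's#</?loc>##g'
    # Each URL would then need to be reversed into
    # component/version/module/page before comparing against xref targets.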

> Another step would be to make it extremely visible when the jenkins website build fails.  I try to follow the dev list pretty closely, and see a lot of GH PR CI build failures reported, but apparently the jenkins build has been failing for several days and I had no idea.

+1 I think this is an area I can focus on next. I think we're in
agreement that we don't want to send emails to the dev@ mailing list;
one idea is to create GitHub issues, and I just thought of a "status"
channel on Zulip. Or perhaps both.

> In principle, what other steps could we take?
>
> ——
>
> Comments on the existing attempts to have subproject-specific partial builds:
>
> Dan Allen (of Antora) has repeatedly said that subsidiary builds such as local or partial builds should be done from (clones) of the repo containing the playbook for the actual site.  For a long time I disagreed and thought approaches like that of camel-quarkus to have a local build in the subproject were workable but I’m now convinced that they  are totally unmaintainable.  They rely on updating each such subproject every time the main playbook changes, and in a way that requires deep understanding of the entire site build.  It just isn’t going to work, ever.

I wonder if we could keep the Camel Quarkus approach and solve the
issue of outdated playbooks by having a git submodule of the website
in every project. Be warned though, the website git repository is very
large (3.6GB).

> ——
> Maybe there’s hope…
>
> If we’re going to encourage or require local builds of the website, there needs to be a defined file system relationship between  the camel-website clone and the subproject(s) clone(s).  I have a “global” directory (named camel) into which I’ve cloned all the subprojects next to one another (together with some extra git work trees).  I think this is the simplest arrangement and I think we could require it.
>
> Next, there needs to be an easy way (preferably automated) to modify the playbook to take account of building against one (or possibly more) local clones.  E.g, if I’m working on camel-quarkus, I should only need to have camel-quarkus cloned, and still be able to do a build. Doing this is much more plausible if we can assume that every branch participating in the website is present and up to date locally.  Does anyone know if it’s possible to write a git script that can update branches without switching to them?  If we can assume this, then the local build just involves changing the playbook source url from GitHub….<project>.git to `./../<project>` and adjusting the checked out branch name.

I think this could help, but I'm a bit skeptical: if this is not
automated, folk will skip over it. My workflow is somewhat similar, I
have all subprojects checked out in the same (parent) directory, so I
just change the playbook to use the HEAD branch and ../camel-$subproject
to build.

> Then there’s the problem that the full Antora build takes something like 6 minutes now, which is too long for anyone to wait for. So, we need an effective way of doing quick partial builds.  I’ve been working on this with some progress.  Dan has an idea he calls a site manifest, which means that the site build writes out the content catalog with information about the Antora coordinates and the site location of every page. Then a partial build can read this in to populate the partial build content catalog, so that xrefs can be properly resolved. This was originally developed to enable a “subsidiary site” to have xrefs to a “main site”.  I’ve adapted this to be an Antora pipeline extension, and it can be used in a couple of ways.

Here's where it dawned on me that we already have a manifest of
sorts in the sitemap XML files. But the idea of each subproject
building its bit of the website is also interesting to me.

> - A site manifest could be published as part of the actual site.  In this case the partial build would fetch it, and only pages actually present locally would get local links.  You’d find out whether there are any problems, but it might be hard to locate the local pages through navigation.
>
> - If you do a full build locally to generate a local site manifest, a partial build using that site manifest will only overwrite the rebuilt local files, leaving you with a functional local site.
>
> - Possibly the full Jenkins build could also package the Antora site as a zip archive, and local builds could fetch and unpack it rather than doing a full local build.

I think INFRA might not look too keenly on us taking up too much disk
space on ci-builds.a.o. We _could/perhaps_ push to repository.a.o. as
a -SNAPSHOT.

> With  the site manifest, there’s still the problem of modifying the playbook to only build a little bit.  I’ve written another extension that you configure with  the part you want to build, and it applies appropriate filters. You can configure it down to one page.  It also watches for changes and rebuilds when it detects a change: I think I’ll need to make that configurable since it’s great to see your changes quickly but not what you want for a build step.
> I have not yet tried to make it easy to select which subproject you want to build: so far it requires knowing how to configure the extensions. I’ve started having some ideas on how this might be done.

This would bring super fast previews and could be part of the preview
functionality we already have for the website...

> What I’m envisioning and hoping for is a pre-PR process that involves running, in a local  camel-website clone, something like `yarn partial-build-camel-quarkus` that will in less than a minute detect any errors and produce a local site you can look at with the local  changes.

This would be really cool.

> Thoughts?

If we agree that most issues are broken xrefs (that's how it seems to
me), then perhaps not building the Antora bits at all, but doing
something along the lines of what Guillaume built, with information
(say from XML sitemaps) about other Antora components in the mix,
feels like it would bring some quick wins.

Sorry I don't think that was short...

zoran
-- 
Zoran Regvart