Posted to dev@sling.apache.org by Dirk Rudolph <di...@apache.org> on 2021/06/04 16:30:44 UTC

Sling Sitemap Bundle

Hi all,

I added a new bundle for XML sitemap generation to the whiteboard [0] and
would like to ask for your feedback.

The key highlights are:
- A simple, builder-like API to create sitemaps that hides all the XML
specifics
- Support for on-demand and background generation, with continuation after
job interruption
- Support for nested sitemaps, which are automatically collected into
sitemap indexes
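To illustrate the first and third points, here is a minimal Java sketch of a builder that keeps the XML details (namespace, escaping, root element) away from callers and collects the locations of nested sitemaps into a sitemap index. The names SitemapBuilder, addUrl and buildIndex are hypothetical and are not the API of the whiteboard bundle; only the element names and namespace follow the sitemaps.org 0.9 protocol.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Sling whiteboard API: callers only add URLs,
// the builder owns all XML specifics (namespace, escaping, root element).
final class SitemapBuilder {
    private static final String NS = "http://www.sitemaps.org/schemas/sitemap/0.9";
    private final List<String> entries = new ArrayList<>();

    // Adds a <url> entry; lastmod is optional (pass null to omit it).
    SitemapBuilder addUrl(String loc, String lastmod) {
        StringBuilder sb = new StringBuilder("<url><loc>").append(escape(loc)).append("</loc>");
        if (lastmod != null) {
            sb.append("<lastmod>").append(escape(lastmod)).append("</lastmod>");
        }
        entries.add(sb.append("</url>").toString());
        return this;
    }

    // Serializes all added entries into a <urlset> document.
    String build() {
        return wrap("urlset", entries);
    }

    // Collects the locations of nested sitemaps into a <sitemapindex> document.
    static String buildIndex(List<String> sitemapLocations) {
        List<String> items = new ArrayList<>();
        for (String loc : sitemapLocations) {
            items.add("<sitemap><loc>" + escape(loc) + "</loc></sitemap>");
        }
        return wrap("sitemapindex", items);
    }

    private static String wrap(String root, List<String> items) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<" + root + " xmlns=\"" + NS + "\">"
                + String.join("", items)
                + "</" + root + ">";
    }

    // Escapes the five XML special characters; '&' must be replaced first.
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }
}
```

A sitemap for one content tree would chain addUrl calls and call build(), while a root sitemap would pass the locations of the nested ones to buildIndex. A real implementation would more likely stream to a writer than build strings, which matters for the background generation case.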

As this implementation depends on an actual project's or product's content
structure, I created a sample implementation for Sling CMS [1].

I still have some open points on my list:
- Link externalization. IIRC, there was a discussion about implementing a
general approach in Sling; has that been implemented?
- Housekeeping of old/obsolete sitemap files

However, I wanted to start the discussion and ask whether, if there are no
major objections, this contribution could become its own module.

Best,
Dirk

[0] https://github.com/apache/sling-whiteboard/tree/master/sitemap
[1] https://github.com/apache/sling-org-apache-sling-app-cms/compare/master...Buuhuu:feature/sitemap

Re: Sling Sitemap Bundle

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi Dirk,

On Fri, Jun 4, 2021 at 6:30 PM Dirk Rudolph <di...@apache.org> wrote:
> ...I added a new bundle for xml sitemap generation to the whiteboard [0] and
> kindly want to ask for your feedback...

I've only taken a quick look so far, but FYI I made a minor change
that hopefully doesn't break anything:
https://github.com/apache/sling-whiteboard/commit/15f07b98434db595b810c16f727dba4d7231e9ba

-Bertrand

Re: Sling Sitemap Bundle

Posted by Dirk Rudolph <di...@apache.org>.
Thanks, Dan.

>  Even just a minor rename to have it be SitemapLinkExternalizer or
something similar might make more sense, as even if a more general solution
becomes available

I renamed the Externalizer interface to SitemapLinkExternalizer as you
suggested. At least the canonical link of a page should (if not must) use
the same externalization [0], but there may indeed be other use cases.
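A sitemap-specific externalizer can be as small as a single-method interface. The sketch below assumes a hypothetical signature (content path in, absolute URL out) and a simple implementation that maps one content root onto one public base URL; the actual SitemapLinkExternalizer in the whiteboard bundle may look different, for example by taking a request or a resource instead of a path.

```java
import java.net.URI;

// Hypothetical shape of a sitemap-specific externalizer; the real interface
// in the whiteboard bundle may have a different signature.
@FunctionalInterface
interface SitemapLinkExternalizer {
    String externalize(String contentPath);
}

// Example implementation (an assumption): strips one content root and
// resolves the remainder against a fixed public base URL. Paths are assumed
// to be URL-safe already; URI.resolve rejects illegal characters such as spaces.
final class BaseUrlExternalizer implements SitemapLinkExternalizer {
    private final String contentRoot;
    private final URI base;

    BaseUrlExternalizer(String contentRoot, String baseUrl) {
        this.contentRoot = contentRoot;
        // A trailing slash makes resolve() append to the base path instead of replacing its last segment.
        this.base = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
    }

    @Override
    public String externalize(String contentPath) {
        String path = contentPath;
        // Only strip whole path segments, so "/content/site2" is not mistaken for "/content/site".
        if (path.equals(contentRoot) || path.startsWith(contentRoot + "/")) {
            path = path.substring(contentRoot.length());
        }
        String relative = path.startsWith("/") ? path.substring(1) : path;
        return relative.isEmpty() ? base.toString() : base.resolve(relative).toString();
    }
}
```

Because the interface has a single method, a trivial setup could also supply a lambda such as path -> "https://www.example.com" + path, while the canonical link of a page would be expected to call the same implementation.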

> How would cleanup work? Based on a cursory review of the code I'm

You are right. It is as simple as iterating through all the sitemap files
and checking whether they are still "relevant", meaning their corresponding
content resource still exists and is still a sitemap root.
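The relevance check described above could be sketched as follows, with the two repository lookups abstracted as predicates. StoredSitemap, SitemapHousekeeping and findObsolete are hypothetical names for illustration; in Sling the lookups would presumably go through a ResourceResolver, and the actual deletion of the files is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model: a generated sitemap file and the content resource it was generated for.
final class StoredSitemap {
    private final String filePath;
    private final String contentRootPath;

    StoredSitemap(String filePath, String contentRootPath) {
        this.filePath = filePath;
        this.contentRootPath = contentRootPath;
    }

    String filePath() { return filePath; }
    String contentRootPath() { return contentRootPath; }
}

final class SitemapHousekeeping {
    // Returns the stored sitemaps that are no longer relevant: their content
    // resource is gone, or it is no longer marked as a sitemap root.
    static List<StoredSitemap> findObsolete(List<StoredSitemap> stored,
                                            Predicate<String> resourceExists,
                                            Predicate<String> isSitemapRoot) {
        List<StoredSitemap> obsolete = new ArrayList<>();
        for (StoredSitemap sitemap : stored) {
            String root = sitemap.contentRootPath();
            if (!resourceExists.test(root) || !isSitemapRoot.test(root)) {
                obsolete.add(sitemap);
            }
        }
        return obsolete;
    }
}
```

Keeping the lookups behind predicates makes the rule testable without a repository; a scheduled job would then delete the files returned by findObsolete.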

> It'd be *nice* to have a Web Console or some means for an administrator /
developer to understand what sitemaps currently exist and trigger
regeneration (or if there's some better way let me know)

I added an InventoryPrinter [1] to cover the first part and introduced some
API methods in the SitemapService [2] to cover the latter. I did not
create a WebConsolePlugin because, from my point of view, (re)generation
should be accessible to business users, so the product/project should
provide a UI. I am not sure what that could look like for Sling CMS.

Best,
Dirk

[0] https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap#general-guidelines
[1] https://github.com/apache/sling-whiteboard/blob/master/sitemap/src/main/java/org/apache/sling/sitemap/impl/console/SitemapInventoryPlugin.java
[2] https://github.com/apache/sling-whiteboard/blob/master/sitemap/src/main/java/org/apache/sling/sitemap/SitemapService.java#L39

On Fri, 4 Jun 2021 at 22:11, Daniel Klco <dk...@apache.org> wrote:

> Dirk,
>
> Looks great to me! A couple of thoughts:
>
>  - I like this 10x better than my hacked-together script in Sling CMS :-)
>  - I don't think a different externalization method has been implemented. I
> like this one, but I'm not sure it's appropriate for the scope of this
> bundle. Even just a minor rename to have it be SitemapLinkExternalizer or
> something similar might make more sense, as even if a more general solution
> becomes available, there could be legitimate reasons for an externalizer to
> work differently when generating a sitemap than in other use cases
>  - How would cleanup work? Based on a cursory review of the code I'm
> assuming it'd have to check the repository for each sitemap to find ones
> that are no longer referenced? Sound about right?
>  - It'd be *nice* to have a Web Console or some means for an
> administrator / developer to understand what sitemaps currently exist and
> trigger regeneration (or if there's some better way let me know)
>
> Awesome work!
> -Dan

Re: Sling Sitemap Bundle

Posted by Daniel Klco <dk...@apache.org>.
Dirk,

Looks great to me! A couple of thoughts:

 - I like this 10x better than my hacked-together script in Sling CMS :-)
 - I don't think a different externalization method has been implemented. I
like this one, but I'm not sure it's appropriate for the scope of this
bundle. Even just a minor rename to have it be SitemapLinkExternalizer or
something similar might make more sense, as even if a more general solution
becomes available, there could be legitimate reasons for an externalizer to
work differently when generating a sitemap than in other use cases
 - How would cleanup work? Based on a cursory review of the code I'm
assuming it'd have to check the repository for each sitemap to find ones
that are no longer referenced? Sound about right?
 - It'd be *nice* to have a Web Console or some means for an
administrator / developer to understand what sitemaps currently exist and
trigger regeneration (or if there's some better way let me know)

Awesome work!
-Dan
