Posted to dev@cloudstack.apache.org by Nathan McGarvey <na...@gmail.com> on 2021/02/14 03:02:57 UTC

RPM Repository Metadata

To whom this reaches (@widodh, perhaps?):

    First of all, thank you for building binary distributions (rpm, deb)
of CloudStack.

    I am attempting to create a downstream rsync mirror of the
http://download.cloudstack.org/ centos/rhel repos (namely, centos and
systemvm) and noticed two oddities:

    1. The frequency with which the metadata is being rebuilt is
astronomical. E.g.
http://download.cloudstack.org/centos/8/4.15/repodata/ looks like it is
being fully rebuilt every hour, though the RPMs contained within haven't
been updated in over a month. Was this supposed to have a
--retain-old-md or --retain-old-md-by-age flag? The default for things
like RHEL 8 is 48 hours for metadata expiry, so re-generating the entire
repo every hour can (and does) cause caching issues.

    2. The metadata contained in
http://download.cloudstack.org/centos/8/4.15/repodata/repomd.xml makes
it virtually impossible to mirror since it points the <location> tag at
http://cloudstack.apt-get.eu/
        a. Most RPM repos (e.g. the official CentOS ones) just point the
<location> tag at the repodata/<hash> of the data type via a relative
URI, without external links. (See
http://mirror.centos.org/centos-8/8.3.2011/BaseOS/x86_64/os/repodata/repomd.xml
for an example.)
        b. Putting the xml:base in there effectively makes it not a
mirror, since anyone pointed at their local mirror will actually be
redirected to whatever xml:base is set to.

    FWIW: It looks like cloudstack.apt-get.eu and
download.cloudstack.org are *probably* the same host, which likely means
the xml:base being set at all may not actually be doing anything useful.
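
    For anyone who wants to check a repomd.xml for this, here is a
minimal Python sketch. It assumes the usual repomd namespace
(http://linux.duke.edu/metadata/repo). The function name and the sample
document, including its href values, are made up for illustration and
are not copied from the real CloudStack metadata.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace used by repomd.xml, and the reserved namespace behind the xml: prefix.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def find_external_locations(repomd_xml: str) -> List[Tuple[str, str, str]]:
    """Return (data type, xml:base, href) for every <location> that sets xml:base."""
    root = ET.fromstring(repomd_xml)
    hits = []
    for data in root.iter(f"{REPO_NS}data"):
        location = data.find(f"{REPO_NS}location")
        if location is None:
            continue
        base = location.get(XML_BASE)
        if base is not None:
            hits.append((data.get("type", ""), base, location.get("href", "")))
    return hits


# Hypothetical sample: one entry with xml:base, one with a plain relative href.
SAMPLE = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location xml:base="http://cloudstack.apt-get.eu/" href="repodata/primary.xml.gz"/>
  </data>
  <data type="filelists">
    <location href="repodata/filelists.xml.gz"/>
  </data>
</repomd>
"""

if __name__ == "__main__":
    for data_type, base, href in find_external_locations(SAMPLE):
        print(f"{data_type}: {href} is served from {base}")
```

    An empty result would mean every <location> is relative, which is
what a downstream mirror needs.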


    Please pardon my ignorance if these technical configurations are
intentional and have already been discussed, as I am a new poster to
this thread.


Thanks,
-Nathan McGarvey

Re: RPM Repository Metadata

Posted by Nathan McGarvey <na...@gmail.com>.
    You might also consider tying the createrepo command into something
that is trigger-based instead of time-based. (E.g. a systemd.path unit
or something with inotify, or if you have an automated publisher like
Jenkins or something, just put it as a post-publish command.)
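
    As a rough illustration of the post-publish idea, here is a Python
sketch. It is not inotify or a systemd.path unit: it is a plain
compare-on-invoke check that a publish step could call after uploading
packages. The function names and the state file are hypothetical, and
the createrepo arguments are only an assumption about what the real
script would pass.

```python
import hashlib
import subprocess
from pathlib import Path


def rpm_fingerprint(repo_dir: Path) -> str:
    """Hash the relative path, size, and mtime of every .rpm under repo_dir."""
    digest = hashlib.sha256()
    for rpm in sorted(repo_dir.rglob("*.rpm")):
        stat = rpm.stat()
        entry = f"{rpm.relative_to(repo_dir)}\0{stat.st_size}\0{stat.st_mtime_ns}\n"
        digest.update(entry.encode())
    return digest.hexdigest()


def regenerate_if_changed(repo_dir: Path, state_file: Path, run=subprocess.run) -> bool:
    """Run createrepo only when the RPM set differs from the last recorded run.

    state_file is a hypothetical bookkeeping file kept outside the repo.
    Returns True if metadata was regenerated, False if nothing changed.
    """
    current = rpm_fingerprint(repo_dir)
    previous = state_file.read_text().strip() if state_file.exists() else ""
    if current == previous:
        return False
    # Assumed invocation: run from inside the repo against "." and pass no --baseurl.
    run(["createrepo", "--update", "."], cwd=repo_dir, check=True)
    state_file.write_text(current)
    return True
```

    Called as regenerate_if_changed(Path("<path to repo>"),
Path("<state file>")), it leaves the metadata untouched when no package
has been added, removed, or replaced.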

    That --baseurl is also sending everyone directly to the source,
since the repomd.xml on a mirror will redirect clients to the upstream
host instead of the downstream mirror, which may have different
filelists, primary, and other metadata files. You might save some
bandwidth and other headaches if you left that flag out, so that the
links are relative to wherever you run createrepo, like so:

    cd <path to repo>
    createrepo <other options without --baseurl> .

    Note the dot for the current directory.


    ...unless you want everyone to go directly to the upstream source.
I'm not sure what design you're going for.


   Thanks both for considering this so quickly.

Thanks,
-Nathan


On 2/16/21 1:59 AM, Wido den Hollander wrote:
> 
> 
> On 16/02/2021 07:00, Rohit Yadav wrote:
>> Wido - in that case should we disable metadata re-generating cronjobs
>> for both rpm/deb repos? We only need to regenerate repo metadata on a
>> new release. There's no need to do this via cron for official releases.
>>
> 
> I have disabled the cron jobs for now for both RPM and DEB. We need to
> run them manually if we upload new packages.
> 
> Wido

Re: RPM Repository Metadata

Posted by Wido den Hollander <wi...@widodh.nl>.

On 16/02/2021 07:00, Rohit Yadav wrote:
> Wido - in that case should we disable metadata re-generating cronjobs for both rpm/deb repos? We only need to regenerate repo metadata on a new release. There's no need to do this via cron for official releases.
> 

I have disabled the cron jobs for now for both RPM and DEB. We need to
run them manually if we upload new packages.

Wido


Re: RPM Repository Metadata

Posted by Rohit Yadav <ro...@shapeblue.com>.
Wido - in that case should we disable metadata re-generating cronjobs for both rpm/deb repos? We only need to regenerate repo metadata on a new release. There's no need to do this via cron for official releases.


Regards.

________________________________
From: Wido den Hollander <wi...@widodh.nl>
Sent: Monday, February 15, 2021 21:30
To: dev@cloudstack.apache.org <de...@cloudstack.apache.org>; Rohit Yadav <ro...@shapeblue.com>
Subject: Re: RPM Repository Metadata



On 15/02/2021 09:51, Rohit Yadav wrote:
> Hi Nathan,
>
> Thanks for reporting. I've been managing the rpm builds/repos on the server and I wasn't aware of this issue.
> I checked and found there's an hourly cron job that updates rpm repo metadata using:
>
>          createrepo --update --workers 1 --baseurl <other options/paths>
>
>
> Based on your suggestion, I've changed the script to include: "createrepo --update --retain-old-md <rest of the args...>".
>
> @Wido Hollander<ma...@pcextreme.nl> @Gabriel Beims Bräscher<ma...@pcextreme.nl> - any reason why we have the cron job to update repo metadata?

No, I think it's just an oversight. It was set up and I don't think it
was very well thought out.

The cron job is a very simple shell script which could probably use
some attention.

Wido


rohit.yadav@shapeblue.com 
www.shapeblue.com
3 London Bridge Street, 3rd floor, News Building, London SE1 9SG UK
@shapeblue
  
 


Re: RPM Repository Metadata

Posted by Wido den Hollander <wi...@widodh.nl>.

On 15/02/2021 09:51, Rohit Yadav wrote:
> Hi Nathan,
> 
> Thanks for reporting. I've been managing the rpm builds/repos on the server and I wasn't aware of this issue.
> I checked and found there's an hourly cron job that updates rpm repo metadata using:
> 
>          createrepo --update --workers 1 --baseurl <other options/paths>
> 
> 
> Based on your suggestion, I've changed the script to include: "createrepo --update --retain-old-md <rest of the args...>".
> 
> @Wido Hollander<ma...@pcextreme.nl> @Gabriel Beims Bräscher<ma...@pcextreme.nl> - any reason why we have the cron job to update repo metadata?

No, I think it's just an oversight. It was set up and I don't think it
was very well thought out.

The cron job is a very simple shell script which could probably use
some attention.

Wido


Re: RPM Repository Metadata

Posted by Rohit Yadav <ro...@shapeblue.com>.
Hi Nathan,

Thanks for reporting. I've been managing the rpm builds/repos on the server and I wasn't aware of this issue.
I checked and found there's an hourly cron job that updates rpm repo metadata using:

        createrepo --update --workers 1 --baseurl <other options/paths>


Based on your suggestion, I've changed the script to include: "createrepo --update --retain-old-md <rest of the args...>".

@Wido Hollander<ma...@pcextreme.nl> @Gabriel Beims Bräscher<ma...@pcextreme.nl> - any reason why we have the cron job to update repo metadata?

Regards.

rohit.yadav@shapeblue.com 
www.shapeblue.com
3 London Bridge Street, 3rd floor, News Building, London SE1 9SG UK
@shapeblue