Posted to users@subversion.apache.org by Mark Phippard <ma...@gmail.com> on 2020/11/04 20:32:25 UTC

svn.haxx.se is going away

Just a general fyi ... I went to https://svn.haxx.se/ today to search the
lists and noticed there is a banner on the site saying it is going offline
forever soon.

I am not sure what the ramifications will be as I know there are a lot of
historical links in the docs and site but I guess it is what it is.

-- 
Thanks

Mark Phippard

Re: svn.haxx.se is going away

Posted by David Chapman <dc...@acm.org>.
On 11/4/2020 12:32 PM, Mark Phippard wrote:
> Just a general fyi ... I went to https://svn.haxx.se/ 
> <https://svn.haxx.se/> today to search the lists and noticed there is 
> a banner on the site saying it is going offline forever soon.
>
> I am not sure what the ramifications will be as I know there are a lot 
> of historical links in the docs and site but I guess it is what it is.
>
> -- 
> Thanks
>
> Mark Phippard

Daniel Stenberg is in the process of moving the Curl Web site from 
https://curl.haxx.se/ to https://www.curl.se/.  I'm not sure why 
https://svn.haxx.se/ is not following along (there is no 
https://svn.curl.se as of a few minutes ago), but then again I'm just a 
Curl user, not a dev.  Curl development is hosted on Github, so maybe he 
lost interest in a Subversion archive?

-- 
     David Chapman      dcchapman@acm.org
     Chapman Consulting -- San Jose, CA
     EDA Software Developer, Expert Witness
     www.chapman-consulting-sj.com
     2018-2019 Chair, IEEE Consultants' Network of Silicon Valley


Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Nathan Hartman wrote on Wed, 04 Nov 2020 16:32 -0500:
> On Wed, Nov 4, 2020 at 3:32 PM Mark Phippard <ma...@gmail.com> wrote:
> >
> > Just a general fyi ... I went to https://svn.haxx.se/ today to search the lists and noticed there is a banner on the site saying it is going offline forever soon.
> >
> > I am not sure what the ramifications will be as I know there are a lot of historical links in the docs and site but I guess it is what it is.  
> 
> Daniel (danielsh) has been trying to get Infra to import the material
> from pre-2009 (pre-migration to ASF) into lists.apache.org to avoid
> losing the archives from the earliest period of development, which
> arguably contain some of the most important development information.
> 
> See the discussion here:
> https://lists.apache.org/thread.html/r97c9c5208af706b067fd8e67a7cbe79b37255958bb087bf699b722f8%40%3Cdev.subversion.apache.org%3E
> 
> Possibly it's still mirrored at home.apache.org but I can't check at the moment.

It is —

% ssh home.apache.org du -hs /home/danielsh/svn-haxx-se-mirror
245M    /home/danielsh/svn-haxx-se-mirror
% ssh svn-qavm.apache.org du -hs /x1/svn-haxx-se-mirror 
245M    /x1/svn-haxx-se-mirror

— but I don't know that either of these is backed up, so please someone
rsync either of those [they're identical] to their own hardware.

Cheers,

Daniel
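
For anyone who does rsync one of the mirrors to their own hardware, the
"they're identical" claim can be checked after the copy. The following is a
minimal Python sketch, assuming both trees are available as local
directories; the function names are made up for this illustration and are
not part of any existing tool.

```python
import hashlib
import os


def tree_digests(root):
    """Return {relative_path: sha256 hex digest} for every file under root."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in chunks so large mbox files do not need to fit in memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            digests[rel] = h.hexdigest()
    return digests


def differing_paths(a, b):
    """Relative paths that are missing on one side or whose contents differ."""
    da, db = tree_digests(a), tree_digests(b)
    return sorted(p for p in da.keys() | db.keys() if da.get(p) != db.get(p))


def mirrors_identical(a, b):
    """True when both trees hold the same files with the same contents."""
    return not differing_paths(a, b)
```

Calling differing_paths() on the two local copies lists exactly which files
need another look; an empty list means the copies match.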

Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Thu, 05 Nov 2020 11:16 +0100:
> On Wed, 4 Nov 2020 at 22:32, Nathan Hartman <hartman.nathan@gmail.com> wrote:
> 
> > On Wed, Nov 4, 2020 at 3:32 PM Mark Phippard <ma...@gmail.com> wrote:  
> > >
> > > Just a general fyi ... I went to https://svn.haxx.se/ today to search  
> > the lists and noticed there is a banner on the site saying it is going
> > offline forever soon.  
> > >
> > > I am not sure what the ramifications will be as I know there are a lot  
> > of historical links in the docs and site but I guess it is what it is.
> >
> > Daniel (danielsh) has been trying to get Infra to import the material
> > from pre-2009 (pre-migration to ASF) into lists.apache.org to avoid
> > losing the archives from the earliest period of development, which
> > arguably contain some of the most important development information.
> >
> > See the discussion here:
> >
> > https://lists.apache.org/thread.html/r97c9c5208af706b067fd8e67a7cbe79b37255958bb087bf699b722f8%40%3Cdev.subversion.apache.org%3E
> >

And https://issues.apache.org/jira/browse/INFRA-20213

> > Possibly it's still mirrored at home.apache.org but I can't check at the
> > moment.
> >
> > Nathan
> >  
> 
> Would it be considered a good thing if we manage to keep svn.haxx.se
> around? Even if Infra would get the old lists imported (I don't know what's
> holding them back), there are a bunch of references to the archives in the
> source (63 if I'm counting correctly), and in the website (87).
> 

Those in the website should be covered by
site/publish/.message-ids.tsv.  (See site/tools/ for the generating
scripts.)

The logic for converting the message-ids into URLs is embedded in [1]
(which I have tried to make discoverable, [2], but that seems to have
regressed, and I'm ENOTIME to chase it).

[1] https://svn.apache.org/repos/infra/infrastructure/trunk/projects/asf-generate-mail-archives-link
[2] https://issues.apache.org/jira/browse/INFRA-19422

> I have reached out to Daniel Stenberg and he seems willing to discuss
> pointing the domain name to another server. I could probably volunteer to keep
> the site alive, provided there is agreement within dev@ that this is a good
> thing. Or is it better to just do the job and update the sources and
> website?

We should keep old links working, if possible.  Ideally, not only links
we happen to have lying around, but also other links (e.g., in people's
non-public branches of https://github.com/apache/subversion).

There's more than one way to preserve links (redirecting old URLs to
new URLs for the same messages; keeping the site online but not
updating; keeping the site online and updating, on ASF hardware, e.g.,
svn-qavm.a.o; etc.).  Any and all assistance would be most welcome!

> (Daniel S... seems to be a popular name!)

It is, yes.  And then there are people like danderson, who aren't named
"Daniel" but still get in the way of tab-completing Daniels ☺

Cheers,

Daniel
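
The first option above (redirecting old URLs to new URLs for the same
messages) amounts to a lookup from old archive path to message-id, plus a
rule for turning a message-id into a URL. Here is a rough Python sketch.
Everything concrete in it is an assumption made for illustration: the
two-column TSV layout, the sample path, and the URL template are
hypothetical and are not taken from site/publish/.message-ids.tsv or from
the Infra script in [1].

```python
import csv
import io
from urllib.parse import quote


def load_mapping(tsv_text):
    """Parse hypothetical 'old_path<TAB>message_id' rows into a dict."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def new_url(old_path, mapping, template):
    """Return the new URL for an old archive path, or None if unknown."""
    message_id = mapping.get(old_path)
    if message_id is None:
        return None
    # Message-ids contain '<', '>' and '@', so percent-encode all of them.
    return template.format(message_id=quote(message_id, safe=""))


# Hypothetical sample row and a placeholder target; not a real archive URL.
SAMPLE_TSV = "/dev/archive-2020-11/0001.shtml\t<abc@example.org>\n"
TEMPLATE = "https://archive.example.org/msg/{message_id}"
```

A redirecting server would answer a known old path with a 301 to the value
of new_url() and fall back to the static page (or a 404) when the lookup
returns None.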

Re: svn.haxx.se is going away

Posted by Nathan Hartman <ha...@gmail.com>.
On Thu, Nov 26, 2020 at 4:16 PM Daniel Sahlberg <da...@gmail.com>
wrote:

>
> On Wed, 25 Nov 2020 at 16:40, Greg Stein <gs...@gmail.com> wrote:
>
>> On Wed, Nov 25, 2020 at 8:52 AM Daniel Sahlberg <
>> daniel.l.sahlberg@gmail.com> wrote:
>> >...
>>
>>> As for the question in your other mail (the reply to Daniel Shahaf)
>>> regarding the desire to keep the URLs. My initial question to Daniel
>>> Stenberg was if they would consider CNAME:ing svn.haxx.se to my server
>>> and he seemed ok with that, however we have not reached a formal agreement.
>>> I assume it would be even easier for him to CNAME it to a server provided
>>> by the ASF.
>>>
>>
>> I would suggest a CNAME to svn-haxx.apache.org, which Infra would
>> further CNAME to (say) svn-qavm. That would mean Mr Stenberg wouldn't ever
>> need to alter his CNAME record, while the ASF could repoint svn-haxx.a.o
>> at-will over time. Today, the "301 mapping server" could be svn-qavm, but
>> maybe we'd do something different in a few years
>>
>
> Now that we seem to have an agreement with Daniel Stenberg, what is the
> next step? You mentioned something about volunteer time, so what can we do?
>
> Do you want to have a 301 redirect from the old urls to another archive,
> or is it acceptable to just keep the site as it is? The site is mostly HTML
> files with some SSIs. In addition there are a few CGI scripts (perl-based)
> but I think these can be replaced with static HTML pages. So if we can keep
> the site as it is, then it is "just" a matter of providing a VHOST where
> the site can be uploaded.
>

I think the first steps should be getting the site as-is uploaded to
whatever location Greg indicates and coordinating the CNAME setup with
Daniel Stenberg. This will ensure the archive to date remains accessible.

Then we can discuss what steps to take next, what form the archive should
take, etc.

Cheers,
Nathan

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Wed, 25 Nov 2020 at 16:40, Greg Stein <gs...@gmail.com> wrote:

> On Wed, Nov 25, 2020 at 8:52 AM Daniel Sahlberg <
> daniel.l.sahlberg@gmail.com> wrote:
> >...
>
>> As for the question in your other mail (the reply to Daniel Shahaf)
>> regarding the desire to keep the URLs. My initial question to Daniel
>> Stenberg was if they would consider CNAME:ing svn.haxx.se to my server
>> and he seemed ok with that, however we have not reached a formal agreement.
>> I assume it would be even easier for him to CNAME it to a server provided
>> by the ASF.
>>
>
> I would suggest a CNAME to svn-haxx.apache.org, which Infra would further
> CNAME to (say) svn-qavm. That would mean Mr Stenberg wouldn't ever need to
> alter his CNAME record, while the ASF could repoint svn-haxx.a.o at-will
> over time. Today, the "301 mapping server" could be svn-qavm, but maybe
> we'd do something different in a few years
>

Now that we seem to have an agreement with Daniel Stenberg, what is the
next step? You mentioned something about volunteer time, so what can we do?

Do you want to have a 301 redirect from the old urls to another archive, or
is it acceptable to just keep the site as it is? The site is mostly HTML
files with some SSIs. In addition there are a few CGI scripts (perl-based)
but I think these can be replaced with static HTML pages. So if we can keep
the site as it is, then it is "just" a matter of providing a VHOST where
the site can be uploaded.

Kind regards,
Daniel Sahlberg

Re: svn.haxx.se is going away

Posted by Daniel Stenberg <da...@haxx.se>.
On Wed, 25 Nov 2020, Nathan Hartman wrote:

>> I would suggest a CNAME to svn-haxx.apache.org, which Infra would further 
>> CNAME to (say) svn-qavm. That would mean Mr Stenberg wouldn't ever need to 
>> alter his CNAME record, while the ASF could repoint svn-haxx.a.o at-will 
>> over time. Today, the "301 mapping server" could be svn-qavm, but maybe 
>> we'd do something different in a few years.

That sounds like a perfect solution from my point of view. I've admined the 
haxx.se domain for two decades already and for all I know, I will probably do 
it for a few more so the svn.haxx.se name should be okay for a good while 
ahead.

If you do what you need to do on your end to get everything set up, then when
you think you're good to go I'll be ready to turn it into a CNAME on your
call. There's no immediate hurry, the machine will be running for a while 
longer, most likely through the rest of the year and a little more. I'll leave 
the rsync setup there for now if you want to update anything during the 
transition period.


Some old-timers may remember: I was involved in the Subversion project early 
on, and I did a few commits, and it was in that period I figured I'd set up an
archive of the lists to make it easier to search and link to posts. I then 
dropped off the project, but I've left the archive running.

The simple reason I stop now is that this service runs on a very old physical 
machine that we fear will die at some point and we're moving everything over 
elsewhere before that happens, and I just couldn't muster the will to do the 
necessary work for this site to do the jump.

I hope you understand. The fact that you want to work on maintaining the 
archive at least tells me it has been a good service and that it has served
the project - and that warms my heart.

-- 

  / daniel.haxx.se

Re: svn.haxx.se is going away

Posted by Nathan Hartman <ha...@gmail.com>.
Hi all,

I'm CC'ing Daniel Stenberg of svn.haxx.se fame, as it will be easier to
have everyone in the loop.

Excuse me for using last names but we have at least 3 Daniel S's in the
conversation :-)

Just to update everyone, Mr. Stenberg very kindly set up an rsync that
allowed Mr. Sahlberg and me to download the entire svn.haxx.se site. This
was a 6.5 GB download of ~450,000 files comprising all the mbox files and
all the .shtml files, which are generated by hypermail 2.2.0. This backup
should alleviate any concerns about preserving the URLs found throughout
our code, site, logs, and the mail archives themselves.

@Mr. Stenberg, would the following be agreeable to you:

On Wed, Nov 25, 2020 at 10:41 AM Greg Stein <gs...@gmail.com> wrote:

>
> I would suggest a CNAME to svn-haxx.apache.org, which Infra would further
> CNAME to (say) svn-qavm. That would mean Mr Stenberg wouldn't ever need to
> alter his CNAME record, while the ASF could repoint svn-haxx.a.o at-will
> over time. Today, the "301 mapping server" could be svn-qavm, but maybe
> we'd do something different in a few years.
>

Cheers,
Nathan

Re: svn.haxx.se is going away

Posted by Greg Stein <gs...@gmail.com>.
On Wed, Nov 25, 2020 at 8:52 AM Daniel Sahlberg <da...@gmail.com>
wrote:
>...

> As for the question in your other mail (the reply to Daniel Shahaf)
> regarding the desire to keep the URLs. My initial question to Daniel
> Stenberg was if they would consider CNAME:ing svn.haxx.se to my server
> and he seemed ok with that, however we have not reached a formal agreement.
> I assume it would be even easier for him to CNAME it to a server provided
> by the ASF.
>

I would suggest a CNAME to svn-haxx.apache.org, which Infra would further
CNAME to (say) svn-qavm. That would mean Mr Stenberg wouldn't ever need to
alter his CNAME record, while the ASF could repoint svn-haxx.a.o at-will
over time. Today, the "301 mapping server" could be svn-qavm, but maybe
we'd do something different in a few years.

Cheers,
-g
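
The benefit of the two-level CNAME suggested above is that each party only
ever edits its own zone. A small Python simulation of the chain follows;
the dictionaries stand in for DNS zones and are purely illustrative (no
real DNS lookups are made), and the replacement host name is a hypothetical
placeholder.

```python
def resolve(name, records, max_hops=8):
    """Follow CNAME records until a name with no CNAME is reached."""
    hops = [name]
    while name in records:
        if len(hops) > max_hops:
            raise ValueError("CNAME chain too long or looping")
        name = records[name]
        hops.append(name)
    return hops


# Record maintained in the haxx.se zone: set once, never touched again.
haxx_zone = {"svn.haxx.se": "svn-haxx.apache.org"}

# Record maintained by ASF Infra: can be repointed at will.
asf_zone = {"svn-haxx.apache.org": "svn-qavm.apache.org"}

# A later move to a hypothetical new host changes only the ASF-side record.
asf_zone_later = {"svn-haxx.apache.org": "new-archive-host.example.org"}
```

resolve("svn.haxx.se", {**haxx_zone, **asf_zone}) walks from svn.haxx.se
through svn-haxx.apache.org to svn-qavm.apache.org. Swapping in
asf_zone_later changes the final hop while haxx_zone stays untouched.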

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Wed, 25 Nov 2020 at 07:22, Greg Stein <gs...@gmail.com> wrote:

> On Thu, Nov 12, 2020 at 10:47 AM Daniel Sahlberg <
> daniel.l.sahlberg@gmail.com> wrote:
>
>> On Thu, 5 Nov 2020 at 15:31, Julian Foad <ju...@foad.me.uk> wrote:
>>
>>> Main point: Thanks to everyone helping this preservation effort.
>>>
>>> > * updating the 63+87 links in the site and source to point to links
>>> hosted on ASF hardware
>>> >
>>> Observation: s/hardware/domain/. While the ASF has long promoted "on our
>>> own hardware", the more critical and often under-valued key to keeping
>>> control of one's Internet assets is "on our own domain name". That's
>>> assumed in this context, but something to keep in mind elsewhere.
>>>
>>
>> Agreeing with Julian's point on "on our own domain name", however this is
>> as it is. If we can get an agreement regarding keeping svn.haxx.se
>> pointing to a server where, at least, the old mailing list archive is
>> available then we would be better off.
>>
>> Could ASF provide this server space (basically a VirtualHost)? The
>> archive is about 6.5 GB so it is not a huge amount.
>>
>
> Well, svn-qavm.a.o already exists, and DShahaf has already moved content
> there. I think the larger concern is if a "redirect mapping" server were
> stood up to capture svn.haxx.se clicks and redirect them, then to ...
> where?
>
> In INFRA-20213, we noted that mail-archives.a.o is going away. Our end
> goal is lists.a.o, so that is where the content needs to be migrated.
>
> We (Infra) have a lot of issues with loading archival data onto lists.a.o.
> We have a bunch of it, there are permalink issues, and it is going to be a
> long slog. So there is also the issue for the svn community to determine
> whether it wants to fill the gap or maybe throw in some volunteer infra
> time to help sort through our backlog. (access to archival messages has
> generally been lower priority; volunteers welcome)
>
> No issues on the storage. It's all about serving up a landing page for
> $oldHaxxLink.
>

Wouldn't the easiest way be to just serve up the old HTML pages (which were
generated from the mboxes by hypermail)? Then we won't need to
consider permalinks, redirects, etc. There would be some duplicate storage
but a lot less time required to set it up. No need to have it updated with
recent mails, just put a note that mails newer than [some date] can be
found at lists.a.o.

As for the question in your other mail (the reply to Daniel Shahaf)
regarding the desire to keep the URLs. My initial question to Daniel
Stenberg was if they would consider CNAME:ing svn.haxx.se to my server and
he seemed ok with that, however we have not reached a formal agreement. I
assume it would be even easier for him to CNAME it to a server provided by
the ASF.

Kind regards,
Daniel Sahlberg

Re: svn.haxx.se is going away

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Nov 12, 2020 at 10:47 AM Daniel Sahlberg <
daniel.l.sahlberg@gmail.com> wrote:

> On Thu, 5 Nov 2020 at 15:31, Julian Foad <ju...@foad.me.uk> wrote:
>
>> Main point: Thanks to everyone helping this preservation effort.
>>
>> > * updating the 63+87 links in the site and source to point to links
>> hosted on ASF hardware
>> >
>> Observation: s/hardware/domain/. While the ASF has long promoted "on our
>> own hardware", the more critical and often under-valued key to keeping
>> control of one's Internet assets is "on our own domain name". That's
>> assumed in this context, but something to keep in mind elsewhere.
>>
>
> Agreeing with Julian's point on "on our own domain name", however this is
> as it is. If we can get an agreement regarding keeping svn.haxx.se
> pointing to a server where, at least, the old mailing list archive is
> available then we would be better off.
>
> Could ASF provide this server space (basically a VirtualHost)? The archive
> is about 6.5 GB so it is not a huge amount.
>

Well, svn-qavm.a.o already exists, and DShahaf has already moved content
there. I think the larger concern is if a "redirect mapping" server were
stood up to capture svn.haxx.se clicks and redirect them, then to ... where?

In INFRA-20213, we noted that mail-archives.a.o is going away. Our end goal
is lists.a.o, so that is where the content needs to be migrated.

We (Infra) have a lot of issues with loading archival data onto lists.a.o.
We have a bunch of it, there are permalink issues, and it is going to be a
long slog. So there is also the issue for the svn community to determine
whether it wants to fill the gap or maybe throw in some volunteer infra
time to help sort through our backlog. (access to archival messages has
generally been lower priority; volunteers welcome)

No issues on the storage. It's all about serving up a landing page for
$oldHaxxLink.

Cheers,
Greg Stein
Infrastructure Administrator, ASF

Re: svn.haxx.se is going away

Posted by Greg Stein <gs...@gmail.com>.
On Fri, Dec 25, 2020 at 11:17 AM Daniel Shahaf <d....@daniel.shahaf.name>
wrote:
>...

> > I'll figure out a way to have the mboxes downloadable. If I understand
> > Google's documentation of robots.txt they don't care about robots.txt if a
> > specific URL is linked from somewhere indexable, they will index it anyway.
> > Maybe just make one big tarball of everything?
>
> One big tarball would be wasteful to consume (would have to download
> everything) and to produce (would need to, basically, «cp everything.tgz
> tmp.tgz; tar -zcf - $new >> tmp.tgz; mv tmp.tgz everything.tgz», and you can
> see that's O(#everything) rather than O(appended stuff)).  Would rather avoid
> it if possible.
>
> Not sure what to do about robots.  I suppose we could set <link
> rel="canonical"> in the HTTP headers when serving the rfc822 files (example
> in <https://en.wikipedia.org/wiki/Canonical_link_element#HTTP>)?
>

I thought robots.txt can exclude subdirs. So just cut off (say)
svn-haxx.apache.org/mbox/

I'm not too worried about Google crawling the mboxes, as they'll likely do
it just once and never again (by keeping the etag and/or mtime).

>...

> > I couldn't figure out puppet, the links were 404 for me. I've created a
> > request in Jira and I hope someone will take a look:
> > https://issues.apache.org/jira/browse/INFRA-21230
>
> I think the github repository is restricted to Apache committers only, so
> you'll need to enter your github username on id.apache.org in order to get
> access to that URL.  If you don't have a github account, there ought to be
> a mirror of the repository on *.apache.org somewhere (at least, if Infra's
> following the same policy PMCs do).
>

Correct: committers only. And only after linking accounts via
https://gitbox.apache.org/setup/ as Nathan noted (and we forgot to mention
to DSahlberg).

If you do not have a GitHub account, or do not want one (say, because you
don't want to accept their T&Cs), then you can use the repository via
gitbox.apache.org (ask on Slack for the link; I prefer not to post it here).

Cheers,
-g
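
The robots.txt idea above (cut off a subdirectory such as /mbox/) can be
checked offline with Python's standard urllib.robotparser. This is only a
sketch: the rule text and the two sample URLs are assumptions made for
illustration, not the contents of any deployed robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes only the mbox subdirectory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mbox/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, url, agent="*"):
    """True when the given user agent may fetch the URL under these rules."""
    return parser.can_fetch(agent, url)
```

With these rules a compliant crawler skips anything under /mbox/ and may
still fetch the HTML pages. As the quoted text points out, this only asks
crawlers not to fetch; it does not by itself keep a linked URL out of a
search index.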

Re: svn.haxx.se is going away

Posted by Nathan Hartman <ha...@gmail.com>.
On Fri, Dec 25, 2020 at 12:17 PM Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Daniel Sahlberg wrote on Thu, Dec 24, 2020 at 20:38:17 +0100:
>
> > I couldn't figure out puppet, the links were 404 for me. I've created a
> > request in Jira and I hope someone will take a look:
> > https://issues.apache.org/jira/browse/INFRA-21230
>
> I think the github repository is restricted to Apache committers only, so
> you'll need to enter your github username on id.apache.org in order to get
> access to that URL.


And (if you're going the GitHub route) set up two-factor authentication. I
think it won't work without that. See:
https://gitbox.apache.org/setup/

Cheers,
Nathan

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Thu, 21 Jan 2021 at 17:18, Daniel Shahaf <d....@daniel.shahaf.name> wrote:

> Daniel Sahlberg wrote on Thu, 21 Jan 2021 07:37 +00:00:
> > Also modified the server setup to redirect any requests for
> > http://svn-haxx.apache.org to https://svn.haxx.se. Only problem now is
> > if someone tries to access https://svn-haxx.apache.org, it will give a
> > certificate warning. I don't really see how we can avoid it without
> > having svn-haxx.a.o in the certificate. (We can't redirect the
> > svn-haxx.a.o DNS entry to another box since the whole purpose of that
> > entry is to be a CNAME target). Anyhow, nobody should be browsing that
> > URL anyway, it shouldn't exist anywhere except in a few mails in dev@.
> >
>
> I'm not concerned about people using unofficial URLs.
>

Me neither. Are you happy with the setup as it is now?


> > > Second, I'm not happy about setting the address to dev@, for several
> > > reasons.  One, it's not development-related traffic.  Two, IIRC in Let's
> > > Encrypt the email address given is the "owner's" address, so if LE ever
> > > need to contact the PMC for whatever reason, automated or otherwise,
> > > that's the address they'd use.  Such traffic should go to private@.
> > >
> > > I'm aware you aren't on that list, Daniel.  We'll just have to loop
> > > you in on relevant threads.  That would mirror the ACL configuration
> > > (for /repos/private/pmc/subversion/machines, and, IIRC, Infra's puppet
> > > repos too).
> >
> > Switched to private@
>
> Thanks.  Nothing received so far (just like nothing was received on dev@
> previously).
>
> Cheers,
>
> Daniel
>

Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Thu, 21 Jan 2021 07:37 +00:00:
> Also modified the server setup to redirect any requests for 
> http://svn-haxx.apache.org to https://svn.haxx.se. Only problem now is 
> if someone tries to access https://svn-haxx.apache.org, it will give a 
> certificate warning. I don't really see how we can avoid it without 
> having svn-haxx.a.o in the certificate. (We can't redirect the 
> svn-haxx.a.o DNS entry to another box since the whole purpose of that 
> entry is to be a CNAME target). Anyhow, nobody should be browsing that 
> URL anyway, it shouldn't exist anywhere except in a few mails in dev@.
> 

I'm not concerned about people using unofficial URLs.

> > Second, I'm not happy about setting the address to dev@, for several
> > reasons.  One, it's not development-related traffic.  Two, IIRC in Let's
> > Encrypt the email address given is the "owner's" address, so if LE ever
> > need to contact the PMC for whatever reason, automated or otherwise,
> > that's the address they'd use.  Such traffic should go to private@.
> > 
> > I'm aware you aren't on that list, Daniel.  We'll just have to loop
> > you in on relevant threads.  That would mirror the ACL configuration
> > (for /repos/private/pmc/subversion/machines, and, IIRC, Infra's puppet
> > repos too).
> 
> Switched to private@

Thanks.  Nothing received so far (just like nothing was received on dev@ previously).

Cheers,

Daniel

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Wed, 20 Jan 2021 at 16:57, Daniel Shahaf <d....@daniel.shahaf.name> wrote:

> Daniel Sahlberg wrote on Wed, 20 Jan 2021 07:12 +00:00:
> > On Wed, 20 Jan 2021 at 00:16, Nathan Hartman <hartman.nathan@gmail.com> wrote:
> > > On Mon, Jan 18, 2021 at 4:17 AM Daniel Sahlberg
> > > <da...@gmail.com> wrote:
> > > > * No SSL at the moment. I suggest to install certbot and a Let's Encrypt certificate. (Should renewal notices go to dev@?)
> > >
> > > Is there any sensitive information in them? If yes, private@, if not,
> > > dev@ should be fine.
> >
> > Don't think there should be anything sensitive, worst case is probably
> > "Your site's certificate has not been updated and is now EOL, you need
> > to update". I've put it to dev@ and we might reconsider later on.
>
> I see you used CN=svn-haxx.apache.org on the certificate.
>
> Please run this by Infra.  It's conceivable that having a *.apache.org
> site that _doesn't_ use the wildcard cert might impact the wildcard's
> reputation in some way (e.g., break certificate pinning rules in
> plugins such as HTTPS Everywhere).
>
> [We aren't going to get a copy of the wildcard cert on a PMC VM, but
> Infra might do an SSL-terminating reverse proxy for us from a box
> they control.]
>

I removed svn-haxx.a.o from the certificate.

Also modified the server setup to redirect any requests for
http://svn-haxx.apache.org to https://svn.haxx.se. Only problem now is if
someone tries to access https://svn-haxx.apache.org, it will give a
certificate warning. I don't really see how we can avoid it without having
svn-haxx.a.o in the certificate. (We can't redirect the svn-haxx.a.o DNS
entry to another box since the whole purpose of that entry is to be a CNAME
target). Anyhow, nobody should be browsing that URL anyway, it shouldn't
exist anywhere except in a few mails in dev@.

> Second, I'm not happy about setting the address to dev@, for several
> reasons.  One, it's not development-related traffic.  Two, IIRC in Let's
> Encrypt the email address given is the "owner's" address, so if LE ever
> need to contact the PMC for whatever reason, automated or otherwise,
> that's the address they'd use.  Such traffic should go to private@.
>
> I'm aware you aren't on that list, Daniel.  We'll just have to loop
> you in on relevant threads.  That would mirror the ACL configuration
> (for /repos/private/pmc/subversion/machines, and, IIRC, Infra's puppet
> repos too).
>

Switched to private@

Kind regards,
Daniel
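
The redirect rule described above (requests that arrive under the
svn-haxx.apache.org name are sent to https://svn.haxx.se) boils down to a
decision on the Host header. Below is a minimal Python sketch of that
decision only; it is not the actual server configuration, and carrying the
request path over to the target is an assumption.

```python
CANONICAL_HOST = "svn.haxx.se"
ALIAS_HOSTS = {"svn-haxx.apache.org"}


def redirect_location(host, path):
    """Return the Location for a 301, or None if no redirect is needed."""
    # Normalise case and drop an optional ":port" suffix from the Host header.
    host = host.lower().split(":")[0]
    if host in ALIAS_HOSTS:
        # Assumption: the path is preserved so deep links keep working.
        return f"https://{CANONICAL_HOST}{path}"
    return None
```

This can only run after the connection is established. For an https://
request to svn-haxx.apache.org the TLS handshake comes first, so with that
name absent from the certificate the browser warns before any redirect can
be sent, which is the limitation noted above.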

Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Wed, 20 Jan 2021 15:56 +00:00:
> I see you used CN=svn-haxx.apache.org on the certificate.
> 
> Please run this by Infra.  It's conceivable that having a *.apache.org
> site that _doesn't_ use the wildcard cert might impact the wildcard's
> reputation in some way (e.g., break certificate pinning rules in
> plugins such as HTTPS Everywhere).

Also, I kinda wonder what trademarks@ would think about the current
setup, since there's this old policy:

https://blogs.apache.org/foundation/entry/if_it_s_not_at "If it's not at apache.org, it's not from the Apache Software Foundation!" (2009)

In particular, I wonder whether they'd prefer that svn.haxx.se redirect
(as in, HTTP 301 responses) to some *.apache.org name.

Thanks for making this happen!  These are good problems to have :-)

Daniel

Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Wed, 20 Jan 2021 07:12 +00:00:
> On Wed, 20 Jan 2021 at 00:16, Nathan Hartman <ha...@gmail.com> wrote:
> > On Mon, Jan 18, 2021 at 4:17 AM Daniel Sahlberg
> > <da...@gmail.com> wrote:
> > > * No SSL at the moment. I suggest to install certbot and a Let's encrypt certificate. (Should renewal notices go to dev@?)
> > 
> > Is there any sensitive information in them? If yes, private@, if not,
> > dev@ should be fine.
> 
> Don't think there should be anything sensitive, worst case is probably 
> "Your site's certificate has not been updated and is now EOL, you need 
> to update". I've put it to dev@ and we might reconsider later on.

I see you used CN=svn-haxx.apache.org on the certificate.

Please run this by Infra.  It's conceivable that having a *.apache.org
site that _doesn't_ use the wildcard cert might impact the wildcard's
reputation in some way (e.g., break certificate pinning rules in
plugins such as HTTPS Everywhere).

[We aren't going to get a copy of the wildcard cert on a PMC VM, but
Infra might do an SSL-terminating reverse proxy for us from a box
they control.]

Second, I'm not happy about setting the address to dev@, for several
reasons.  One, it's not development-related traffic.  Two, IIRC in Let's
Encrypt the email address given is the "owner's" address, so if LE ever
need to contact the PMC for whatever reason, automated or otherwise,
that's the address they'd use.  Such traffic should go to private@.

I'm aware you aren't on that list, Daniel.  We'll just have to loop
you in on relevant threads.  That would mirror the ACL configuration
(for /repos/private/pmc/subversion/machines, and, IIRC, Infra's puppet
repos too).

Cheers,

Daniel

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Wed, 20 Jan 2021 at 08:12, Daniel Sahlberg <daniel.l.sahlberg@gmail.com> wrote:
[...]

> > * I will give it a few days for feedback, if nothing unexpected I will
>> ask Daniel Stenberg to switch the DNS to a CNAME.
>>
>
> I've asked for the DNS update now.
>

The DNS is updated and should have propagated by now.


> > * After the CNAME is installed (and propagated through the DNS system), I
>> will make a final rsync and update the site with any changes (in particular
>> all messages that have arrived since the last sync).
>>
>
I've made the final rsync and updated the site with any messages that have
arrived since December (just in case someone linked them from somewhere).


> > * mboxes are not yet published. I will add this after the final rsync.
>>
>
I've added links to each index page.

Kind regards,
Daniel

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Wed, 20 Jan 2021 at 00:16, Nathan Hartman <hartman.nathan@gmail.com> wrote:

> On Mon, Jan 18, 2021 at 4:17 AM Daniel Sahlberg
> <da...@gmail.com> wrote:
> >
> > The site is now mirrored at http://svn-haxx.apache.org/ (it is also
> listening to http://svn.haxx.se, but the DNS is not yet updated). I
> encourage everyone to take a look and provide feedback.
>
> Thank you so much for driving this and getting it done!!
>
> Feedback -- it looks great. Just a couple of minor nits:
>
> On the main page, the first sentence ("This site was an unofficial
> Subversion related...") contains "archive archived"; it should be just
> "archive".
>
> Also the last sentence ("There is a link to the official archive on
> each page") should have a period.
>

Fixed, thanks!


> > * No SSL at the moment. I suggest to install certbot and a Let's encrypt
> certificate. (Should renewal notices go to dev@?)
>
> Is there any sensitive information in them? If yes, private@, if not,
> dev@ should be fine.
>

I don't think there should be anything sensitive; the worst case is probably
"Your site's certificate has not been updated and is now EOL, you need to
update". I've set it to dev@ and we might reconsider later on.


> > * I will give it a few days for feedback, if nothing unexpected I will
> ask Daniel Stenberg to switch the DNS to a CNAME.
>

I've asked for the DNS update now.


> > * After the CNAME is installed (and propaged through the DNS system), I
> will make a final rsync and update the site with any changes (in particular
> all messages that has arrived since the last sync).
> > * mboxes are not yet published. I will add this after the final rsync.
>
> Once again, thank you for driving this.
>
> Nathan
>

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Wed, 20 Jan 2021 at 08:54, Andrew Marlow <ma...@gmail.com> wrote:

> Hello everyone,
>
> It looks like a great job has been done on this but there is one little
> wrinkle I have found still. [svn.haxx.se] in the menu bar relocates,
> which is good but individual emails from the archive have that link at the
> bottom of the page still pointing to the obsolete site. Should those links
> also point to https://svn-haxx.apache.org while displaying [svn.haxx.se] ?
>

I believe this might be a problem if you were browsing the site using the
(unofficial - and for technical purposes only) url
https://svn-haxx.apache.org. The links should point to https://svn.haxx.se
which is still the official address.

In the meantime Daniel Stenberg has updated the DNS, and https://svn.haxx.se
now points at Apache hardware (pending DNS propagation delays). Can you
check again on https://svn.haxx.se? You may need to wait up to an hour for
DNS caches to clear; the front page shows whether you are looking at the
correct server.
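
One way to check whether a local resolver has picked up the DNS change is to
compare the addresses the two names resolve to. The following is only a
minimal sketch and not something from the original message: it assumes that
both names should end up on the same machine, and the helper names
resolve_ipv4 and same_server are hypothetical.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses the local resolver reports for hostname."""
    # gethostbyname_ex returns (canonical_name, alias_list, address_list).
    _canonical_name, _aliases, addresses = resolver(hostname)
    return set(addresses)


def same_server(name_a, name_b, resolver=socket.gethostbyname_ex):
    """True if the two names share at least one address (hypothetical helper)."""
    return bool(resolve_ipv4(name_a, resolver) & resolve_ipv4(name_b, resolver))
```

Calling same_server("svn.haxx.se", "svn-haxx.apache.org") needs network
access, and the answer depends on what the local DNS cache currently holds.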

Kind regards,
Daniel Sahlberg

Re: svn.haxx.se is going away

Posted by Andrew Marlow <ma...@gmail.com>.
Hello everyone,

It looks like a great job has been done on this but there is one little
wrinkle I have found still. [svn.haxx.se] in the menu bar relocates, which
is good but individual emails from the archive have that link at the bottom
of the page still pointing to the obsolete site. Should those links also
point to https://svn-haxx.apache.org while displaying [svn.haxx.se] ?

On Tue, 19 Jan 2021 at 23:16, Nathan Hartman <ha...@gmail.com>
wrote:

> On Mon, Jan 18, 2021 at 4:17 AM Daniel Sahlberg
> <da...@gmail.com> wrote:
> >
> > The site is now mirrored at http://svn-haxx.apache.org/ (it is also
> listening to http://svn.haxx.se, but the DNS is not yet updated). I
> encourage everyone to take a look and provide feedback.
>
> Thank you so much for driving this and getting it done!!
>
> Feedback -- it looks great. Just a couple of minor nits:
>
> On the main page, the first sentence ("This site was an unofficial
> Subversion related...") contains "archive archived"; it should be just
> "archive".
>
> Also the last sentence ("There is a link to the official archive on
> each page") should have a period.
>
> Other than that, everything looks good to me. I didn't find any other
> issues on any of the pages. The site works (yay!)...
>
> > * No SSL at the moment. I suggest to install certbot and a Let's encrypt
> certificate. (Should renewal notices go to dev@?)
>
> Is there any sensitive information in them? If yes, private@, if not,
> dev@ should be fine.
>
> > * I will give it a few days for feedback, if nothing unexpected I will
> ask Daniel Stenberg to switch the DNS to a CNAME.
> > * After the CNAME is installed (and propaged through the DNS system), I
> will make a final rsync and update the site with any changes (in particular
> all messages that has arrived since the last sync).
> > * mboxes are not yet published. I will add this after the final rsync.
>
> Once again, thank you for driving this.
>
> Nathan
>


-- 
Regards,

Andrew Marlow
http://www.andrewpetermarlow.co.uk

Re: svn.haxx.se is going away

Posted by Nathan Hartman <ha...@gmail.com>.
On Mon, Jan 18, 2021 at 4:17 AM Daniel Sahlberg
<da...@gmail.com> wrote:
>
> The site is now mirrored at http://svn-haxx.apache.org/ (it is also listening to http://svn.haxx.se, but the DNS is not yet updated). I encourage everyone to take a look and provide feedback.

Thank you so much for driving this and getting it done!!

Feedback -- it looks great. Just a couple of minor nits:

On the main page, the first sentence ("This site was an unofficial
Subversion related...") contains "archive archived"; it should be just
"archive".

Also the last sentence ("There is a link to the official archive on
each page") should have a period.

Other than that, everything looks good to me. I didn't find any other
issues on any of the pages. The site works (yay!)...

> * No SSL at the moment. I suggest to install certbot and a Let's encrypt certificate. (Should renewal notices go to dev@?)

Is there any sensitive information in them? If yes, private@, if not,
dev@ should be fine.

> * I will give it a few days for feedback, if nothing unexpected I will ask Daniel Stenberg to switch the DNS to a CNAME.
> * After the CNAME is installed (and propaged through the DNS system), I will make a final rsync and update the site with any changes (in particular all messages that has arrived since the last sync).
> * mboxes are not yet published. I will add this after the final rsync.

Once again, thank you for driving this.

Nathan

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
The site is now mirrored at http://svn-haxx.apache.org/ (it is also
listening to http://svn.haxx.se, but the DNS is not yet updated). I
encourage everyone to take a look and provide feedback.

Points to note:
* This will be a static mirror from now on.
* No SSL at the moment. I suggest installing certbot and a Let's Encrypt
certificate. (Should renewal notices go to dev@?)
* I will give it a few days for feedback; if nothing unexpected comes up, I
will ask Daniel Stenberg to switch the DNS to a CNAME.
* After the CNAME is installed (and has propagated through the DNS system), I
will make a final rsync and update the site with any changes (in particular
all messages that have arrived since the last sync).
* mboxes are not yet published. I will add them after the final rsync.

/Daniel Sahlberg

On Thu, 31 Dec 2020 at 12:44, Daniel Sahlberg <daniel.l.sahlberg@gmail.com> wrote:

> Infra replied (in JIRA) that they "doesn't typically get involved in the
> management of project VMs. Is there no one on your project who is able to
> maintain the VM or install software?". I could do it but I don't have root.
> I guess that's a PMC decision not discussed here.
>
> Let me know if there is anything further I can do.
>
>
> /Daniel Sahlberg
>

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
Infra replied (in JIRA) that it "doesn't typically get involved in the
management of project VMs. Is there no one on your project who is able to
maintain the VM or install software?". I could do it, but I don't have root.
I guess that's a PMC decision not discussed here.

Let me know if there is anything further I can do.

/Daniel Sahlberg

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
Combined reply to Daniel Shahaf, Nathan & Greg.

As for mboxes, I'll look at how to make them accessible and take your
suggestions into consideration. Probably just a simple index file listing
all the mboxes (just like today).
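
As an illustration of such an index file, here is a minimal sketch that
renders a plain HTML list of links. It is not the script used for the site;
the function name render_mbox_index and the example file names are
hypothetical, and it assumes the mboxes sit next to the index page.

```python
from html import escape
from urllib.parse import quote


def render_mbox_index(mbox_names, title="mbox archives"):
    """Return a minimal HTML page linking to each mbox file name."""
    items = "\n".join(
        # quote() percent-encodes the link target; escape() protects the HTML.
        f'  <li><a href="{escape(quote(name), quote=True)}">{escape(name)}</a></li>'
        for name in sorted(mbox_names)
    )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<ul>\n{items}\n</ul>\n"
        "</body></html>\n"
    )
```

For example, render_mbox_index(["2020-12.mbox", "2020-11.mbox"]) yields a
page whose list is sorted by file name.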

I got the gitbox/github integration working and I managed to clone the
repo, but I don't have enough knowledge about puppet to really figure out
the proper change. Thanks for your help!


On Fri, 25 Dec 2020 at 18:17, Daniel Shahaf <d....@daniel.shahaf.name> wrote:

> Post a list of packages you'd like installed?
>

The apache2 package (plus dependencies) should be enough.

I use the following config on my dev machine. It probably should be
duplicated for svn-haxx.apache.org. The DocumentRoot could/should be
changed so that the site lives outside of my home directory.

<VirtualHost *:80>
        ServerName svn.haxx.se
        ServerAdmin ????
        DocumentRoot /home/dsahlberg/svnhaxx
        CustomLog logs/access.log combined
        <Directory "/home/dsahlberg/svnhaxx">
                Options Indexes MultiViews FollowSymLinks
                AllowOverride All
                Require all granted
        </Directory>
</VirtualHost>

Kind regards
Daniel Sahlberg

Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Thu, Dec 24, 2020 at 20:38:17 +0100:
> On Tue, 22 Dec 2020 at 02:08, Greg Stein <gs...@gmail.com> wrote:
> 
> > On Mon, Dec 21, 2020 at 4:03 AM Daniel Shahaf <d....@daniel.shahaf.name>
> > wrote:
> >
> >> Daniel Sahlberg wrote on Mon, 21 Dec 2020 08:55 +0100:
> >> > On Fri, 27 Nov 2020 at 19:26, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> >> >
> >> > > Sounds good.  Nathan, Daniel Sahlberg — could you work with Infra on
> >> > > getting the data over to ASF hardware?
> >> >
> >> > I have been given access to svn-qavm and uploaded a tarball of the website
> >> > (including mboxes). I'm a bit reluctant to unpack it since it takes almost
> >> > 7GB, and there is only 14 GB disk space remaining. Is it ok to unpack or
> >> > should we ask Infra for more disk space?
> >>
> >> I vote to ask for more disk space, especially considering that some
> >> percentage is reserved for uid=0's use.
> >>
> >
> > DSahlberg hit up Infra on #asfinfra on the-asf.slack.com, and asked for
> > more space. That's been provisioned now.
> >
> 
> I've unpacked in /home/dsahlberg/svnhaxx
> 
> 
> > >...
> >
> >> > The mboxes will be preserved but I don't plan to make them available for
> >> > download (since they are not available from lists.a.o or mail-archives.a.o).
> >>
> >> Please do make them available for download.  Being able to download the
> >> raw data is useful for both backup and perusal purposes, and I doubt
> >> the bandwidth requirements would be a problem.  (Might want
> >> a robots.txt entry, though?)
> >>
> >
> > Bandwidth should not be a problem for the mboxes, but yes: a robots.txt
> > would be nice. I think search engines spidering the static email pages
> > might be useful to the community, but the spiders really shouldn't need/use
> > the mboxes.
> >
> 
> I'll figure out a way to have the mboxes downloadable. If I understand
> Google's documentation of robots.txt they don't care about robots.txt if a
> specific URL is linked from somewhere indexable, they will index it anyway.
> Maybe just make one big tarball of everything?

One big tarball would be wasteful to consume (would have to download
everything) and to produce (would need to, basically, «cp everything.tgz
tmp.tgz; tar -zcf - $new >> tmp.tgz; mv tmp.tgz everything.tgz», and you can
see that's O(#everything) rather than O(appended stuff)).  Would rather avoid
it if possible.

Not sure what to do about robots.  I suppose we could set <link
rel="canonical"> in the HTTP headers when serving the rfc822 files (example
in <https://en.wikipedia.org/wiki/Canonical_link_element#HTTP>)?
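
To make the suggestion concrete: the HTTP form of a canonical link is a Link
response header carrying rel="canonical". The sketch below only builds that
header value; the function name canonical_link_header and the example URL are
hypothetical, and how the header would actually be attached on the server is
left open.

```python
def canonical_link_header(canonical_url):
    """Return a (name, value) pair for an HTTP Link header with rel="canonical"."""
    # Reject characters that would break the <...> target or the header line.
    if any(ch in canonical_url for ch in '<>" \r\n'):
        raise ValueError("URL contains characters not allowed in a Link target")
    return ("Link", f'<{canonical_url}>; rel="canonical"')
```

For a raw message, the canonical URL would presumably be the static page for
the same message, for instance
canonical_link_header("https://svn.haxx.se/dev/example.shtml").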

> > I think the first thing is to get httpd up and running with the desired
> > configuration. Then step two will be to memorialize that into puppet. Infra
> > can assist with the latter. I saw on Slack that Humbedooh gave you a link
> > to explore.
> >
> 
> Since I haven't got root, I can't get any further to install httpd on my own.

Post a list of packages you'd like installed?

> I couldn't figure out puppet, the links were 404 for me. I've created a
> request in Jira and I hope someone will take a look:
> https://issues.apache.org/jira/browse/INFRA-21230

I think the github repository is restricted to Apache committers only, so
you'll need to enter your github username on id.apache.org in order to get
access to that URL.  If you don't have a github account, there ought to be
a mirror of the repository on *.apache.org somewhere (at least, if Infra's
following the same policy PMCs do).

Cheers,

Daniel

Re: svn.haxx.se is going away

Posted by Greg Stein <gs...@gmail.com>.
On Fri, Nov 27, 2020 at 12:26 PM Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Greg Stein wrote on Wed, Nov 25, 2020 at 00:08:32 -0600:
> > Hey Daniel,
> >
> > I think the best place for this content is on mbox-vm.a.o. That is where we
> > have our permanent list archives in mbox format.
> > We can then arrange to ship them off to lists.a.o. If you concur,
>
> I concur in the sense that it'd be great to have the mboxes stored on
> and served by whatever Infra uses for all other archives.
>
> However, when I last looked at lists.a.o I was of the opinion that Infra
> shouldn't use it.  (Back then its permalinks weren't permanent and
> weren't able to be generated or dereferenced while the user was offline
> *or while the external vendor was offline*.  I don't know whether those
> have been fixed since then.)  Unless that has changed, I wouldn't like
> Subversion to rely on that particular archive.  Instead, there's
> mod_mbox, or a static snapshot of svn.haxx.se.
>

Infra has no plans to switch away from lists.a.o. The permalinks issue is
being solved for the oldest archives (we have a copy of the ElasticSearch
database holding them, so we don't have to rely on a .csv file). The
mail-archives.a.o and mail-private.a.o mod_mbox servers will be taken
offline at some point, and a redirector left in its place.

The Subversion community can stand up its own archive on svn-qavm, or rely
on lists.a.o and a redirector. Infra has no opinion on that.

> (@Greg: You know I wouldn't normally have repeated the above, but
> (1) you asked, and (2) the dev@ audience doesn't all know this context.)
>

No worries at all.


> > then I'll ask the team to get you access.
>
> Would InfraAdmin let someone else from the PMC take point?  I realize
> that this is an ASF-wide box (as opposed to a PMC box) and I'm a known
> entity at Infra, but I'm short on tuits.
>

Anybody with an @apache account. Or if somebody emails me privately with a
path on svn-qavm.a.o for the content that I should mirror onto our mbox
storage machine. I can get that content moved over from svn-qavm maybe even
a bit more easily. (iirc, DShahaf and Nathan have copies on svn-qavm?)

> > You can preserve all the data you want into your homedir, and we can
> > sort from there.
>
> Sounds good.  Nathan, Daniel Sahlberg — could you work with Infra on
> getting the data over to ASF hardware?
>
> Note that svn-org@ doesn't have an equivalent @s.a.o list, and that, as
> mentioned upthread, the post-migration (from tigris.org to apache.org)
> mboxes may be in a different order than the official ones, and shouldn't
> be "deduplicated".
>
> > You indicate a desire to maintain URLs. Do you have some ideas on that?
>
> Each individual message .shtml file contains the message-id in
> a comment.  We can extract the comments and build a redirector around
> them.  (By the way, this is basically the same exercise that Infra must
> have solved back when Sebb received that CSV file from the lists.a.o
> vendor, so there may be an opportunity for code reuse.)  Of course, the
> full rsync likely has the same info available less scrapily.
>
> Or, as mentioned above, the .shtml files could just be preserved
> statically (plus or minus an appropriate message in the list of years on
> the /${listname}/ page).  In fact, I'm having trouble coming up with
> a reason _not_ to serve a static snapshot of the pages, even if we do
> build a redirector.
>

svn-haxx.apache.org is live now. It is an A record pointing to
svn-qavm.a.o (one day it might be a CNAME, but for $reasons it is not
today).

Thus, anybody can configure httpd on svn-qavm however you like, and have it
respond to svn.haxx.se and svn-haxx.apache.org. Whether redirects or a
static site, or ...

DSahlberg could do this if somebody on the PMC gives him a +1 to have an
account created (effectively as a partial committer, but with no
directories specified). Then he can sign/send an ICLA, and get an account.
That can be added to svn-qavm, where he could set up a site. Though I'll warn
that root would be required for httpd configuration.

Medium/long-term I would suggest putting the httpd configuration into
Puppet once it is stable, so it will stick around across machine
(re)provisioning.

Feel free to ping me on Slack or via email. I read svn lists sporadically
nowadays, but am more than happy to represent both svn and infra.

Cheers,
-g

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Tue, 22 Dec 2020 at 02:08, Greg Stein <gs...@gmail.com> wrote:

> On Mon, Dec 21, 2020 at 4:03 AM Daniel Shahaf <d....@daniel.shahaf.name>
> wrote:
>
>> Daniel Sahlberg wrote on Mon, 21 Dec 2020 08:55 +0100:
>> > On Fri, 27 Nov 2020 at 19:26, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
>> >
>> > > Sounds good.  Nathan, Daniel Sahlberg — could you work with Infra on
>> > > getting the data over to ASF hardware?
>> >
>> > I have been given access to svn-qavm and uploaded a tarball of the website
>> > (including mboxes). I'm a bit reluctant to unpack it since it takes almost
>> > 7GB, and there is only 14 GB disk space remaining. Is it ok to unpack or
>> > should we ask Infra for more disk space?
>>
>> I vote to ask for more disk space, especially considering that some
>> percentage is reserved for uid=0's use.
>>
>
> DSahlberg hit up Infra on #asfinfra on the-asf.slack.com, and asked for
> more space. That's been provisioned now.
>

I've unpacked in /home/dsahlberg/svnhaxx


> >...
>
>> > The mboxes will be preserved but I don't plan to make them available for
>> > download (since they are not available from lists.a.o or mail-archives.a.o).
>>
>> Please do make them available for download.  Being able to download the
>> raw data is useful for both backup and perusal purposes, and I doubt
>> the bandwidth requirements would be a problem.  (Might want
>> a robots.txt entry, though?)
>>
>
> Bandwidth should not be a problem for the mboxes, but yes: a robots.txt
> would be nice. I think search engines spidering the static email pages
> might be useful to the community, but the spiders really shouldn't need/use
> the mboxes.
>

I'll figure out a way to have the mboxes downloadable. If I understand
Google's documentation of robots.txt correctly, they don't honor robots.txt
for a specific URL that is linked from somewhere indexable; they will index
it anyway.
Maybe just make one big tarball of everything?


> I think the first thing is to get httpd up and running with the desired
> configuration. Then step two will be to memorialize that into puppet. Infra
> can assist with the latter. I saw on Slack that Humbedooh gave you a link
> to explore.
>

Since I haven't got root, I can't get any further with installing httpd on
my own.
I couldn't figure out puppet; the links were 404 for me. I've created a
request in Jira and I hope someone will take a look:
https://issues.apache.org/jira/browse/INFRA-21230

Kind regards,
Daniel

Re: svn.haxx.se is going away

Posted by Greg Stein <gs...@gmail.com>.
On Mon, Dec 21, 2020 at 4:03 AM Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Daniel Sahlberg wrote on Mon, 21 Dec 2020 08:55 +0100:
> > On Fri, 27 Nov 2020 at 19:26, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> >
> > > Sounds good.  Nathan, Daniel Sahlberg — could you work with Infra on
> > > getting the data over to ASF hardware?
> >
> > I have been given access to svn-qavm and uploaded a tarball of the website
> > (including mboxes). I'm a bit reluctant to unpack it since it takes almost
> > 7GB, and there is only 14 GB disk space remaining. Is it ok to unpack or
> > should we ask Infra for more disk space?
>
> I vote to ask for more disk space, especially considering that some
> percentage is reserved for uid=0's use.
>

DSahlberg hit up Infra on #asfinfra on the-asf.slack.com, and asked for
more space. That's been provisioned now.

>...

> > The mboxes will be preserved but I don't plan to make them available for
> > download (since they are not available from lists.a.o or mail-archives.a.o).
>
> Please do make them available for download.  Being able to download the
> raw data is useful for both backup and perusal purposes, and I doubt
> the bandwidth requirements would be a problem.  (Might want
> a robots.txt entry, though?)
>

Bandwidth should not be a problem for the mboxes, but yes: a robots.txt
would be nice. I think search engines spidering the static email pages
might be useful to the community, but the spiders really shouldn't need/use
the mboxes.
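
A rule along those lines can be tried out before deployment with Python's
standard robots.txt parser. This is only a sketch: the /mbox/ prefix, the
example paths and the user agent name are assumptions, not the layout the
site actually uses.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: the mboxes are published under a hypothetical /mbox/ prefix,
# while the static message pages live elsewhere and stay crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mbox/
"""


def can_crawl(robots_txt, url_path, user_agent="ExampleBot"):
    """Return True if robots_txt lets user_agent fetch url_path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url_path)
```

With these rules, can_crawl(ROBOTS_TXT, "/mbox/2020-12.mbox") is refused
while a static page such as "/dev/index.shtml" stays allowed.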

> Regarding the behaviour of the existing archives, see
> <https://mail-archives.apache.org/mod_mbox/subversion-dev/202012.mbox>
> (which used to also be available via
> https://subversion.apache.org/mail/, but nowadays that just redirects
> to a landing page ☹).  I don't know whether lists.a.o has equivalent
> functionality, but then again, lists.a.o has had vendor lock-in baked
> into it from day one, so a lack of a "download raw rfc822 data" feature
> might simply be another form of that.
>

I don't know if our vendor for lists.a.o plans to do an mbox download. I
doubt they retain the data in that format. The Foundation has "all the
data", of course, going back to the mid-90s. An mbox download service might
be interesting, once we decommission the mod_mbox services.

>...

> > 1. Install a web server. nginx? (just kidding)
>
> Apache HTTP Server would probably be a better choice since more dev@svn
> and Infra people are familiar with it, but it's a fair question to ask.
> (Cf. INFRA-7524)
>

Infra has no position on that. Feel free to use nginx 😁 ... but DShahaf is
correct: local support will be higher with apache httpd.

> > 2. Setup httpd.conf
> > 3. Configure a DocumentRoot where I can put the files. Doesn't seem right
> > to store them in /home
>
> Hmm.  These things should all be done via puppet.  I'm not sure what's
> best practice nowadays regarding writing puppet PRs and testing them,
> though.


I think the first thing is to get httpd up and running with the desired
configuration. Then step two will be to memorialize that into puppet. Infra
can assist with the latter. I saw on Slack that Humbedooh gave you a link
to explore.

Cheers,
-g

Re: svn.haxx.se is going away

Posted by Nathan Hartman <ha...@gmail.com>.
On Mon, Dec 21, 2020 at 5:04 AM Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Daniel Sahlberg wrote on Mon, 21 Dec 2020 08:55 +0100:
> > I have been given access to svn-qavm and uploaded a tarball of the website
> > (including mboxes). I'm a bit reluctant to unpack it since it takes almost
> > 7GB, and there is only 14 GB disk space remaining. Is it ok to unpack or
> > should we ask Infra for more disk space?
> >
>
> I vote to ask for more disk space, especially considering that some
> percentage is reserved for uid=0's use.

I sent users@infra an email asking for more disk space.

Nathan

Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Mon, 21 Dec 2020 08:55 +0100:
> On Fri, 27 Nov 2020 at 19:26, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> 
> > Sounds good.  Nathan, Daniel Sahlberg — could you work with Infra on
> > getting the data over to ASF hardware?
> >  
> 
> I have been given access to svn-qavm and uploaded a tarball of the website
> (including mboxes). I'm a bit reluctant to unpack it since it takes almost
> 7GB, and there is only 14 GB disk space remaining. Is it ok to unpack or
> should we ask Infra for more disk space?
> 

I vote to ask for more disk space, especially considering that some
percentage is reserved for uid=0's use.

> > Note that svn-org@ doesn't have an equivalent @s.a.o list, and that, as
> > mentioned upthread, the post-migration (from tigris.org to apache.org)
> > mboxes may be in a different order than the official ones, and shouldn't
> > be "deduplicated".
> >  
> 
> The mboxes will be preserved but I don't plan to make them available for
> download (since they are not available from lists.a.o or mail-archives.a.o).
> 

Please do make them available for download.  Being able to download the
raw data is useful for both backup and perusal purposes, and I doubt
the bandwidth requirements would be a problem.  (Might want
a robots.txt entry, though?)

Regarding the behaviour of the existing archives, see
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202012.mbox>
(which used to also be available via
https://subversion.apache.org/mail/, but nowadays that just redirects
to a landing page ☹).  I don't know whether lists.a.o has equivalent
functionality, but then again, lists.a.o has had vendor lock-in baked
into it from day one, so a lack of a "download raw rfc822 data" feature
might simply be another form of that.

The mod_mbox product is owned by dev@httpd.

> > You indicate a desire to maintain URLs. Do you have some ideas on that?
> >
> > Each individual message .shtml file contains the message-id in
> > a comment.  We can extract the comments and build a redirector around
> > them.  (By the way, this is basically the same exercise that Infra must
> > have solved back when Sebb received that CSV file from the lists.a.o
> > vendor, so there may be an opportunity for code reuse.)  Of course, the
> > full rsync likely has the same info available less scrapily.
> >
> > Or, as mentioned above, the .shtml files could just be preserved
> > statically (plus or minus an appropriate message in the list of years on
> > the /${listname}/ page).  In fact, I'm having trouble coming up with
> > a reason _not_ to serve a static snapshot of the pages, even if we do
> > build a redirector.
> >  
> 
> No redirector as of now, only the static [s]html pages.
> 

<glass type="half-full">Yay!</glass>

> I will need some help from root to:

Not me, I'm afraid; ENOTIME.

> 1. Install a web server. nginx? (just kidding)

Apache HTTP Server would probably be a better choice since more dev@svn
and Infra people are familiar with it, but it's a fair question to ask.
(Cf. INFRA-7524)

> 2. Setup httpd.conf
> 3. Configure a DocumentRoot where I can put the files. Doesn't seem right
> to store them in /home

Hmm.  These things should all be done via puppet.  I'm not sure what's
best practice nowadays regarding writing puppet PRs and testing them,
though.

Cheers,

Daniel

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Fri, 27 Nov 2020 at 19:26, Daniel Shahaf <d....@daniel.shahaf.name> wrote:

> Sounds good.  Nathan, Daniel Sahlberg — could you work with Infra on
> getting the data over to ASF hardware?
>

I have been given access to svn-qavm and uploaded a tarball of the website
(including mboxes). I'm a bit reluctant to unpack it since it takes almost
7GB, and there is only 14 GB disk space remaining. Is it ok to unpack or
should we ask Infra for more disk space?
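
The arithmetic behind that hesitation can be written down as a small check.
This is a sketch only: the 7 and 14 GB figures come from the message above,
whereas the safety margin is an assumption that would have to be chosen for
the machine, and the helper names are hypothetical.

```python
import shutil

GIB = 1024 ** 3


def fits_with_margin(free_bytes, needed_bytes, min_free_after_bytes):
    """True if unpacking needed_bytes still leaves min_free_after_bytes free."""
    return free_bytes - needed_bytes >= min_free_after_bytes


def can_unpack(path, needed_bytes, min_free_after_bytes):
    """Apply the same check to the filesystem that holds path."""
    free_bytes = shutil.disk_usage(path).free
    return fits_with_margin(free_bytes, needed_bytes, min_free_after_bytes)
```

With 14 GiB free and a 7 GiB archive, fits_with_margin(14 * GIB, 7 * GIB,
5 * GIB) holds, but it fails as soon as the required margin exceeds 7 GiB.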

> Note that svn-org@ doesn't have an equivalent @s.a.o list, and that, as
> mentioned upthread, the post-migration (from tigris.org to apache.org)
> mboxes may be in a different order than the official ones, and shouldn't
> be "deduplicated".
>

The mboxes will be preserved but I don't plan to make them available for
download (since they are not available from lists.a.o or mail-archives.a.o).

> > You indicate a desire to maintain URLs. Do you have some ideas on that?
>
> Each individual message .shtml file contains the message-id in
> a comment.  We can extract the comments and build a redirector around
> them.  (By the way, this is basically the same exercise that Infra must
> have solved back when Sebb received that CSV file from the lists.a.o
> vendor, so there may be an opportunity for code reuse.)  Of course, the
> full rsync likely has the same info available less scrapily.
>
> Or, as mentioned above, the .shtml files could just be preserved
> statically (plus or minus an appropriate message in the list of years on
> the /${listname}/ page).  In fact, I'm having trouble coming up with
> a reason _not_ to serve a static snapshot of the pages, even if we do
> build a redirector.
>

No redirector as of now, only the static [s]html pages.

I will need some help from root to:
1. Install a web server. nginx? (just kidding)
2. Set up httpd.conf
3. Configure a DocumentRoot where I can put the files. It doesn't seem right
to store them in /home.

Kind regards
Daniel Sahlberg

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Fri, 27 Nov 2020 at 19:26, Daniel Shahaf <d....@daniel.shahaf.name> wrote:

> Greg Stein wrote on Wed, Nov 25, 2020 at 00:08:32 -0600:
> > You can preserve all the data you want into your homedir, and we can
> > sort from there.
>
> Sounds good.  Nathan, Daniel Sahlberg — could you work with Infra on
> getting the data over to ASF hardware?
>


I can help, but I have no reputation in the project.

> > You indicate a desire to maintain URLs. Do you have some ideas on that?
>

> Each individual message .shtml file contains the message-id in
> a comment.  We can extract the comments and build a redirector around
> them.  (By the way, this is basically the same exercise that Infra must
> have solved back when Sebb received that CSV file from the lists.a.o
> vendor, so there may be an opportunity for code reuse.)  Of course, the
> full rsync likely has the same info available less scrapily.
>
> Or, as mentioned above, the .shtml files could just be preserved
> statically (plus or minus an appropriate message in the list of years on
> the /${listname}/ page).  In fact, I'm having trouble coming up with
> a reason _not_ to serve a static snapshot of the pages, even if we do
> build a redirector.
>

I think it's best to start with a plain copy of the existing site; that way we
can also pay homage to Daniel Stenberg's efforts over the past 20 years.

Kind regards,
Daniel

Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Greg Stein wrote on Wed, Nov 25, 2020 at 00:08:32 -0600:
> Hey Daniel,
> 
> I think the best place for this content is on mbox-vm.a.o. That is where we
> have our permanent list archives in mbox format.
> We can then arrange to ship them off to lists.a.o. If you concur,

I concur in the sense that it'd be great to have the mboxes stored on
and served by whatever Infra uses for all other archives.

However, when I last looked at lists.a.o I was of the opinion that Infra
shouldn't use it.  (Back then its permalinks weren't permanent and
weren't able to be generated or dereferenced while the user was offline
*or while the external vendor was offline*.  I don't know whether those
have been fixed since then.)  Unless that has changed, I wouldn't like
Subversion to rely on that particular archive.  Instead, there's
mod_mbox, or a static snapshot of svn.haxx.se.

(@Greg: You know I wouldn't normally have repeated the above, but
(1) you asked, and (2) the dev@ audience doesn't all know this context.)

> then I'll ask the team to get you access.

Would InfraAdmin let someone else from the PMC take point?  I realize
that this is an ASF-wide box (as opposed to a PMC box) and I'm a known
entity at Infra, but I'm short on tuits.

> You can preserve all the data you want into your homedir, and we can
> sort from there.

Sounds good.  Nathan, Daniel Sahlberg — could you work with Infra on
getting the data over to ASF hardware?

Note that svn-org@ doesn't have an equivalent @s.a.o list, and that, as
mentioned upthread, the post-migration (from tigris.org to apache.org)
mboxes may be in a different order than the official ones, and shouldn't
be "deduplicated".

> You indicate a desire to maintain URLs. Do you have some ideas on that?

Each individual message .shtml file contains the message-id in
a comment.  We can extract the comments and build a redirector around
them.  (By the way, this is basically the same exercise that Infra must
have solved back when Sebb received that CSV file from the lists.a.o
vendor, so there may be an opportunity for code reuse.)  Of course, the
full rsync likely has the same info available less scrapily.
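
A rough sketch of the extraction step, in Python, for anyone who wants to try
it against a local mirror. It assumes the message-id sits in an HTML comment
of the form <!-- id="..." -->; that exact syntax is an assumption and should
be checked against a real page first. The names extract_message_id and
build_mapping are made up for illustration.

```python
import re
from pathlib import Path

# Assumed comment format: <!-- id="some-id@example.org" -->
# Verify this against an actual archived page before relying on it.
MESSAGE_ID_COMMENT = re.compile(r'<!--\s*id="([^"]+)"\s*-->')


def extract_message_id(html):
    """Return the message-id found in the page's comment, or None."""
    match = MESSAGE_ID_COMMENT.search(html)
    return match.group(1) if match else None


def build_mapping(mirror_root):
    """Map each .shtml path (relative to mirror_root) to its message-id."""
    root = Path(mirror_root)
    mapping = {}
    for page in sorted(root.rglob("*.shtml")):
        # Old archive pages are not reliably UTF-8, so decode leniently.
        html = page.read_text(encoding="latin-1")
        message_id = extract_message_id(html)
        if message_id is not None:
            mapping[page.relative_to(root).as_posix()] = message_id
    return mapping
```

The resulting dict could be dumped to a TSV file and used as the lookup table
for whatever redirector gets built.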

Or, as mentioned above, the .shtml files could just be preserved
statically (plus or minus an appropriate message in the list of years on
the /${listname}/ page).  In fact, I'm having trouble coming up with
a reason _not_ to serve a static snapshot of the pages, even if we do
build a redirector.

Cheers,

Daniel

Re: svn.haxx.se is going away

Posted by Greg Stein <gs...@gmail.com>.
Hey Daniel,

I think the best place for this content is on mbox-vm.a.o. That is where we
have our permanent list archives in mbox format. We can then arrange to
ship them off to lists.a.o. If you concur, then I'll ask the team to get
you access. You can preserve all the data you want into your homedir, and
we can sort from there.

You indicate a desire to maintain URLs. Do you have some ideas on that?
Would we be able to have the DNS record for svn.haxx.se CNAME'd to one of
our boxes which simply generates 301 responses? (from your email, it
implies we don't have confirmation of that yet?)
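
To make the "box which simply generates 301 responses" idea concrete, here is
a minimal sketch in Python. The REDIRECTS table, the example path, and the
archive.example.org target are placeholders, not real archive URLs; a
production setup would more likely be an httpd rewrite map than a Python
process.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical table: old svn.haxx.se path -> new permanent URL.
# Both the path and the target below are placeholders.
REDIRECTS = {
    "/dev/archive-2020-11/0001.shtml": "https://archive.example.org/msg/0001",
}


def resolve(path, table=REDIRECTS):
    """Return (status, location) for a requested path, ignoring any query."""
    target = table.get(path.split("?", 1)[0])
    if target is None:
        return 404, None
    return 301, target


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.end_headers()

    # HEAD gets the same headers; no body is sent either way.
    do_HEAD = do_GET


def serve(port=8080):
    """Run the redirector; the port is arbitrary for this sketch."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling serve() starts the listener; unknown paths get a 404 rather than a
guess, so missing table entries show up in the logs.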

Cheers,
Greg Stein
Infrastructure Administrator, ASF


On Tue, Nov 24, 2020 at 7:04 PM Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Nathan Hartman wrote on Tue, 24 Nov 2020 21:27 +00:00:
> > On Tue, Nov 24, 2020 at 2:56 AM Daniel Sahlberg
> > <da...@gmail.com> wrote:
> > > On Thu, 12 Nov 2020 at 17:46, Daniel Sahlberg
> > > <daniel.l.sahlberg@gmail.com> wrote:
> > >> Could ASF provide this server space (basically a VirtualHost)? The
> > >> archive is about 6.5 GB so it is not a huge amount.
> > >
> > > Any thoughts on this?
> >
> > I am looking into this; waiting for a reply...
>
> In the circumstances — it's Nov 25 and the site says it'll be taken down
> "in November 2020", not specifying a date — I'd say, better ask
> forgiveness than permission.  Let's go ahead and grab all the data we
> need to stand up the site (we have the mboxes, but not the mapping of
> *.shtml files to message-id's, nor any of the HTML/CSS/images), and if
> possible, also set it up (on svn-qavm.a.o or wherever) to ensure we've
> got everything and to prepare for a DNS repointing, if Daniel agrees.
> We can figure out the "paperwork", Puppet PRs, etc., later.
>
> I'd say the highest priority is to save the mapping of .shtml URLs to
> message-id's (which are available as comments in the source HTML),
> whether via a recursive wget(1) invocation, or by asking Daniel to run
> an appropriate grep, or however else.  Without that info, we won't be
> able to preserve old URLs.
>
> Maybe there's also a button we can press to sic the archive.org spider
> on svn.haxx.se.
>
> (We can't derive the message<->.shtml mapping from the mboxes we have.
> I only grabbed mboxes through the transition to ASF; for anything after
> that point, the order of .shtml files would be the order in which list
> mails reached haxx.se's MX, and we have no backups of that info.)
>
> Cheers,
>
> Daniel
>
> P.S.  Yes, it's a bit https://m.xkcd.com/2337/ of me to refer to both
>       Daniel and Daniel as "Daniel". :)
>

Re: svn.haxx.se is going away

Posted by Nathan Hartman <ha...@gmail.com>.
On Tue, Nov 24, 2020 at 8:04 PM Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Nathan Hartman wrote on Tue, 24 Nov 2020 21:27 +00:00:
> > On Tue, Nov 24, 2020 at 2:56 AM Daniel Sahlberg
> > <da...@gmail.com> wrote:
> > > On Thu, 12 Nov 2020 at 17:46, Daniel Sahlberg
> > > <daniel.l.sahlberg@gmail.com> wrote:
> > >> Could ASF provide this server space (basically a VirtualHost)? The
> > >> archive is about 6.5 GB so it is not a huge amount.
> > >
> > > Any thoughts on this?
> >
> > I am looking into this; waiting for a reply...
>
> In the circumstances — it's Nov 25 and the site says it'll be taken down
> "in November 2020", not specifying a date — I'd say, better ask
> forgiveness than permission.  Let's go ahead and grab all the data we
> need to stand up the site (we have the mboxes, but not the mapping of
> *.shtml files to message-id's, nor any of the HTML/CSS/images), and if
> possible, also set it up (on svn-qavm.a.o or wherever) to ensure we've
> got everything and to prepare for a DNS repointing, if Daniel agrees.
> We can figure out the "paperwork", Puppet PRs, etc., later.


Just FYI it looks like yes, we will get the server space, but I don't know
details yet. The 1st order of business is to save the data...

@Daniel Sahlberg since you've previously reached out to the operator of
svn.haxx.se about saving the site, could you perhaps ask for a way to
download the data efficiently?

Nathan

Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Nathan Hartman wrote on Tue, 24 Nov 2020 21:27 +00:00:
> On Tue, Nov 24, 2020 at 2:56 AM Daniel Sahlberg 
> <da...@gmail.com> wrote:
> > On Thu, 12 Nov 2020 at 17:46, Daniel Sahlberg <da...@gmail.com> wrote:
> >> Could ASF provide this server space (basically a VirtualHost)? The archive is about 6.5 GB so it is not a huge amount.
> > 
> > Any thoughts on this?
> 
> I am looking into this; waiting for a reply...

In the circumstances — it's Nov 25 and the site says it'll be taken down
"in November 2020", not specifying a date — I'd say, better ask
forgiveness than permission.  Let's go ahead and grab all the data we
need to stand up the site (we have the mboxes, but not the mapping of
*.shtml files to message-id's, nor any of the HTML/CSS/images), and if
possible, also set it up (on svn-qavm.a.o or wherever) to ensure we've
got everything and to prepare for a DNS repointing, if Daniel agrees.
We can figure out the "paperwork", Puppet PRs, etc., later.

I'd say the highest priority is to save the mapping of .shtml URLs to
message-id's (which are available as comments in the source HTML),
whether via a recursive wget(1) invocation, or by asking Daniel to run
an appropriate grep, or however else.  Without that info, we won't be
able to preserve old URLs.

Maybe there's also a button we can press to sic the archive.org spider
on svn.haxx.se.

(We can't derive the message<->.shtml mapping from the mboxes we have.
I only grabbed mboxes through the transition to ASF; for anything after
that point, the order of .shtml files would be the order in which list
mails reached haxx.se's MX, and we have no backups of that info.)
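
To illustrate the ordering problem: an mbox only records the order in which
mails reached that particular archive. The Python sketch below reads the
Message-ID headers in stored order so they can be compared with the order
recovered from the .shtml pages, once that has been saved. The function names
are illustrative, and the comparison assumes both lists cover the same period.

```python
import mailbox


def message_ids_in_order(mbox_path):
    """Return the Message-ID of every mail in an mbox, in stored order."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [msg.get("Message-ID", "").strip() for msg in box]
    finally:
        box.close()


def order_mismatches(mbox_ids, archive_ids):
    """Return the positions at which the two sequences disagree."""
    return [
        i for i, (a, b) in enumerate(zip(mbox_ids, archive_ids)) if a != b
    ]
```

Any non-empty result from order_mismatches means numbered .shtml filenames
cannot be reconstructed from that mbox alone.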

Cheers,

Daniel

P.S.  Yes, it's a bit https://m.xkcd.com/2337/ of me to refer to both
      Daniel and Daniel as "Daniel". :)

Re: svn.haxx.se is going away

Posted by Nathan Hartman <ha...@gmail.com>.
On Tue, Nov 24, 2020 at 2:56 AM Daniel Sahlberg <da...@gmail.com>
wrote:

> On Thu, 12 Nov 2020 at 17:46, Daniel Sahlberg
> <daniel.l.sahlberg@gmail.com> wrote:
>
>> Could ASF provide this server space (basically a VirtualHost)? The
>> archive is about 6.5 GB so it is not a huge amount.
>>
>
> Any thoughts on this?
>

I am looking into this; waiting for a reply...

Thanks for the nudge.

Nathan

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Thu, 12 Nov 2020 at 17:46, Daniel Sahlberg
<daniel.l.sahlberg@gmail.com> wrote:

> Could ASF provide this server space (basically a VirtualHost)? The archive
> is about 6.5 GB so it is not a huge amount.
>

Any thoughts on this?

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Thu, 5 Nov 2020 at 15:31, Julian Foad <ju...@foad.me.uk> wrote:

> Main point: Thanks to everyone helping this preservation effort.
>
> > * updating the 63+87 links in the site and source to point to links
> > hosted on ASF hardware
> >
> Observation: s/hardware/domain/. While the ASF has long promoted "on our
> own hardware", the more critical and often under-valued key to keeping
> control of one's Internet assets is "on our own domain name". That's
> assumed in this context, but something to keep in mind elsewhere.
>

I agree with Julian's point about "on our own domain name", but the situation
is what it is. If we can reach an agreement to keep svn.haxx.se pointing
to a server where, at least, the old mailing list archive is available, then
we would be better off.

Could ASF provide this server space (basically a VirtualHost)? The archive
is about 6.5 GB so it is not a huge amount.

Kind regards,
Daniel

Re: svn.haxx.se is going away

Posted by Julian Foad <ju...@foad.me.uk>.
Main point: Thanks to everyone helping this preservation effort.

> * updating the 63+87 links in the site and source to point to links hosted on ASF hardware
> 
Observation: s/hardware/domain/. While the ASF has long promoted "on our own hardware", the more critical and often under-valued key to keeping control of one's Internet assets is "on our own domain name". That's assumed in this context, but something to keep in mind elsewhere.

- Julian

Re: svn.haxx.se is going away

Posted by Nathan Hartman <ha...@gmail.com>.
On Thu, Nov 5, 2020 at 5:16 AM Daniel Sahlberg <da...@gmail.com>
wrote:

>
> Would it be considered a good thing if we manage to keep svn.haxx.se
> around?
>

Yes, I would consider that a good thing.

> Even if Infra would get the old lists imported (I don't know what's holding
> them back), there are a bunch of references to the archives in the source
> (63 if I'm counting correctly), and in the website (87).
>

There are many more links in emails, in log messages, etc.

IIRC Infra said there's some software-related reason that holds them back
from importing the old material.

> I have reached out to Daniel Stenberg and he seems willing to discuss to
> point the domain name to another server. I could probably volunteer to keep
> the site alive, provided there is an agreement within @Dev this is a good
> thing. Or is it better to just do the job and update the sources and
> website?
>

Thank you for reaching out.

It would be ideal if 3 things happen:

* keep svn.haxx.se alive to prevent breaking the myriad links that exist
out there

* getting the early years' SVN dev & users archives (2000-2009) onto ASF
hardware one way or another; if it can't/won't be backfilled to
lists.apache.org for whatever reasons, maybe it can be put on Subversion's
website

* updating the 63+87 links in the site and source to point to links hosted
on ASF hardware
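
For the link-updating item, a small Python sketch of how the svn.haxx.se
references in a checkout could be located and counted. The regular expression
and the function name find_haxx_links are illustrative only; the 63 and 87
figures quoted above were not produced with this script.

```python
import re
from pathlib import Path

# Matches http(s) links into svn.haxx.se, stopping at whitespace or brackets.
HAXX_LINK = re.compile(r"https?://svn\.haxx\.se/[^\s<>()\[\]]*")


def find_haxx_links(tree_root):
    """Return {relative file path: [links]} for files mentioning svn.haxx.se."""
    root = Path(tree_root)
    found = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip version-control metadata and anything that is not a file.
        if not path.is_file() or ".svn" in rel.parts or ".git" in rel.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Trim quotes and sentence punctuation that trail the URL.
        links = [m.rstrip("\"'.,;") for m in HAXX_LINK.findall(text)]
        if links:
            found[rel.as_posix()] = links
    return found
```

Summing the list lengths, sum(len(v) for v in find_haxx_links(".").values()),
gives a count to compare against the numbers above.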

svn.haxx.se also has archives for TSVN and Subclipse dev and users, which
is another reason to keep that site alive if possible.

Nathan

Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Thu, 05 Nov 2020 11:16 +0100:
> On Wed, 4 Nov 2020 at 22:32, Nathan Hartman
> <hartman.nathan@gmail.com> wrote:
> 
> > On Wed, Nov 4, 2020 at 3:32 PM Mark Phippard <ma...@gmail.com> wrote:
> > >
> > > Just a general fyi ... I went to https://svn.haxx.se/ today to search
> > > the lists and noticed there is a banner on the site saying it is going
> > > offline forever soon.
> > >
> > > I am not sure what the ramifications will be as I know there are a lot
> > > of historical links in the docs and site but I guess it is what it is.
> >
> > Daniel (danielsh) has been trying to get Infra to import the material
> > from pre-2009 (pre-migration to ASF) into lists.apache.org to avoid
> > losing the archives from the earliest period of development, which
> > arguably contain some of the most important development information.
> >
> > See the discussion here:
> >
> > https://lists.apache.org/thread.html/r97c9c5208af706b067fd8e67a7cbe79b37255958bb087bf699b722f8%40%3Cdev.subversion.apache.org%3E
> >

And https://issues.apache.org/jira/browse/INFRA-20213

> > Possibly it's still mirrored at home.apache.org but I can't check at the
> > moment.
> >
> > Nathan
> >  
> 
> Would it be considered a good thing if we manage to keep svn.haxx.se
> around? Even if Infra would get the old lists imported (I don't know what's
> holding them back), there are a bunch of references to the archives in the
> source (63 if I'm counting correctly), and in the website (87).
> 

Those in the website should be covered by
site/publish/.message-ids.tsv.  (See site/tools/ for the generating
scripts.)

The logic for converting the message-ids into URLs is embedded in [1]
(which I have tried to make discoverable, [2], but that seems to have
regressed, and I'm ENOTIME to chase it).

[1] https://svn.apache.org/repos/infra/infrastructure/trunk/projects/asf-generate-mail-archives-link
[2] https://issues.apache.org/jira/browse/INFRA-19422

> I have reached out to Daniel Stenberg and he seems willing to discuss to
> point the domain name to another server. I could probably volunteer to keep
> the site alive, provided there is an agreement within @Dev this is a good
> thing. Or is it better to just do the job and update the sources and
> website?

We should keep old links working, if possible.  Ideally, not only links
we happen to have lying around, but also other links (e.g., in people's
non-public branches of https://github.com/apache/subversion).

There's more than one way to preserve links (redirecting old URLs to
new URLs for the same messages; keeping the site online but not
updating; keeping the site online and updating, on ASF hardware, e.g.,
svn-qavm.a.o; etc.).  Any and all assistance would be most welcome!

> (Daniel S... seems to be a popular name!)

It is, yes.  And then there are people like danderson, who aren't named
"Daniel" but still get in the way of tab-completing Daniels ☺

Cheers,

Daniel

Re: svn.haxx.se is going away

Posted by Daniel Sahlberg <da...@gmail.com>.
On Wed, 4 Nov 2020 at 22:32, Nathan Hartman
<hartman.nathan@gmail.com> wrote:

> On Wed, Nov 4, 2020 at 3:32 PM Mark Phippard <ma...@gmail.com> wrote:
> >
> > Just a general fyi ... I went to https://svn.haxx.se/ today to search
> > the lists and noticed there is a banner on the site saying it is going
> > offline forever soon.
> >
> > I am not sure what the ramifications will be as I know there are a lot
> > of historical links in the docs and site but I guess it is what it is.
>
> Daniel (danielsh) has been trying to get Infra to import the material
> from pre-2009 (pre-migration to ASF) into lists.apache.org to avoid
> losing the archives from the earliest period of development, which
> arguably contain some of the most important development information.
>
> See the discussion here:
>
> https://lists.apache.org/thread.html/r97c9c5208af706b067fd8e67a7cbe79b37255958bb087bf699b722f8%40%3Cdev.subversion.apache.org%3E
>
> Possibly it's still mirrored at home.apache.org but I can't check at the
> moment.
>
> Nathan
>

Would it be considered a good thing if we manage to keep svn.haxx.se
around? Even if Infra would get the old lists imported (I don't know what's
holding them back), there are a bunch of references to the archives in the
source (63 if I'm counting correctly), and in the website (87).

I have reached out to Daniel Stenberg and he seems willing to discuss to
point the domain name to another server. I could probably volunteer to keep
the site alive, provided there is an agreement within @Dev this is a good
thing. Or is it better to just do the job and update the sources and
website?

Kind regards
Daniel Sahlberg

(Daniel S... seems to be a popular name!)

Re: svn.haxx.se is going away

Posted by Mark Phippard <ma...@gmail.com>.
On Wed, Nov 4, 2020 at 4:32 PM Nathan Hartman <ha...@gmail.com>
wrote:

> On Wed, Nov 4, 2020 at 3:32 PM Mark Phippard <ma...@gmail.com> wrote:
> >
> > Just a general fyi ... I went to https://svn.haxx.se/ today to search
> > the lists and noticed there is a banner on the site saying it is going
> > offline forever soon.
> >
> > I am not sure what the ramifications will be as I know there are a lot
> > of historical links in the docs and site but I guess it is what it is.
>
> Daniel (danielsh) has been trying to get Infra to import the material
> from pre-2009 (pre-migration to ASF) into lists.apache.org to avoid
> losing the archives from the earliest period of development, which
> arguably contain some of the most important development information.
>
> See the discussion here:
>
> https://lists.apache.org/thread.html/r97c9c5208af706b067fd8e67a7cbe79b37255958bb087bf699b722f8%40%3Cdev.subversion.apache.org%3E
>
> Possibly it's still mirrored at home.apache.org but I can't check at the
> moment.
>

Thanks Nathan. I am glad to hear we were aware of this. I had not seen any
discussion so just wanted to make sure interested parties maybe had some
time to act before it is too late. I notice we have a search function as
part of our website that uses their search. Hopefully that can be adapted
to the Apache list archives though not sure if it will work as well. I have
always used svn.haxx.se just for the search.

-- 
Thanks

Mark Phippard

Re: svn.haxx.se is going away

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Nathan Hartman wrote on Wed, 04 Nov 2020 16:32 -0500:
> On Wed, Nov 4, 2020 at 3:32 PM Mark Phippard <ma...@gmail.com> wrote:
> >
> > Just a general fyi ... I went to https://svn.haxx.se/ today to search the lists and noticed there is a banner on the site saying it is going offline forever soon.
> >
> > I am not sure what the ramifications will be as I know there are a lot of historical links in the docs and site but I guess it is what it is.  
> 
> Daniel (danielsh) has been trying to get Infra to import the material
> from pre-2009 (pre-migration to ASF) into lists.apache.org to avoid
> losing the archives from the earliest period of development, which
> arguably contain some of the most important development information.
> 
> See the discussion here:
> https://lists.apache.org/thread.html/r97c9c5208af706b067fd8e67a7cbe79b37255958bb087bf699b722f8%40%3Cdev.subversion.apache.org%3E
> 
> Possibly it's still mirrored at home.apache.org but I can't check at the moment.

It is —

% ssh home.apache.org du -hs /home/danielsh/svn-haxx-se-mirror
245M    /home/danielsh/svn-haxx-se-mirror
% ssh svn-qavm.apache.org du -hs /x1/svn-haxx-se-mirror 
245M    /x1/svn-haxx-se-mirror

— but I don't know that either of these is backed up, so please someone
rsync either of those [they're identical] to their own hardware.
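
For anyone who does take a copy: a quick way to confirm that two copies of
the mirror really are identical is to compare per-file checksums. A Python
sketch, assuming both trees are available as local directories (for instance
after rsyncing each one down); the function names are illustrative.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's relative path to the SHA-256 of its contents."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large mboxes do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(base).as_posix()] = h.hexdigest()
    return digests


def trees_identical(a, b):
    """True if both trees hold the same files with the same contents."""
    return tree_digest(a) == tree_digest(b)
```

This compares file names and contents only; permissions and timestamps are
ignored.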

Cheers,

Daniel

Re: svn.haxx.se is going away

Posted by Nathan Hartman <ha...@gmail.com>.
On Wed, Nov 4, 2020 at 3:32 PM Mark Phippard <ma...@gmail.com> wrote:
>
> Just a general fyi ... I went to https://svn.haxx.se/ today to search the lists and noticed there is a banner on the site saying it is going offline forever soon.
>
> I am not sure what the ramifications will be as I know there are a lot of historical links in the docs and site but I guess it is what it is.

Daniel (danielsh) has been trying to get Infra to import the material
from pre-2009 (pre-migration to ASF) into lists.apache.org to avoid
losing the archives from the earliest period of development, which
arguably contain some of the most important development information.

See the discussion here:
https://lists.apache.org/thread.html/r97c9c5208af706b067fd8e67a7cbe79b37255958bb087bf699b722f8%40%3Cdev.subversion.apache.org%3E

Possibly it's still mirrored at home.apache.org but I can't check at the moment.

Nathan
