Posted to dev@openoffice.apache.org by Terry Ellison <te...@apache.org> on 2011/08/25 17:58:40 UTC

[migration] Making the forums and wiki cut-over

Sorry in advance.  I've tried to put this type of content on the cwiki 
and it has largely been ignored by the DL, hence this email.  It is a 
long one, so I have tried to structure it and keep to simple plain text 
(as the DL forwarder seems to screw up text/html markup).

Since this is so long, any reply threads could kill us.  So can I ask 
respondents to open up a new [migration] thread to discuss specific 
points in depth rather than replying directly to this?

*BACKGROUND*

As you all know, I've been doing work on the forums and wiki, though the 
wiki has been my main focus recently.  I am acting in two roles here (i) 
the lead SysAdmin for the ooo-wiki and ooo-forum VMs; (ii) the lead (and 
currently only) application maintainer on both systems.  I also have 
ssh access to the current prod systems running in Oracle infrastructure 
and perform equivalent roles there.  So in practice, I am doing all of this 
related work, including liaising with the project, the infrastructure 
team and with Andrew Rist who represents Oracle here.

I understandably have to work within the practices of the Apache 
infrastructure team to retain my permit to act as SysAdmin on these two 
VMs.  I am also trying to meet the expectations of the project, the 
infrastructure teams and the needs of our user population across all of 
this work.

We seem to have a Catch-22 here, and this email is about how we break 
this and move these aspects of the project forward.  My interpretation 
of this Catch-22 is that whilst our current interactions on the DL are a 
good basis for individuals articulating views on a particular thread 
(and some seem to generate hundreds of viewpoints) we have no 
functioning mechanism to move to, and adopt some form of, a consensus 
policy or decision.  The exact project requirements for this wiki, these 
forums and the xxxx@openoffice.org mail forwarders are cases in point.  
However, the infrastructure team believe that we, the project, have an 
urgency about making this cut over from Oracle to Apache infrastructure, 
and are pressing me to make progress.

I can't execute any plan without a baseline requirement and set of 
assumptions, so what this note attempts is to lay down such a set, and 
the decisions that need to be made to go forward.  So PLEASE, I don't 
want any flames about my use of DECISION below.  What I simply mean is 
that if the PPMC as a body accepts these, then I will try my best to move 
this work forward.  Of course you are free to challenge / change any of 
this if that is a PPMC voted decision, but in this case I need to move 
into a different mode: to suspend work and stop the clock until we have 
a PPMC-endorsed baseline to replan on.  I am NOT going to press on 
without broad endorsement and then be criticised in retrospect for doing 
so.

*INFRASTRUCTURE DRIVERS*

The infrastructure team has a policy of bringing in new services at 
current S/W versions whenever possible -- simply because it makes it 
easier to support them, and doing this before the service is on-line 
involves less work and risk than when it is in production.  I understand 
and agree with this goal even though it can front-load work.

*    The infrastructure stack is based on a standard Ubuntu server LAMP 
stack as at current LTS (Ubuntu 10.04-3 LTS), which includes PHP 5.3.2.

*    The forums are stable, but at an N-1 release level. (phpBB 3.0.8 
vs. 3.0.9).

* *DECISION*: Upgrade the ooo-forums phpBB app + customisations to 3.0.9 
before go-live. (Based on my last 5 upgrades, this is 1-2 days' work, the 
main part being the regression of a 1K line customisation patch when we 
rebaseline the package from 3.0.8 -> 3.0.9.)

*    The prod wiki is at v1.15.1, which is an N-3 major release level 
(that's 30 months old: two major and 10 minor revisions behind the 
current supported release).  This also runs on PHP 5.2.0.

*    We need a reverse-proxy HTTP cache for performance reasons on the 
wiki.  One of the four market leaders in this niche is another Apache 
project: Apache Traffic Server (ATS).  It makes sense to stay "in-house" 
here for both support and referenceability reasons.

* *DECISION*: Adopt ATS v3.0.1 as the HTTP cache for the wiki.  (BTW, 
this work has been done and the product is excellent).

PHP 5.3 introduced extra checking to remove an area of tolerance that 
PHP 5.x (x<3) allowed.  This was to do with when and how parameters can 
be passed by reference under certain circumstances.  So moving a code 
base from 5.2 to 5.3 involved a lot of work identifying and eliminating 
these mis-codings.  This was done by the MediaWiki team in MW v1.16.  I had 
planned to move to MW v1.15.5 (the last stable 1.15.x) as our baseline 
and I've done this work integrating it with Apache Traffic Server (ATS) 
and our LAMP stack.  This is stable and performant enough to show that 
we are good.  However, I have only identified and bug-fixed the main 
path 5.2->5.3 coding issues.  During my testing I have subsequently 
discovered others and there are undoubtedly more to find.  I've also 
discussed this with the MW devs on the MW IRC channel.  Given this, the 
consensus in the @infra team (me included) is that we should bite the 
bullet now and move to current MW 1.17.0 even given the extra work.  
There are some performance risks associated with MW 1.17.0 which we need 
to mitigate.  However, given that we've already got a complete 
LAMP+ATS+MW in an ESX hosted VM performing like a dingbat, we really 
only face the 1.15.5 -> 1.17.0 issues in this step.
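
As an illustration of what hunting for these 5.2->5.3 mis-codings can 
look like, here is a minimal sketch in Python.  It assumes that one of 
the patterns in question is call-time pass-by-reference (a call such as 
foo(&$x)), which PHP 5.3 reports as deprecated.  The function name and 
the regular expression are my own illustration, not part of MediaWiki or 
of any tooling mentioned above, and a line-based heuristic like this 
will miss multi-line calls and may report false positives.

```python
import re

# Heuristic: an identifier, "(", then "&$name" as the first argument or
# directly after a comma in the same, non-nested argument list.
CALL_WITH_REF = re.compile(r"(\w+)\s*\((?:[^()]*,)?\s*&\s*\$\w+")
# Declarations such as "function bump(&$n)" are legal and must be skipped.
DECLARATION = re.compile(r"\bfunction\b")
# array(&$x) builds an array of references; it is not a function call.
NOT_CALLS = {"array"}


def find_call_time_refs(php_source):
    """Return (line_number, stripped_line) for suspected call-time refs."""
    hits = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        if DECLARATION.search(line):
            continue
        for match in CALL_WITH_REF.finditer(line):
            if match.group(1) not in NOT_CALLS:
                hits.append((number, line.strip()))
                break
    return hits


if __name__ == "__main__":
    # Hypothetical PHP fragment, used only to exercise the scanner.
    sample = "\n".join([
        "<?php",
        "function bump(&$n) { $n++; }",
        "bump($a);",
        "bump(&$a);",
        "$b = array(&$a);",
    ])
    for number, text in find_call_time_refs(sample):
        print(number, text)
```

A scan like this only narrows down where to look; each hit still needs 
a human to decide whether the callee should declare the parameter by 
reference instead.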

* *DECISION*: Upgrade the ooo-wiki MediaWiki (MW) + all extensions to MW 
v1.17.0.

* *DECISION*: I have agreed with infrastructure that we will keep 1 core 
on "standby" so we can up the VM to a 2-core VM if we are seeing 
unacceptable performance problems with one core.

*BRANDING AND OTHER CONTENT/ACCESS CHANGES*

I've asked for feedback and "doer" support on the content aspect of the 
wiki and the forum.  There have been hundreds of associated emails and 
unstructured discussion but no hard decisions taken.  Drew and Dave F 
have offered to get involved here, but we need to set up accounts etc., 
and move into execution.

* *DECISION*: We will cut over the wiki and the forums with the content 
as-is and implement branding and access control changes within the a.o 
infrastructure when volunteers come on-stream to resource this.  This is 
the standard "transfer then clean-up" approach adopted when a migration 
is time critical.

*PRIORITISATION*

On the one hand the forums involve a lot less work and technical risk.  
On the other they are arguably also used more than the wiki at the 
moment, and the post rates are a LOT higher.  Because I am a single 
resource constraint, I can't do both at the same time.  This one is 
toss-a-coin, but my instinct is to get the one that we can do quickly 
done.  But if there is a strong consensus to swap the order then I can do it.

* *DECISION*: The priority is to work the forums over the wiki.  We cut 
the forums over first.

*CUT-OVER*

There are two facets to cut-over: content move and DNS-based IP 
reassignment.  Clearly we need to freeze update access to the services 
prior to start of content move and continue update-freeze on the legacy 
service.  Bringing the content across involves a backup, copy and restore 
which can be rehearsed and scripted, but in the case of the wiki, this 
will take a few hours even if fully automated.
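
To make "rehearsed and scripted" a little more concrete, here is a 
minimal sketch in Python of a step runner for the freeze, backup, copy, 
restore sequence.  The step names and the run_cutover function are 
hypothetical; the real steps would wrap whatever backup and transfer 
commands are actually used, which this note does not specify.  The only 
idea shown is that steps run in a fixed order and the run stops at the 
first failure.

```python
from typing import Callable, List, Tuple

# A step is a label plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_cutover(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that fails."""
    log = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            log.append(f"aborted after: {name}")
            break
    return log


if __name__ == "__main__":
    # Placeholder actions; a rehearsal would plug real commands in here.
    rehearsal = [
        ("freeze updates on legacy service", lambda: True),
        ("backup content", lambda: True),
        ("copy backup to new host", lambda: True),
        ("restore on new host", lambda: True),
    ]
    for line in run_cutover(rehearsal):
        print(line)
```

Running the same list of steps in each rehearsal and on the day itself 
is what keeps the rehearsal meaningful.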

The main issues are:

*    Migration coordination is more of a Programme Management / 
Coordination challenge than a purely technical one.  In the old 
(paid/corporate) days, I would have a Programme Manager (PM)-type 
working alongside me covering this aspect.

* *REQUEST*  Would anyone who has previous experience of doing this like 
to volunteer to take this role, so I can focus on the technical stuff?

*    We have to transfer DNS control for oo.o to Apache even if the A 
and MX records point to the @Oracle IP addresses.

* *ACTION*:  Our PM needs to identify who the authoritative controller 
for the DNS entry in A.o is and how we interface with him or her during 
this change process.

*    The DNS IP reassignment can take 24 hrs or more to ripple around 
the worldwide hierarchy of DNS servers.  During this period who goes to 
which service is undefined.

*    There are many ways "to skin the cat" of the migration process.  All 
will involve some service loss, but the complexity of the rehearsal and 
planning can explode as we reduce this outage to zero.  Complex plans 
can also go wrong so my instinct is to keep it simple: halt the service 
at a pre-notified time, transfer and start the new service at a 
pre-notified time.

* *DECISION*: Halt the forum service for a notified (24hr) window during 
cutover.  The migration uses fixed IPs, so DNS IP reassignment is 
co-incident with service stop.

* *GOAL*: Cut over forums within 7 days from today.  Date TBD by PM.  I 
can do the content move.

* *DECISION*: Halt the wiki service for a notified (24hr) window during 
cutover.  The migration uses fixed IPs, so DNS IP reassignment is 
co-incident with service stop.

* *GOAL*: Cut over forums within 14 days from today.  Date TBD by PM.  I 
can do the content move.

*    We have some further caching tweaks on the interaction of the 
MediaWiki application with the ATS HTTP reverse proxy cache, but these 
are probably more nice-to-have than essential.  More to the point these 
need to be done on a system with production load patterns.

* *DECISION*: We will defer such tuning until post go-live.
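
On the DNS point above (who goes to which service is undefined while 
the reassignment ripples out), here is a minimal Python sketch for 
checking which service a given resolver currently points at.  The IP 
addresses and the hostname in the usage line are placeholders from the 
documentation ranges, not the real Oracle or Apache addresses, and 
which_service is a name made up for this example.  It only reports the 
view of whatever resolver the local machine uses, so it would need to 
be run from several locations to say anything about propagation.

```python
import socket

OLD_IP = "192.0.2.10"      # placeholder for the legacy service address
NEW_IP = "198.51.100.20"   # placeholder for the new service address


def which_service(hostname, old_ip=OLD_IP, new_ip=NEW_IP,
                  resolve=socket.gethostbyname):
    """Classify where hostname resolves: new, legacy, unknown, unresolved."""
    try:
        address = resolve(hostname)
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        return "unresolved"
    if address == new_ip:
        return "new"
    if address == old_ip:
        return "legacy"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical hostname; substitute the real service name.
    print(which_service("forum.example.org"))
```

The resolve parameter exists so the check can be exercised without a 
network, for example by passing a function that returns a fixed address.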


*OTHER ISSUES*

*    I am pretty much maxed out on "high-priority" tasks at the moment, 
so I can't accept any more tasks until I've made material progress on my 
current committed list.

*    A number of us have serious concerns about the continuity issues 
around the XXX@openoffice.org forwarding service.  I feel that there is 
a project consensus that this service needs to be relocated to Apache.org 
infrastructure until we can sentence its content.  The fact that we 
don't have an owner doing as I am with the wiki and forum services is a 
GRAVE CONCERN.  I was hoping to step up to do this, but with the 
decisions to upgrade phpBB and MW, the previous point now applies.

If you have got to the bottom of this, then thanks for your patience and 
time.

Regards
Terry

PS.  Please remember to break off specific discussion points onto 
separate threads.



RE: [migration] Making the forums and wiki cut-over

Posted by "Dennis E. Hamilton" <de...@acm.org>.
Just one tiny clean-up.

Based on what you say, I believe the goal is to cut over forums in 7 days and the wiki in 14 days.

(I think there is a small typo in the second CUT-OVER GOAL.)

 - Dennis

Thanks for all of this Terry.  You've managed to juggle a complex number of considerations and you have my endless admiration for it.

-----Original Message-----
From: Terry Ellison [mailto:terrye@apache.org] 
Sent: Thursday, August 25, 2011 08:59
To: ooo-dev@incubator.apache.org
Subject: [migration] Making the forums and wiki cut-over

Sorry in advance.  I've tried to put this type of content on the cwiki 
and it has largely been ignored by the DL, hence this email.  It is a 
long one, so I have tried to structure it and keep to simple plain text 
(as the DL forwarder seems to screw up text/html markup).

Since this is so long, any reply threads could kill us.  So can I ask 
respondents to open up a new [migration] thread to discuss specific 
points in depth rather than replying directly to this.

[ ... ]

*CUT-OVER*

[ ... ]

* *GOAL*: Cut over forums within 7 days from today.  Date TBD by PM.  I 
can do the content move.

* *DECISION*: Halt the wiki service for a notified (24hr) window during 
cutover.  The migration uses fixed IPs, so DNS IP reassignment is 
co-incident with service stop.

* *GOAL*: Cut over forums within 14 days from today.  Date TBD by PM.  I 
can do the content move.

[ ... ]