Posted to dev@druid.apache.org by Gian Merlino <gi...@apache.org> on 2019/06/04 08:48:52 UTC

Re: Proposed website migration plan

An update: we do have a redirect server set up on druid.io now: note that
http://druid.io/community/ and http://druid.io/use-cases both redirect to
https://druid.apache.org. I just set up the latter redirect (on /use-cases)
as part of 'test this first on a single page'. All other druid.io URLs are
still being hosted using the content from GitHub pages at
https://github.com/druid-io/druid-io.github.io.
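
For anyone who wants to picture the 'single page first' behavior, here is a minimal Python sketch. It is not the actual redirect server configuration. The names TARGET_ORIGIN, REDIRECTED_PATHS, redirect_target, RedirectHandler and main are made up for illustration, the port is arbitrary, and it assumes each redirected page goes to the same path on druid.apache.org.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumptions for illustration only: the target origin and the set of
# paths that have been switched over to 301 redirects so far.
TARGET_ORIGIN = "https://druid.apache.org"
REDIRECTED_PATHS = {"/community", "/use-cases"}


def redirect_target(path):
    """Return the Location for a 301, or None if the page is still served locally."""
    # Drop any query string, then match with or without a trailing slash.
    bare = path.split("?", 1)[0]
    if bare.rstrip("/") in REDIRECTED_PATHS:
        return TARGET_ORIGIN + bare
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        location = redirect_target(self.path)
        if location is not None:
            # Permanent redirect for pages that have already been migrated.
            self.send_response(301)
            self.send_header("Location", location)
            self.end_headers()
        else:
            # Placeholder: a real server would serve the legacy content here.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"legacy druid.io content would be served here\n")


def main():
    # Hypothetical local address, for trying the sketch out by hand.
    HTTPServer(("127.0.0.1", 8080), RedirectHandler).serve_forever()
```

Calling main() and requesting /use-cases from the local port should return a 301 with a Location header on druid.apache.org, while any other path gets the placeholder response. Adding more paths to REDIRECTED_PATHS migrates more pages.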

Search engine watch: currently, http://druid.io is the #1 link for [druid
use cases] on Google, Bing, and DuckDuckGo (and has a cool looking infobox
on Google & Bing). For [what is druid used for], it's #2 on Google, and not
ranked on the first page on Bing & DDG. Will monitor this over the next few
days.

On Mon, May 6, 2019 at 5:43 PM Gian Merlino <gi...@apache.org> wrote:

> Hi all,
>
> It sounds like we will need a redirect server that issues 301s from each
> druid.io page to the corresponding druid.apache.org page. Charles and I
> spoke offline and thought that something like Jon's original proposal is
> the best way to go. I am going to suggest we get started on this, as it's
> the last major piece of infra to move to ASF.
>
> 1) Set up a redirect server to perform 301 redirects to druid.apache.org
> 2) Post all druid.io content on druid.apache.org
> 3) Update druid.io DNS to point to the redirect server
> 4) Shut down GitHub pages hosting for druid.io
>
> Steps (2) and (3) should be done as close in time as possible so there is
> no confusion as to which version of the pages is canonical.
>
> For the redirect server, two viable options are an nginx server or an S3
> webpage redirect (
> https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html).
> Just like we did with the HTML-level redirect, I suggest we test this first
> on a single page. We can do that by having the redirect server initially
> start off by hosting all druid.io content (so it's indistinguishable from
> the GitHub-pages-based site) except for a single page, which it redirects
> using HTTP 301 to druid.apache.org.
>
> I'm planning to start looking into this, so anyone around please speak up
> if you have any advice or alternative approaches to suggest.
>
> On Mon, Apr 29, 2019 at 4:01 PM Jonathan Wei <jo...@apache.org> wrote:
>
>> Thanks for checking the SEO state, that's somewhat disappointing.
>>
>> For Bing, it sounds like they really want you to use 301s (
>> https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a):
>>
>> > Bing prefers you use a 301 permanent redirect when moving content,
>> > should the move be permanent. If the move is temporary, then a 302
>> > temporary redirect will work fine. Do not use the rel=canonical tag in
>> > place of a proper redirect.
>>
>> I wasn't able to find similar guidance re: this issue for DuckDuckGo.
>>
>> On Mon, Apr 29, 2019 at 10:42 AM Gian Merlino <gi...@apache.org> wrote:
>>
>> > Another update: SEO is not looking great after another day passed. For a
>> > search for "druid community", both http://druid.io/community and
>> > https://druid.apache.org/community/ have dropped off the front page of
>> > Bing completely. On Google, the legacy version is gone (as expected) but
>> > the Apache version has dropped to the #3 spot (down from #2 yesterday,
>> > and down from where the legacy page was pre-migration, which was #1).
>> >
>> > I think this means we do need to try to get 301s figured out.
>> >
>> > On Sun, Apr 28, 2019 at 3:06 PM Gian Merlino <gi...@apache.org> wrote:
>> >
>> > > Google has picked up the new URL as of today but Bing hasn't. Neither
>> > > has DuckDuckGo for that matter.
>> > >
>> > > Currently, Google is showing https://druid.apache.org/community/ in
>> > > the #2 spot and Bing/DDG are showing http://druid.io/community in the
>> > > top spot. Ominously, the latter two _have_ picked up a page title
>> > > change to "Redirecting..."
>> > >
>> > > On Fri, Apr 26, 2019 at 11:00 AM Gian Merlino <gi...@apache.org> wrote:
>> > >
>> > >> An update: this was done a couple of days ago, but Google and Bing
>> > >> are still showing http://druid.io/community for a search for "druid
>> > >> community" or even "apache druid community":
>> > >>
>> > >> - https://www.google.com/search?q=druid+community
>> > >> - https://www.bing.com/search?q=druid+community
>> > >>
>> > >> I suggest we keep an eye on the search engines and make sure they can
>> > >> figure out that the site has changed (I'm not sure how often they
>> > >> crawl). If they can, then it would make sense to me to move forward
>> > >> with migrating the entire web site.
>> > >>
>> > >> On Mon, Apr 22, 2019 at 7:49 PM Jonathan Wei <jo...@apache.org> wrote:
>> > >>
>> > >>> Correction: Xavier was suggesting we use
>> > >>> https://github.com/druid-io/druid-io.github.io/blob/src/_layouts/redirect_page.html
>> > >>> (the existing redirect system used by the Druid website).
>> > >>>
>> > >>> I've opened PRs to do the community page migration test:
>> > >>> https://github.com/apache/incubator-druid-website/pull/3
>> > >>> https://github.com/druid-io/druid-io.github.io/pull/591
>> > >>>
>> > >>> On Mon, Apr 22, 2019 at 3:04 PM Gian Merlino <gi...@apache.org> wrote:
>> > >>>
>> > >>> > That sounds good to me. I would also consider adding canonical tags
>> > >>> > to all druid.apache.org pages so we don't have
>> > >>> > druid.incubator.apache.org and druid.apache.org both floating around
>> > >>> > (not to mention http/https versions of both).
>> > >>> >
>> > >>> > On Mon, Apr 22, 2019 at 2:59 PM Jonathan Wei <jo...@apache.org> wrote:
>> > >>> >
>> > >>> > > For redirects, Xavier has suggested using
>> > >>> > > https://help.github.com/en/articles/redirects-on-github-pages to
>> > >>> > > redirect to druid.apache.org as a way to transition before the
>> > >>> > > domain migration occurs, and believes that it would have the same
>> > >>> > > SEO effects as a 301 redirect after the new pages are indexed.
>> > >>> > >
>> > >>> > > I think we could try migrating the current Community page to
>> > >>> > > druid.apache.org with GitHub redirects and canonical links pointing
>> > >>> > > to the https://druid.apache.org version. If that goes well, we could
>> > >>> > > continue migrating more pages.
>> > >>> > >
>> > >>> > > What are the community's thoughts on that?
>> > >>> > >
>> > >>> > > Thanks,
>> > >>> > > Jon
>> > >>> > >
>> > >>> > > On Tue, Mar 12, 2019 at 7:19 PM Gian Merlino <gi...@apache.org> wrote:
>> > >>> > >
>> > >>> > > > OpenOffice and Groovy both chose to sort of "meld" their classic
>> > >>> > > > and Apache sites together: https://www.openoffice.org/,
>> > >>> > > > http://groovy-lang.org/. Note how when you click around, you get
>> > >>> > > > shuttled between the classic domain and the Apache domain. Some
>> > >>> > > > pages are available on both sites, like
>> > >>> > > > http://groovy-lang.org/download.html and
>> > >>> > > > https://groovy.apache.org/download.html (which don't use canonical
>> > >>> > > > link tags -- does not seem like a good example to follow!).
>> > >>> > > >
>> > >>> > > > NetBeans (still incubating) also has a "melded" site at
>> > >>> > > > https://netbeans.org/ but doesn't seem to consider itself done yet.
>> > >>> > > > They are discussing plans on their lists & wiki to do redirects
>> > >>> > > > from netbeans.org to netbeans.apache.org:
>> > >>> > > >
>> > >>> > > > https://cwiki.apache.org/confluence/display/NETBEANS/netbeans.org+Transition+Process
>> > >>> > > > https://lists.apache.org/thread.html/ad10fb9d4c8fee571a2f6232b268a3b835f7b823d3a0983b84aeb18a@%3Cdev.netbeans.apache.org%3E
>> > >>> > > >
>> > >>> > > > As of today the domain has been donated to ASF, but the server is
>> > >>> > > > still run by Oracle, so the plan doesn't seem to be finished yet.
>> > >>> > > > (WHOIS for netbeans.org shows ASF as the registrant; netbeans.org
>> > >>> > > > resolves to lb-netbeans-cms-adc.oracle.com.)
>> > >>> > > >
>> > >>> > > > The melded sites don't really seem better to me than redirecting
>> > >>> > > > all URLs on the domain. I guess it depends on if we want to keep
>> > >>> > > > druid.io as the official domain forever, or if we think
>> > >>> > > > druid.apache.org is cooler. I definitely think druid.apache.org is
>> > >>> > > > cooler so my vote is there :). It's also nice that it supports
>> > >>> > > > https. (druid.io does not today, since it's on GitHub pages, which
>> > >>> > > > doesn't support https for custom domains.)
>> > >>> > > >
>> > >>> > > > On Tue, Mar 12, 2019 at 7:47 PM Charles Allen
>> > >>> > > > <ch...@snap.com.invalid> wrote:
>> > >>> > > >
>> > >>> > > > > Are there other projects that have transitioned an independently
>> > >>> > > > > successful domain name to an Apache one?
>> > >>> > > > >
>> > >>> > > > > On Tue, Mar 5, 2019 at 2:13 PM David Lim <davidlim@apache.org> wrote:
>> > >>> > > > >
>> > >>> > > > > > Who has control over the druid.io domain? Charles, would that be
>> > >>> > > > > > you?
>> > >>> > > > > >
>> > >>> > > > > > We'd need support from them for the DNS redirect.
>> > >>> > > > > >
>> > >>> > > > > > On Tue, Mar 5, 2019 at 2:04 PM Jonathan Wei <jonwei@apache.org> wrote:
>> > >>> > > > > >
>> > >>> > > > > > > We still need to complete the website migration to Apache
>> > >>> > > > > > > infrastructure.
>> > >>> > > > > > >
>> > >>> > > > > > > I'll propose the following plan:
>> > >>> > > > > > >
>> > >>> > > > > > > Proposed Apache Druid website migration plan
>> > >>> > > > > > > ========================================
>> > >>> > > > > > >
>> > >>> > > > > > > These links have some previous discussion on the website
>> > >>> > > > > > > migration:
>> > >>> > > > > > >
>> > >>> > > > > > > https://lists.apache.org/thread.html/7cae100b684e0b33e0adda993efea3d6088978700988a0ae632fdd80@%3Cdev.druid.apache.org%3E
>> > >>> > > > > > > https://issues.apache.org/jira/browse/INFRA-17340
>> > >>> > > > > > >
>> > >>> > > > > > > From the discussions above, the recommendation is to have 2
>> > >>> > > > > > > separate repos for the website: one for source and another for
>> > >>> > > > > > > built content that will be served.
>> > >>> > > > > > >
>> > >>> > > > > > > Generating site files
>> > >>> > > > > > > =======================
>> > >>> > > > > > >
>> > >>> > > > > > > The Apache site update process will be similar to our current
>> > >>> > > > > > > process.
>> > >>> > > > > > >
>> > >>> > > > > > > Current process:
>> > >>> > > > > > > 1. Push changes to
>> > >>> > > > > > > https://github.com/druid-io/druid-io.github.io/tree/src
>> > >>> > > > > > > 2. metamx bot picks up changes, builds, and commits to
>> > >>> > > > > > > https://github.com/druid-io/druid-io.github.io/tree/master
>> > >>> > > > > > > 3. https://github.com/druid-io/druid-io.github.io/tree/master is
>> > >>> > > > > > > served by GitHub Pages
>> > >>> > > > > > >
>> > >>> > > > > > > Apache process:
>> > >>> > > > > > > 1. Push changes to
>> > >>> > > > > > > https://github.com/apache/incubator-druid-website-src
>> > >>> > > > > > > 2. Jenkins bot from Apache will build the website from source
>> > >>> > > > > > > repo, commit to https://github.com/apache/incubator-druid-website
>> > >>> > > > > > > 3. Apache Druid website will be served from the content in
>> > >>> > > > > > > https://github.com/apache/incubator-druid-website (asf-site branch)
>> > >>> > > > > > >
>> > >>> > > > > > > Hosting and SEO
>> > >>> > > > > > > ================
>> > >>> > > > > > >
>> > >>> > > > > > > The Apache site will be hosted at druid.apache.org on Apache
>> > >>> > > > > > > infrastructure:
>> > >>> > > > > > > http://www.apache.org/dev/project-site.html
>> > >>> > > > > > >
>> > >>> > > > > > > To preserve our search rankings, we can set up 301 redirects
>> > >>> > > > > > > from the old druid.io site to the corresponding pages on the
>> > >>> > > > > > > druid.apache.org site.
>> > >>> > > > > > > (https://moz.com/learn/seo/redirection)
>> > >>> > > > > > >
>> > >>> > > > > > > However, GitHub Pages (which currently hosts the druid.io site)
>> > >>> > > > > > > does not support 301 redirects, so we propose the following:
>> > >>> > > > > > > - Set up a new Nginx server that will perform 301 redirects from
>> > >>> > > > > > > druid.io to druid.apache.org. Imply can host this if needed.
>> > >>> > > > > > > - Update the druid.io DNS entry to point to this new Nginx server
>> > >>> > > > > > > - Shut down GitHub Pages hosting for druid.io
>> > >>> > > > > > >
>> > >>> > > > > > > In addition, we can also set canonical tags on our pages:
>> > >>> > > > > > > https://moz.com/learn/seo/canonicalization
>> > >>> > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > > > Action items
>> > >>> > > > > > > ===============
>> > >>> > > > > > > - Set up a Jenkins bot that builds the Apache website content
>> > >>> > > > > > > from source
>> > >>> > > > > > > - Get the Apache website up
>> > >>> > > > > > > - Set up Nginx redirect server for druid.io
>> > >>> > > > > > > - Shut down GitHub Pages and redirect DNS for druid.io to Nginx
>> > >>> > > > > > > redirect server
>> > >>> > > > > > > - Add canonical tags to pages
>> > >>> > > > > > >

Re: Proposed website migration plan

Posted by Gian Merlino <gi...@apache.org>.
FYI -- I've also submitted an address change request from druid.io ->
druid.apache.org through Google's automated system.

On Wed, Jun 12, 2019 at 1:56 PM Gian Merlino <gi...@apache.org> wrote:

> Sorry, I mean references to https://github.com/druid-io/druid-io.github.io
> should be changed to https://github.com/apache/incubator-druid-website-src.
> That is the change that actually happened, and the one that makes sense.
>
> On Wed, Jun 12, 2019 at 4:25 PM Gian Merlino <gi...@apache.org> wrote:
>
>> Yep, any references to https://github.com/druid-io/druid-io.github.io
>> should be changed to https://github.com/apache/incubator-druid. Those
>> have all been updated now. I didn't see any references to
>> https://github.com/druid-io/druid -- I think we got them all in a
>> previous pass.
>>
>> There are still some lingering references to separate, but affiliated
>> projects like https://github.com/druid-io/pydruid. IMO, it makes sense
>> to leave them there for now, and incorporate them as subprojects of Druid
>> once Druid is a top level project.
>>
>> On Wed, Jun 12, 2019 at 12:18 PM Julian Hyde <jh...@gmail.com>
>> wrote:
>>
>>> Looks marvelous! Thanks for making it happen.
>>>
>>> I noticed at least one reference to https://github.com/druid-io on the
>>> site. Should be changed to https://github.com/apache/incubator-druid?
>>>
>>> > On Jun 11, 2019, at 9:44 PM, Gian Merlino <gi...@apache.org> wrote:
>>> >
>>> > This is now done: druid.io is redirecting to druid.apache.org!!
>>> >
>>> > Next, we'll add the stuff required by
>>> > https://whimsy.apache.org/pods/project/druid. Then, we should be good
>>> > to go on the website migration. (Behind the scenes, Vadim Ogievetsky
>>> > has been helping tons with this -- thanks a lot!)
>>> >
>>> >> On Mon, Jun 10, 2019 at 9:00 AM David Lim <da...@apache.org> wrote:
>>> >>
>>> >> No objections from me - thank you for testing this out.
>>> >>
>>> >>>> On Mon, Jun 10, 2019 at 7:48 AM Gian Merlino <gi...@apache.org> wrote:
>>> >>>
>>> >>> It looks like Google has picked up the 301 and the #1 result for
>>> >>> [druid use cases] is https://druid.apache.org/use-cases now. For
>>> >>> [what is druid used for] it's now #4 instead of #2. I think this is
>>> >>> the best we are likely to get. I am ready to flip the switch if there
>>> >>> aren't any objections.
>>> >>>
>>> >>> On Fri, Jun 7, 2019 at 9:15 PM Gian Merlino <gi...@apache.org> wrote:
>>> >>>
>>> >>>> Another update: as of
>>> >>>> https://github.com/apache/incubator-druid-website-src/pull/1 and
>>> >>>> https://github.com/apache/incubator-druid-website/pull/7, the
>>> >>>> https://druid.apache.org/ site is now serving almost all pages from
>>> >>>> druid.io, except:
>>> >>>>
>>> >>>> - the index page (it still has a placeholder until we flip the switch)
>>> >>>> - the download page (it has a differently-designed download page:
>>> >>>> compare http://druid.io/downloads.html with
>>> >>>> http://druid.apache.org/downloads.html)
>>> >>>> - any docs older than 0.13.0 (they aren't Apache releases)
>>> >>>>
>>> >>>> If you navigate to https://druid.apache.org/ + any other path from
>>> >>>> druid.io, you should see the page.
>>> >>>>
>>> >>>> I'm hoping to confirm that search engines pick up the 301 for
>>> >>>> http://druid.io/use-cases before flipping the switch. Hopefully that
>>> >>>> doesn't take much longer. If it does, we should talk about how we
>>> >>>> want to proceed.
>>> >>>>
>>> >>>>> On Tue, Jun 4, 2019 at 1:48 AM Gian Merlino <gi...@apache.org>
>>> wrote:
>>> >>>>>
>>> >>>>> An update: we do have a redirect server set up on druid.io now:
>>> note
>>> >>>>> that http://druid.io/community/ and http://druid.io/use-cases both
>>> >>>>> redirect to https://druid.apache.org. I just set up the latter
>>> >> redirect
>>> >>>>> (on /use-cases) as part of 'test this first on a single page'. All
>>> >> other
>>> >>>>> druid.io URLs are still being hosted using the content from GitHub
>>> >>> pages
>>> >>>>> at https://github.com/druid-io/druid-io.github.io.
>>> >>>>>
>>> >>>>> Search engine watch: currently, http://druid.io is the #1 link for
>>> >>>>> [druid use cases] on Google, Bing, and DuckDuckGo (and has a cool
>>> >>> looking
>>> >>>>> infobox on Google & Bing). For [what is druid used for], it's #2 on
>>> >>> Google,
>>> >>>>> and not ranked on the first page on Bing & DDG. Will monitor this
>>> over
>>> >>> the
>>> >>>>> next few days.
>>> >>>>>
>>> >>>>>> On Mon, May 6, 2019 at 5:43 PM Gian Merlino <gi...@apache.org>
>>> wrote:
>>> >>>>>>
>>> >>>>>> Hi all,
>>> >>>>>>
>>> >>>>>> It sounds like we will need a redirect server that issues 301s
>>> from
>>> >>> each
>>> >>>>>> druid.io page to the corresponding druid.apache.org page. Charles
>>> >> and
>>> >>> I
>>> >>>>>> spoke offline and thought that something like Jon's original
>>> proposal
>>> >>> is
>>> >>>>>> the best way to go. I am going to suggest we get started on this,
>>> as
>>> >>> it's
>>> >>>>>> the last major piece of infra to move to ASF.
>>> >>>>>>
>>> >>>>>> 1) Set up a redirect server to perform 301 redirects to
>>> >>> druid.apache.org
>>> >>>>>> 2) Post all druid.io content on druid.apache.org
>>> >>>>>> 3) Update druid.io DNS to point to the redirect server
>>> >>>>>> 4) Shut down GitHub pages hosting for druid.io
>>> >>>>>>
>>> >>>>>> Steps (2) and (3) should be done as close in time as possible so
>>> >> there
>>> >>>>>> is no confusion as to which version of the pages is canonical.
>>> >>>>>>
>>> >>>>>> For the redirect server, two viable options are an nginx server
>>> or an
>>> >>> S3
>>> >>>>>> webpage redirect (
>>> >>
>>> https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html
>>> >>> ).
>>> >>>>>> Just like we did with the HTML-level redirect, I suggest we test
>>> this
>>> >>> first
>>> >>>>>> on a single page. We can do that by having the redirect server
>>> >>> initially
>>> >>>>>> start off by hosting all druid.io content (so it's
>>> indistinguishable
>>> >>>>>> from the GitHub-pages-based site) except for a single page, which
>>> it
>>> >>>>>> redirects using HTTP 301 to druid.apache.org.
>>> >>>>>>
>>> >>>>>> I'm planning to start looking into this, so anyone around please
>>> >> speak
>>> >>>>>> up if you have any advice or alternative approaches to suggest.
>>> >>>>>>
>>> >>>>>> On Mon, Apr 29, 2019 at 4:01 PM Jonathan Wei <jo...@apache.org>
>>> >>> wrote:
>>> >>>>>>
>>> >>>>>>> Thanks for checking the SEO state, that's somewhat disappointing.
>>> >>>>>>>
>>> >>>>>>> For Bing, it sounds like they really want you to use 301s (
>>> >>>>>>>
>>> https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a):
>>> >>>>>>>
>>> >>>>>>>> Bing prefers you use a 301 permanent redirect when moving
>>> content,
>>> >>>>>>> should
>>> >>>>>>> the move be permanent.  If the move is temporary, then a 302
>>> >> temporary
>>> >>>>>>> redirect will work fine.  Do not use the rel=canonical tag in
>>> place
>>> >>> of a
>>> >>>>>>> proper redirect.
>>> >>>>>>>
>>> >>>>>>> I wasn't able to find similar guidance re: this issue for
>>> >> DuckDuckGo.
>>> >>>>>>>
>>> >>>>>>> On Mon, Apr 29, 2019 at 10:42 AM Gian Merlino <gi...@apache.org>
>>> >>> wrote:
>>> >>>>>>>
>>> >>>>>>>> Another update: SEO is not looking great after another day
>>> passed.
>>> >>>>>>> For a
>>> >>>>>>>> search for "druid community", both http://druid.io/community
>>> and
>>> >>>>>>>> https://druid.apache.org/community/ have dropped off the front
>>> >> page
>>> >>>>>>> of
>>> >>>>>>>> Bing
>>> >>>>>>>> completely. On Google, the legacy version is gone (as expected)
>>> >> but
>>> >>>>>>> the
>>> >>>>>>>> Apache version has dropped to the #3 spot (down from #2
>>> yesterday;
>>> >>>>>>> and down
>>> >>>>>>>> from where the legacy page was pre-migration, which was #1).
>>> >>>>>>>>
>>> >>>>>>>> I think this means we do need to try to get 301s figured out.
>>> >>>>>>>>
>>> >>>>>>>> On Sun, Apr 28, 2019 at 3:06 PM Gian Merlino <gi...@apache.org>
>>> >>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> Google has picked up the new URL as of today but Bing hasn't.
>>> >>>>>>> Neither has
>>> >>>>>>>>> DuckDuckGo for that matter.
>>> >>>>>>>>>
>>> >>>>>>>>> Currently, Google is showing
>>> >> https://druid.apache.org/community/
>>> >>>>>>> in the
>>> >>>>>>>>> #2 spot and Bing/DDG are showing http://druid.io/community in
>>> >> the
>>> >>>>>>> top
>>> >>>>>>>>> spot. Ominously, the latter two _have_ picked up a page title
>>> >>>>>>> change to
>>> >>>>>>>>> "Redirecting..."
>>> >>>>>>>>>
>>> >>>>>>>>> On Fri, Apr 26, 2019 at 11:00 AM Gian Merlino <gian@apache.org
>>> >
>>> >>>>>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> An update: this is done now since a couple of days ago, but
>>> >>> Google
>>> >>>>>>> and
>>> >>>>>>>>>> Bing are still showing http://druid.io/community for a search
>>> >>> for
>>> >>>>>>>> "druid
>>> >>>>>>>>>> community" or even "apache druid community":
>>> >>>>>>>>>>
>>> >>>>>>>>>> - https://www.google.com/search?q=druid+community
>>> >>>>>>>>>> - https://www.bing.com/search?q=druid+community
>>> >>>>>>>>>>
>>> >>>>>>>>>> I suggest we keep an eye on the search engines and make sure
>>> >> they
>>> >>>>>>> can
>>> >>>>>>>>>> figure out that the site has changed (I'm not sure how often
>>> >> they
>>> >>>>>>>> crawl).
>>> >>>>>>>>>> If they can then it would make sense to me to move forward
>>> with
>>> >>>>>>>> migrating
>>> >>>>>>>>>> the entire web site.
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Mon, Apr 22, 2019 at 7:49 PM Jonathan Wei <
>>> >> jonwei@apache.org>
>>> >>>>>>> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>> Correction: Xavier was suggesting we use
>>> >>
>>> https://github.com/druid-io/druid-io.github.io/blob/src/_layouts/redirect_page.html
>>> >>>>>>>>>>> ,
>>> >>>>>>>>>>> the existing redirect system used by the Druid website.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> I've opened PRs to do the community page migration test:
>>> >>>>>>>>>>> https://github.com/apache/incubator-druid-website/pull/3
>>> >>>>>>>>>>> https://github.com/druid-io/druid-io.github.io/pull/591
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Mon, Apr 22, 2019 at 3:04 PM Gian Merlino <
>>> gian@apache.org
>>> >>>
>>> >>>>>>> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> That sounds good to me. I would also consider adding
>>> >> canonical
>>> >>>>>>> tags
>>> >>>>>>>> to
>>> >>>>>>>>>>> all
>>> >>>>>>>>>>>> druid.apache.org pages so we don't have
>>> >>>>>>> druid.incubator.apache.org
>>> >>>>>>>> and
>>> >>>>>>>>>>>> druid.apache.org both floating around (not to mention
>>> >>>>>>> http/https
>>> >>>>>>>>>>> version
>>> >>>>>>>>>>>> of
>>> >>>>>>>>>>>> both).
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On Mon, Apr 22, 2019 at 2:59 PM Jonathan Wei <
>>> >>> jonwei@apache.org
>>> >>>>>>>>
>>> >>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>> For redirects, Xavier has suggested using
>>> >>> https://help.github.com/en/articles/redirects-on-github-pages
>>> >>>>>>> to
>>> >>>>>>>>>>>> redirect
>>> >>>>>>>>>>>>> to druid.apache.org as a way to transition before the
>>> >>> domain
>>> >>>>>>>>>>> migration
>>> >>>>>>>>>>>>> occurs, and believes that it would have the same SEO
>>> >> effects
>>> >>>>>>> as a
>>> >>>>>>>> 301
>>> >>>>>>>>>>>>> redirect after the new pages are indexed.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> I think we could try migrating the current Community page
>>> >> to
>>> >>>>>>>>>>>>> druid.apache.org with Github redirects and canonical
>>> >> links
>>> >>>>>>>> pointing
>>> >>>>>>>>>>> to
>>> >>>>>>>>>>>> the
>>> >>>>>>>>>>>>> https://druid.apache.org version. If that goes well, we
>>> >>> could
>>> >>>>>>>>>>> continue
>>> >>>>>>>>>>>>> migrating more pages.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> What are the community's thoughts on that?
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Thanks,
>>> >>>>>>>>>>>>> Jon
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> On Tue, Mar 12, 2019 at 7:19 PM Gian Merlino <
>>> >>> gian@apache.org
>>> >>>>>>>>
>>> >>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> OpenOffice and Groovy both chose to sort of "meld" their
>>> >>>>>>> classic
>>> >>>>>>>>>>> and
>>> >>>>>>>>>>>>> Apache
>>> >>>>>>>>>>>>>> sites together: https://www.openoffice.org/,
>>> >>>>>>>>>>> http://groovy-lang.org/.
>>> >>>>>>>>>>>>> Note
>>> >>>>>>>>>>>>>> how when you click around, you get shuttled between the
>>> >>>>>>> classic
>>> >>>>>>>>>>> domain
>>> >>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>> the Apache domain. Some pages are available on both
>>> >> sites,
>>> >>>>>>> like
>>> >>>>>>>>>>>>>> http://groovy-lang.org/download.html and
>>> >>>>>>>>>>>>>> https://groovy.apache.org/download.html (which don't
>>> >> use
>>> >>>>>>>> canonical
>>> >>>>>>>>>>>> link
>>> >>>>>>>>>>>>>> tags -- does not seem like a good example to follow!).
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> NetBeans (still incubating) also has a "melded" site at
>>> >>>>>>>>>>>>>> https://netbeans.org/ but doesn't seem to consider
>>> >> itself
>>> >>>>>>> done
>>> >>>>>>>>>>> yet.
>>> >>>>>>>>>>>> They
>>> >>>>>>>>>>>>>> are discussing plans on their lists & wiki to do
>>> >> redirects
>>> >>>>>>> from
>>> >>>>>>>>>>>>>> netbeans.org
>>> >>>>>>>>>>>>>> to netbeans.apache.org:
>>> >>
>>> https://cwiki.apache.org/confluence/display/NETBEANS/netbeans.org+Transition+Process
>>> >>>>>>>>>>>>>> ,
>>> >>
>>> https://lists.apache.org/thread.html/ad10fb9d4c8fee571a2f6232b268a3b835f7b823d3a0983b84aeb18a@%3Cdev.netbeans.apache.org%3E
>>> >>>>>>>>>>>>>> .
>>> >>>>>>>>>>>>>> As of today the domain has been donated to ASF, but the
>>> >>>>>>> server is
>>> >>>>>>>>>>> still
>>> >>>>>>>>>>>>> run
>>> >>>>>>>>>>>>>> by Oracle, so the plan doesn't seem to be finished yet.
>>> >>>>>>> (WHOIS
>>> >>>>>>>> for
>>> >>>>>>>>>>>>>> netbeans.org shows ASF as the registrant; netbeans.org
>>> >>>>>>> resolves
>>> >>>>>>>> to
>>> >>>>>>>>>>>>>> lb-netbeans-cms-adc.oracle.com.)
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> The melded sites don't really seem better to me than
>>> >>>>>>> redirecting
>>> >>>>>>>>>>> all
>>> >>>>>>>>>>>> urls
>>> >>>>>>>>>>>>>> on the domain. I guess it depends on if we want to keep
>>> >>>>>>> druid.io
>>> >>>>>>>>>>> as
>>> >>>>>>>>>>>> the
>>> >>>>>>>>>>>>>> official domain forever, or if we think
>>> >> druid.apache.org
>>> >>> is
>>> >>>>>>>>>>> cooler. I
>>> >>>>>>>>>>>>>> definitely think druid.apache.org is cooler so my vote
>>> >> is
>>> >>>>>>> there
>>> >>>>>>>>>>> :).
>>> >>>>>>>>>>>> It's
>>> >>>>>>>>>>>>>> also nice that it supports https. (druid.io does not
>>> >>> today,
>>> >>>>>>>> since
>>> >>>>>>>>>>> it's
>>> >>>>>>>>>>>>> on
>>> >>>>>>>>>>>>>> GitHub pages, which doesn't support https for custom
>>> >>>>>>> domains.)
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> On Tue, Mar 12, 2019 at 7:47 PM Charles Allen
>>> >>>>>>>>>>>>>> <ch...@snap.com.invalid> wrote:
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Are there other projects who have transitioned an
>>> >>>>>>> independently
>>> >>>>>>>>>>>>>> successful
>>> >>>>>>>>>>>>>>> domain name to an apache one?
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 2:13 PM David Lim <
>>> >>>>>>> davidlim@apache.org>
>>> >>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Who has control over the druid.io domain? Charles
>>> >>>>>>> would that
>>> >>>>>>>>>>> be
>>> >>>>>>>>>>>> you?
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> We'd need support from them for the DNS redirect.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 2:04 PM Jonathan Wei <
>>> >>>>>>>> jonwei@apache.org
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> We still need to complete the website migration to
>>> >>>>>>> Apache
>>> >>>>>>>>>>>>>>> infrastructure.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> I'll propose the following plan:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Proposed Apache Druid website migration plan
>>> >>>>>>>>>>>>>>>>> ========================================
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> These links have some previous discussion on the
>>> >>>>>>> website
>>> >>>>>>>>>>>> migration:
>>> >>
>>> https://lists.apache.org/thread.html/7cae100b684e0b33e0adda993efea3d6088978700988a0ae632fdd80@%3Cdev.druid.apache.org%3E
>>> >>
>>> https://issues.apache.org/jira/browse/INFRA-17340
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> From the discussions above, the recommendation is
>>> >> to
>>> >>>>>>> have 2
>>> >>>>>>>>>>>>> separate
>>> >>>>>>>>>>>>>>>> repos
>>> >>>>>>>>>>>>>>>>> for the website: one for source and another for
>>> >>> built
>>> >>>>>>>> content
>>> >>>>>>>>>>>> that
>>> >>>>>>>>>>>>>> will
>>> >>>>>>>>>>>>>>>> be
>>> >>>>>>>>>>>>>>>>> served.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Generating site files
>>> >>>>>>>>>>>>>>>>> =======================
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> The Apache site update process will be similar to
>>> >>> our
>>> >>>>>>>> current
>>> >>>>>>>>>>>>>> process.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Current process:
>>> >>>>>>>>>>>>>>>>> 1. Push changes to
>>> >>> https://github.com/druid-io/druid-io.github.io/tree/src
>>> >>>>>>>>>>>>>>>>> 2. metamx bot picks up changes, builds, and
>>> >> commits
>>> >>> to
>>> >>>>>>> https://github.com/druid-io/druid-io.github.io/tree/master
>>> >>>>>>>>>>>>>>>>> 3.
>>> >>>>>>>>>>> https://github.com/druid-io/druid-io.github.io/tree/master
>>> is
>>> >>>>>>>>>>>>>>> served
>>> >>>>>>>>>>>>>>>> by
>>> >>>>>>>>>>>>>>>>> github pages
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Apache process:
>>> >>>>>>>>>>>>>>>>> 1. Push changes to
>>> >>>>>>>>>>>>>>> https://github.com/apache/incubator-druid-website-src
>>> >>>>>>>>>>>>>>>>> 2. Jenkins bot from Apache will build the website
>>> >>> from
>>> >>>>>>>> source
>>> >>>>>>>>>>>> repo,
>>> >>>>>>>>>>>>>>>> commit
>>> >>>>>>>>>>>>>>>>> to
>>> >>> https://github.com/apache/incubator-druid-website
>>> >>>>>>>>>>>>>>>>> 3. Apache Druid website will be served from the
>>> >>>>>>> content in
>>> >>>>>>>>>>>>>>>>> https://github.com/apache/incubator-druid-website
>>> >>>>>>>> (asf-site
>>> >>>>>>>>>>>>> branch)
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Hosting and SEO
>>> >>>>>>>>>>>>>>>>> ================
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> The Apache site will be hosted at
>>> >> druid.apache.org
>>> >>> on
>>> >>>>>>>> Apache
>>> >>>>>>>>>>>>>>>>> infrastructure:
>>> >>
>>> http://www.apache.org/dev/project-site.html
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> To preserve our search rankings, we can setup 301
>>> >>>>>>> redirects
>>> >>>>>>>>>>> from
>>> >>>>>>>>>>>>> the
>>> >>>>>>>>>>>>>>> old
>>> >>>>>>>>>>>>>>>>> druid.io site to the corresponding pages on the
>>> >>>>>>>>>>> druid.apache.org
>>> >>>>>>>>>>>>>>> site. (
>>> >>
>>> https://moz.com/learn/seo/redirection
>>> >>>>>>>>>>>>>>>> )
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> However, Github pages (which currently hosts the
>>> >>>>>>> druid.io
>>> >>>>>>>>>>> site)
>>> >>>>>>>>>>>>> does
>>> >>>>>>>>>>>>>>> not
>>> >>>>>>>>>>>>>>>>> support 301 redirects, so we propose the
>>> >> following:
>>> >>>>>>>>>>>>>>>>> - Setup a new Nginx server that will perform 301
>>> >>>>>>> redirects
>>> >>>>>>>> to
>>> >>>>>>>>>>>>>>>>> druid.apache.org for the druid.io. Imply can host
>>> >>>>>>> this if
>>> >>>>>>>>>>>> needed.
>>> >>>>>>>>>>>>>>>>> - Update the druid.io DNS entry to point to this
>>> >>> new
>>> >>>>>>> Nginx
>>> >>>>>>>>>>>> server
>>> >>>>>>>>>>>>>>>>> - Shut down Github pages hosting for druid.io
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> In addition, we can also set canonical tags on our
>>> >>>>>>> pages:
>>> >>
>>> https://moz.com/learn/seo/canonicalization
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Action items
>>> >>>>>>>>>>>>>>>>> ===============
>>> >>>>>>>>>>>>>>>>> - Setup a Jenkins bot that builds the Apache
>>> >> website
>>> >>>>>>>> content
>>> >>>>>>>>>>> from
>>> >>>>>>>>>>>>>>> source
>>> >>>>>>>>>>>>>>>>> - Get the Apache website up
>>> >>>>>>>>>>>>>>>>> - Setup Nginx redirect server for druid.io
>>> >>>>>>>>>>>>>>>>> - Shutdown github pages and redirect DNS for
>>> >>> druid.io
>>> >>>>>>> to
>>> >>>>>>>>>>> Nginx
>>> >>>>>>>>>>>>>>> redirect
>>> >>>>>>>>>>>>>>>>> server
>>> >>>>>>>>>>>>>>>>> - Add canonical tags to pages
>>> >>
>>>
>>
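
The redirect scheme in the plan quoted above (a 301 from each druid.io page
to the corresponding druid.apache.org page) can be sketched in a few lines.
This is only an illustration of the URL mapping, not the configuration of
the actual redirect server; the function name and the "www." host variant
are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

NEW_HOST = "druid.apache.org"
# Hosts treated as "old site". The "www." variant is an assumption.
OLD_HOSTS = {"druid.io", "www.druid.io"}


def redirect_target(url):
    """Return (status, location) for a druid.io URL, or None for other hosts."""
    parts = urlsplit(url)
    if parts.hostname not in OLD_HOSTS:
        return None
    # Keep the path and query so each page maps to its counterpart,
    # and always send visitors to the https version of the new site.
    location = urlunsplit(("https", NEW_HOST, parts.path or "/", parts.query, ""))
    return 301, location
```

Keeping the path unchanged is what lets a page such as /use-cases land on
its counterpart rather than on the new index page.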

Re: Proposed website migration plan

Posted by Gian Merlino <gi...@apache.org>.
Sorry, I meant that references to https://github.com/druid-io/druid-io.github.io
should be changed to https://github.com/apache/incubator-druid-website-src
(not https://github.com/apache/incubator-druid, as I wrote below). That is
the change that actually happened, and the one that makes sense.
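
For anyone checking other pages for leftover references, here is a minimal
sketch of the substitution described above. The helper name is hypothetical
and this is not the script that was actually used; it only assumes the two
repository URLs named in this message.

```python
import re

# Old and new website source repositories, as named in this message.
OLD_REPO = "https://github.com/druid-io/druid-io.github.io"
NEW_REPO = "https://github.com/apache/incubator-druid-website-src"

# Match the old URL only when it is not immediately followed by a word
# character or hyphen, so a longer, different name is left alone.
_OLD_REPO_RE = re.compile(re.escape(OLD_REPO) + r"(?![\w-])")


def rewrite_references(text):
    """Return text with references to the old website repo updated."""
    return _OLD_REPO_RE.sub(NEW_REPO, text)
```

Anything after the repository name (for example /tree/src) is carried over
unchanged, so links into specific branches or files would still need a
manual check. References to other druid-io repositories such as pydruid are
not touched.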

On Wed, Jun 12, 2019 at 4:25 PM Gian Merlino <gi...@apache.org> wrote:

> Yep, any references to https://github.com/druid-io/druid-io.github.io
> should be changed to https://github.com/apache/incubator-druid. Those
> have all been updated now. I didn't see any references to
> https://github.com/druid-io/druid -- I think we got them all in a
> previous pass.
>
> There are still some lingering references to separate, but affiliated
> projects like https://github.com/druid-io/pydruid. IMO, it makes sense to
> leave them there for now, and incorporate them as subprojects of Druid once
> Druid is a top level project.
>
> On Wed, Jun 12, 2019 at 12:18 PM Julian Hyde <jh...@gmail.com>
> wrote:
>
>> Looks marvelous! Thanks for making it happen.
>>
>> I noticed at least one reference to https://github.com/druid-io on the
>> site. Should be changed to https://github.com/apache/incubator-druid?
>>
>> > On Jun 11, 2019, at 9:44 PM, Gian Merlino <gi...@apache.org> wrote:
>> >
>> > This is now done: druid.io is redirecting to druid.apache.org!!
>> >
>> > Next, we'll add the stuff required by
>> > https://whimsy.apache.org/pods/project/druid. Then, we should be good
>> to go
>> > on the website migration. (Behind the scenes, Vadim Ogievetsky has been
>> > helping tons with this -- thanks a lot!)
>> >
>> >> On Mon, Jun 10, 2019 at 9:00 AM David Lim <da...@apache.org> wrote:
>> >>
>> >> No objections from me - thank you for testing this out.
>> >>
>> >>>> On Mon, Jun 10, 2019 at 7:48 AM Gian Merlino <gi...@apache.org>
>> wrote:
>> >>>
>> >>> It looks like Google has picked up the 301 and [druid use cases] #1
>> >> result
>> >>> is https://druid.apache.org/use-cases now. For [what is druid used
>> for]
>> >>> it's now #4 instead of #2. I think this is the best we are likely to
>> >> get. I
>> >>> am ready to flip the switch if there aren't any objections.
>> >>>
>> >>> On Fri, Jun 7, 2019 at 9:15 PM Gian Merlino <gi...@apache.org> wrote:
>> >>>
>> >>>> Another update: as of
>> >>>> https://github.com/apache/incubator-druid-website-src/pull/1 and
>> >>>> https://github.com/apache/incubator-druid-website/pull/7, the
>> >>>> https://druid.apache.org/ site is now serving almost all pages from
>> >>>> druid.io, except:
>> >>>>
>> >>>> - the index page (it still has a placeholder until we flip the
>> switch)
>> >>>> - the download page (it has a differently-designed download page:
>> >> compare
>> >>>> http://druid.io/downloads.html with
>> >>> http://druid.apache.org/downloads.html
>> >>>> - any docs older than 0.13.0 (they aren't Apache releases)
>> >>>>
>> >>>> If you navigate to https://druid.apache.org/ + any other path from
>> >>>> druid.io, you should see the page.
>> >>>>
>> >>>> I'm hoping to confirm that search engines pick up the 301 for
>> >>>> http://druid.io/use-cases before flipping the switch. Hopefully that
>> >>>> doesn't take much longer. If it does we should talk about how we want
>> >> to
>> >>>> proceed.
>> >>>>
>> >>>>> On Tue, Jun 4, 2019 at 1:48 AM Gian Merlino <gi...@apache.org>
>> wrote:
>> >>>>>
>> >>>>> An update: we do have a redirect server set up on druid.io now:
>> note
>> >>>>> that http://druid.io/community/ and http://druid.io/use-cases both
>> >>>>> redirect to https://druid.apache.org. I just set up the latter
>> >> redirect
>> >>>>> (on /use-cases) as part of 'test this first on a single page'. All
>> >> other
>> >>>>> druid.io URLs are still being hosted using the content from GitHub
>> >>> pages
>> >>>>> at https://github.com/druid-io/druid-io.github.io.
>> >>>>>
>> >>>>> Search engine watch: currently, http://druid.io is the #1 link for
>> >>>>> [druid use cases] on Google, Bing, and DuckDuckGo (and has a cool
>> >>> looking
>> >>>>> infobox on Google & Bing). For [what is druid used for], it's #2 on
>> >>> Google,
>> >>>>> and not ranked on the first page on Bing & DDG. Will monitor this
>> over
>> >>> the
>> >>>>> next few days.
>> >>>>>
>> >>>>>> On Mon, May 6, 2019 at 5:43 PM Gian Merlino <gi...@apache.org>
>> wrote:
>> >>>>>>
>> >>>>>> Hi all,
>> >>>>>>
>> >>>>>> It sounds like we will need a redirect server that issues 301s from
>> >>> each
>> >>>>>> druid.io page to the corresponding druid.apache.org page. Charles
>> >> and
>> >>> I
>> >>>>>> spoke offline and thought that something like Jon's original
>> proposal
>> >>> is
>> >>>>>> the best way to go. I am going to suggest we get started on this,
>> as
>> >>> it's
>> >>>>>> the last major piece of infra to move to ASF.
>> >>>>>>
>> >>>>>> 1) Set up a redirect server to perform 301 redirects to
>> >>> druid.apache.org
>> >>>>>> 2) Post all druid.io content on druid.apache.org
>> >>>>>> 3) Update druid.io DNS to point to the redirect server
>> >>>>>> 4) Shut down GitHub pages hosting for druid.io
>> >>>>>>
>> >>>>>> Steps (2) and (3) should be done as close in time as possible so
>> >> there
>> >>>>>> is no confusion as to which version of the pages is canonical.
>> >>>>>>
>> >>>>>> For the redirect server, two viable options are an nginx server or
>> an
>> >>> S3
>> >>>>>> webpage redirect (
>> >>
>> https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html
>> >>> ).
>> >>>>>> Just like we did with the HTML-level redirect, I suggest we test
>> this
>> >>> first
>> >>>>>> on a single page. We can do that by having the redirect server
>> >>> initially
>> >>>>>> start off by hosting all druid.io content (so it's
>> indistinguishable
>> >>>>>> from the GitHub-pages-based site) except for a single page, which
>> it
>> >>>>>> redirects using HTTP 301 to druid.apache.org.
>> >>>>>>
>> >>>>>> I'm planning to start looking into this, so anyone around please
>> >> speak
>> >>>>>> up if you have any advice or alternative approaches to suggest.
>> >>>>>>
>> >>>>>> On Mon, Apr 29, 2019 at 4:01 PM Jonathan Wei <jo...@apache.org>
>> >>> wrote:
>> >>>>>>
>> >>>>>>> Thanks for checking the SEO state, that's somewhat disappointing.
>> >>>>>>>
>> >>>>>>> For Bing, it sounds like they really want you to use 301s (
>> >>>>>>> https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a
>> ):
>> >>>>>>>
>> >>>>>>>> Bing prefers you use a 301 permanent redirect when moving
>> content,
>> >>>>>>> should
>> >>>>>>> the move be permanent.  If the move is temporary, then a 302
>> >> temporary
>> >>>>>>> redirect will work fine.  Do not use the rel=canonical tag in
>> place
>> >>> of a
>> >>>>>>> proper redirect.
>> >>>>>>>
>> >>>>>>> I wasn't able to find similar guidance re: this issue for
>> >> DuckDuckGo.
>> >>>>>>>
>> >>>>>>> On Mon, Apr 29, 2019 at 10:42 AM Gian Merlino <gi...@apache.org>
>> >>> wrote:
>> >>>>>>>
>> >>>>>>>> Another update: SEO is not looking great after another day
>> passed.
>> >>>>>>> For a
>> >>>>>>>> search for "druid community", both http://druid.io/community and
>> >>>>>>>> https://druid.apache.org/community/ have dropped off the front
>> >> page
>> >>>>>>> of
>> >>>>>>>> Bing
>> >>>>>>>> completely. On Google, the legacy version is gone (as expected)
>> >> but
>> >>>>>>> the
>> >>>>>>>> Apache version has dropped to the #3 spot (down from #2
>> yesterday;
>> >>>>>>> and down
>> >>>>>>>> from where the legacy page was pre-migration, which was #1).
>> >>>>>>>>
>> >>>>>>>> I think this means we do need to try to get 301s figured out.
>> >>>>>>>>
>> >>>>>>>> On Sun, Apr 28, 2019 at 3:06 PM Gian Merlino <gi...@apache.org>
>> >>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Google has picked up the new URL as of today but Bing hasn't.
>> >>>>>>> Neither has
>> >>>>>>>>> DuckDuckGo for that matter.
>> >>>>>>>>>
>> >>>>>>>>> Currently, Google is showing
>> >> https://druid.apache.org/community/
>> >>>>>>> in the
>> >>>>>>>>> #2 spot and Bing/DDG are showing http://druid.io/community in
>> >> the
>> >>>>>>> top
>> >>>>>>>>> spot. Ominously, the latter two _have_ picked up a page title
>> >>>>>>> change to
>> >>>>>>>>> "Redirecting..."
>> >>>>>>>>>
>> >>>>>>>>> On Fri, Apr 26, 2019 at 11:00 AM Gian Merlino <gi...@apache.org>
>> >>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> An update: this is done now since a couple of days ago, but
>> >>> Google
>> >>>>>>> and
>> >>>>>>>>>> Bing are still showing http://druid.io/community for a search
>> >>> for
>> >>>>>>>> "druid
>> >>>>>>>>>> community" or even "apache druid community":
>> >>>>>>>>>>
>> >>>>>>>>>> - https://www.google.com/search?q=druid+community
>> >>>>>>>>>> - https://www.bing.com/search?q=druid+community
>> >>>>>>>>>>
>> >>>>>>>>>> I suggest we keep an eye on the search engines and make sure
>> >> they
>> >>>>>>> can
>> >>>>>>>>>> figure out that the site has changed (I'm not sure how often
>> >> they
>> >>>>>>>> crawl).
>> >>>>>>>>>> If they can then it would make sense to me to move forward with
>> >>>>>>>> migrating
>> >>>>>>>>>> the entire web site.
>> >>>>>>>>>>
>> >>>>>>>>>> On Mon, Apr 22, 2019 at 7:49 PM Jonathan Wei <
>> >> jonwei@apache.org>
>> >>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Correction: Xavier was suggesting we use
>> >>
>> https://github.com/druid-io/druid-io.github.io/blob/src/_layouts/redirect_page.html
>> >>>>>>>>>>> ,
>> >>>>>>>>>>> the existing redirect system used by the Druid website.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I've opened PRs to do the community page migration test:
>> >>>>>>>>>>> https://github.com/apache/incubator-druid-website/pull/3
>> >>>>>>>>>>> https://github.com/druid-io/druid-io.github.io/pull/591
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Mon, Apr 22, 2019 at 3:04 PM Gian Merlino <gian@apache.org
>> >>>
>> >>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> That sounds good to me. I would also consider adding
>> >> canonical
>> >>>>>>> tags
>> >>>>>>>> to
>> >>>>>>>>>>> all
>> >>>>>>>>>>>> druid.apache.org pages so we don't have
>> >>>>>>> druid.incubator.apache.org
>> >>>>>>>> and
>> >>>>>>>>>>>> druid.apache.org both floating around (not to mention
>> >>>>>>> http/https
>> >>>>>>>>>>> version
>> >>>>>>>>>>>> of
>> >>>>>>>>>>>> both).
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Mon, Apr 22, 2019 at 2:59 PM Jonathan Wei <
>> >>> jonwei@apache.org
>> >>>>>>>>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> For redirects, Xavier has suggested using
>> >>> https://help.github.com/en/articles/redirects-on-github-pages
>> >>>>>>> to
>> >>>>>>>>>>>> redirect
>> >>>>>>>>>>>>> to druid.apache.org as a way to transition before the
>> >>> domain
>> >>>>>>>>>>> migration
>> >>>>>>>>>>>>> occurs, and believes that it would have the same SEO
>> >> effects
>> >>>>>>> as a
>> >>>>>>>> 301
>> >>>>>>>>>>>>> redirect after the new pages are indexed.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I think we could try migrating the current Community page
>> >> to
>> >>>>>>>>>>>>> druid.apache.org with Github redirects and canonical
>> >> links
>> >>>>>>>> pointing
>> >>>>>>>>>>> to
>> >>>>>>>>>>>> the
>> >>>>>>>>>>>>> https://druid.apache.org version. If that goes well, we
>> >>> could
>> >>>>>>>>>>> continue
>> >>>>>>>>>>>>> migrating more pages.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> What are the community's thoughts on that?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>> Jon
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Tue, Mar 12, 2019 at 7:19 PM Gian Merlino <
>> >>> gian@apache.org
>> >>>>>>>>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> OpenOffice and Groovy both chose to sort of "meld" their
>> >>>>>>> classic
>> >>>>>>>>>>> and
>> >>>>>>>>>>>>> Apache
>> >>>>>>>>>>>>>> sites together: https://www.openoffice.org/,
>> >>>>>>>>>>> http://groovy-lang.org/.
>> >>>>>>>>>>>>> Note
>> >>>>>>>>>>>>>> how when you click around, you get shuttled between the
>> >>>>>>> classic
>> >>>>>>>>>>> domain
>> >>>>>>>>>>>>> and
>> >>>>>>>>>>>>>> the Apache domain. Some pages are available on both
>> >> sites,
>> >>>>>>> like
>> >>>>>>>>>>>>>> http://groovy-lang.org/download.html and
>> >>>>>>>>>>>>>> https://groovy.apache.org/download.html (which don't
>> >> use
>> >>>>>>>> canonical
>> >>>>>>>>>>>> link
>> >>>>>>>>>>>>>> tags -- does not seem like a good example to follow!).
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> NetBeans (still incubating) also has a "melded" site at
>> >>>>>>>>>>>>>> https://netbeans.org/ but doesn't seem to consider
>> >> itself
>> >>>>>>> done
>> >>>>>>>>>>> yet.
>> >>>>>>>>>>>> They
>> >>>>>>>>>>>>>> are discussing plans on their lists & wiki to do
>> >> redirects
>> >>>>>>> from
>> >>>>>>>>>>>>>> netbeans.org
>> >>>>>>>>>>>>>> to netbeans.apache.org:
>> >>
>> https://cwiki.apache.org/confluence/display/NETBEANS/netbeans.org+Transition+Process
>> >>>>>>>>>>>>>> ,
>> >>
>> https://lists.apache.org/thread.html/ad10fb9d4c8fee571a2f6232b268a3b835f7b823d3a0983b84aeb18a@%3Cdev.netbeans.apache.org%3E
>> >>>>>>>>>>>>>> .
>> >>>>>>>>>>>>>> As of today the domain has been donated to ASF, but the
>> >>>>>>> server is
>> >>>>>>>>>>> still
>> >>>>>>>>>>>>> run
>> >>>>>>>>>>>>>> by Oracle, so the plan doesn't seem to be finished yet.
>> >>>>>>> (WHOIS
>> >>>>>>>> for
>> >>>>>>>>>>>>>> netbeans.org shows ASF as the registrant; netbeans.org
>> >>>>>>> resolves
>> >>>>>>>> to
>> >>>>>>>>>>>>>> lb-netbeans-cms-adc.oracle.com.)
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> The melded sites don't really seem better to me than
>> >>>>>>> redirecting
>> >>>>>>>>>>> all
>> >>>>>>>>>>>> urls
>> >>>>>>>>>>>>>> on the domain. I guess it depends on if we want to keep
>> >>>>>>> druid.io
>> >>>>>>>>>>> as
>> >>>>>>>>>>>> the
>> >>>>>>>>>>>>>> official domain forever, or if we think
>> >> druid.apache.org
>> >>> is
>> >>>>>>>>>>> cooler. I
>> >>>>>>>>>>>>>> definitely think druid.apache.org is cooler so my vote
>> >> is
>> >>>>>>> there
>> >>>>>>>>>>> :).
>> >>>>>>>>>>>> It's
>> >>>>>>>>>>>>>> also nice that it supports https. (druid.io does not
>> >>> today,
>> >>>>>>>> since
>> >>>>>>>>>>> it's
>> >>>>>>>>>>>>> on
>> >>>>>>>>>>>>>> GitHub pages, which doesn't support https for custom
>> >>>>>>> domains.)
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Tue, Mar 12, 2019 at 7:47 PM Charles Allen
>> >>>>>>>>>>>>>> <ch...@snap.com.invalid> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Are there other projects who have transitioned an
>> >>>>>>> independently
>> >>>>>>>>>>>>>> successful
>> >>>>>>>>>>>>>>> domain name to an apache one?
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 2:13 PM David Lim <
>> >>>>>>> davidlim@apache.org>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Who has control over the druid.io domain? Charles
>> >>>>>>> would that
>> >>>>>>>>>>> be
>> >>>>>>>>>>>> you?
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> We'd need support from them for the DNS redirect.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 2:04 PM Jonathan Wei <
>> >>>>>>>> jonwei@apache.org
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> We still need to complete the website migration to
>> >>>>>>> Apache
>> >>>>>>>>>>>>>>> infrastructure.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I'll propose the following plan:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Proposed Apache Druid website migration plan
>> >>>>>>>>>>>>>>>>> ========================================
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> These links have some previous discussion on the
>> >>>>>>> website
>> >>>>>>>>>>>> migration:
>> >>
>> https://lists.apache.org/thread.html/7cae100b684e0b33e0adda993efea3d6088978700988a0ae632fdd80@%3Cdev.druid.apache.org%3E
>> >>
>> https://issues.apache.org/jira/browse/INFRA-17340
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> From the discussions above, the recommendation is
>> >> to
>> >>>>>>> have 2
>> >>>>>>>>>>>>> separate
>> >>>>>>>>>>>>>>>> repos
>> >>>>>>>>>>>>>>>>> for the website: one for source and another for
>> >>> built
>> >>>>>>>> content
>> >>>>>>>>>>>> that
>> >>>>>>>>>>>>>> will
>> >>>>>>>>>>>>>>>> be
>> >>>>>>>>>>>>>>>>> served.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Generating site files
>> >>>>>>>>>>>>>>>>> =======================
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> The Apache site update process will be similar to
>> >>> our
>> >>>>>>>> current
>> >>>>>>>>>>>>>> process.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Current process:
>> >>>>>>>>>>>>>>>>> 1. Push changes to
>> >>> https://github.com/druid-io/druid-io.github.io/tree/src
>> >>>>>>>>>>>>>>>>> 2. metamx bot picks up changes, builds, and
>> >> commits
>> >>> to
>> >>>>>>> https://github.com/druid-io/druid-io.github.io/tree/master
>> >>>>>>>>>>>>>>>>> 3.
>> >>>>>>>>>>> https://github.com/druid-io/druid-io.github.io/tree/master is
>> >>>>>>>>>>>>>>> served
>> >>>>>>>>>>>>>>>> by
>> >>>>>>>>>>>>>>>>> github pages
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Apache process:
>> >>>>>>>>>>>>>>>>> 1. Push changes to
>> >>>>>>>>>>>>>>> https://github.com/apache/incubator-druid-website-src
>> >>>>>>>>>>>>>>>>> 2. Jenkins bot from Apache will build the website
>> >>> from
>> >>>>>>>> source
>> >>>>>>>>>>>> repo,
>> >>>>>>>>>>>>>>>> commit
>> >>>>>>>>>>>>>>>>> to
>> >>> https://github.com/apache/incubator-druid-website
>> >>>>>>>>>>>>>>>>> 3. Apache Druid website will be served from the
>> >>>>>>> content in
>> >>>>>>>>>>>>>>>>> https://github.com/apache/incubator-druid-website
>> >>>>>>>> (asf-site
>> >>>>>>>>>>>>> branch)
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Hosting and SEO
>> >>>>>>>>>>>>>>>>> ================
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> The Apache site will be hosted at
>> >> druid.apache.org
>> >>> on
>> >>>>>>>> Apache
>> >>>>>>>>>>>>>>>>> infrastructure:
>> >>
>> http://www.apache.org/dev/project-site.html
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> To preserve our search rankings, we can setup 301
>> >>>>>>> redirects
>> >>>>>>>>>>> from
>> >>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>> old
>> >>>>>>>>>>>>>>>>> druid.io site to the corresponding pages on the
>> >>>>>>>>>>> druid.apache.org
>> >>>>>>>>>>>>>>> site. (
>> >>
>> https://moz.com/learn/seo/redirection
>> >>>>>>>>>>>>>>>> )
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> However, Github pages (which currently hosts the
>> >>>>>>> druid.io
>> >>>>>>>>>>> site)
>> >>>>>>>>>>>>> does
>> >>>>>>>>>>>>>>> not
>> >>>>>>>>>>>>>>>>> support 301 redirects, so we propose the
>> >> following:
>> >>>>>>>>>>>>>>>>> - Setup a new Nginx server that will perform 301
>> >>>>>>> redirects
>> >>>>>>>> to
>> >>>>>>>>>>>>>>>>> druid.apache.org for the druid.io. Imply can host
>> >>>>>>> this if
>> >>>>>>>>>>>> needed.
>> >>>>>>>>>>>>>>>>> - Update the druid.io DNS entry to point to this
>> >>> new
>> >>>>>>> Nginx
>> >>>>>>>>>>>> server
>> >>>>>>>>>>>>>>>>> - Shut down Github pages hosting for druid.io
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> In addition, we can also set canonical tags on our
>> >>>>>>> pages:
>> >>
>> https://moz.com/learn/seo/canonicalization
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Action items
>> >>>>>>>>>>>>>>>>> ===============
>> >>>>>>>>>>>>>>>>> - Setup a Jenkins bot that builds the Apache
>> >> website
>> >>>>>>>> content
>> >>>>>>>>>>> from
>> >>>>>>>>>>>>>>> source
>> >>>>>>>>>>>>>>>>> - Get the Apache website up
>> >>>>>>>>>>>>>>>>> - Setup Nginx redirect server for druid.io
>> >>>>>>>>>>>>>>>>> - Shutdown github pages and redirect DNS for
>> >>> druid.io
>> >>>>>>> to
>> >>>>>>>>>>> Nginx
>> >>>>>>>>>>>>>>> redirect
>> >>>>>>>>>>>>>>>>> server
>> >>>>>>>>>>>>>>>>> - Add canonical tags to pages
>> >>
>>
>

Re: Proposed website migration plan

Posted by Gian Merlino <gi...@apache.org>.
Yep, any references to https://github.com/druid-io/druid-io.github.io
should be changed to https://github.com/apache/incubator-druid. Those have
all been updated now. I didn't see any references to
https://github.com/druid-io/druid -- I think we got them all in a previous
pass.

There are still some lingering references to separate but affiliated
projects like https://github.com/druid-io/pydruid. IMO, it makes sense to
leave them there for now, and incorporate them as subprojects of Druid once
Druid is a top-level project.

On Wed, Jun 12, 2019 at 12:18 PM Julian Hyde <jh...@gmail.com> wrote:

> Looks marvelous! Thanks for making it happen.
>
> I noticed at least one reference to https://github.com/druid-io on the
> site. Should be changed to https://github.com/apache/incubator-druid?
>
> > On Jun 11, 2019, at 9:44 PM, Gian Merlino <gi...@apache.org> wrote:
> >
> > This is now done: druid.io is redirecting to druid.apache.org!!
> >
> > Next, we'll add the stuff required by
> > https://whimsy.apache.org/pods/project/druid. Then, we should be good
> to go
> > on the website migration. (Behind the scenes, Vadim Ogievetsky has been
> > helping tons with this -- thanks a lot!)
> >
> >> On Mon, Jun 10, 2019 at 9:00 AM David Lim <da...@apache.org> wrote:
> >>
> >> No objections from me - thank you for testing this out.
> >>
> >>>> On Mon, Jun 10, 2019 at 7:48 AM Gian Merlino <gi...@apache.org> wrote:
> >>>
> >>> It looks like Google has picked up the 301 and [druid use cases] #1
> >> result
> >>> is https://druid.apache.org/use-cases now. For [what is druid used
> for]
> >>> it's now #4 instead of #2. I think this is the best we are likely to
> >> get. I
> >>> am ready to flip the switch if there aren't any objections.
> >>>
> >>> On Fri, Jun 7, 2019 at 9:15 PM Gian Merlino <gi...@apache.org> wrote:
> >>>
> >>>> Another update: as of
> >>>> https://github.com/apache/incubator-druid-website-src/pull/1 and
> >>>> https://github.com/apache/incubator-druid-website/pull/7, the
> >>>> https://druid.apache.org/ site is now serving almost all pages from
> >>>> druid.io, except:
> >>>>
> >>>> - the index page (it still has a placeholder until we flip the switch)
> >>>> - the download page (it has a differently-designed download page:
> >> compare
> >>>> http://druid.io/downloads.html with
> >>> http://druid.apache.org/downloads.html
> >>>> - any docs older than 0.13.0 (they aren't Apache releases)
> >>>>
> >>>> If you navigate to https://druid.apache.org/ + any other path from
> >>>> druid.io, you should see the page.
> >>>>
> >>>> I'm hoping to confirm that search engines pick up the 301 for
> >>>> http://druid.io/use-cases before flipping the switch. Hopefully that
> >>>> doesn't take much longer. If it does we should talk about how we want
> >> to
> >>>> proceed.
> >>>>
> >>>>> On Tue, Jun 4, 2019 at 1:48 AM Gian Merlino <gi...@apache.org> wrote:
> >>>>>
> >>>>> An update: we do have a redirect server set up on druid.io now: note
> >>>>> that http://druid.io/community/ and http://druid.io/use-cases both
> >>>>> redirect to https://druid.apache.org. I just set up the latter
> >> redirect
> >>>>> (on /use-cases) as part of 'test this first on a single page'. All
> >> other
> >>>>> druid.io URLs are still being hosted using the content from GitHub
> >>> pages
> >>>>> at https://github.com/druid-io/druid-io.github.io.
> >>>>>
> >>>>> Search engine watch: currently, http://druid.io is the #1 link for
> >>>>> [druid use cases] on Google, Bing, and DuckDuckGo (and has a cool
> >>> looking
> >>>>> infobox on Google & Bing). For [what is druid used for], it's #2 on
> >>> Google,
> >>>>> and not ranked on the first page on Bing & DDG. Will monitor this
> over
> >>> the
> >>>>> next few days.
> >>>>>
> >>>>>> On Mon, May 6, 2019 at 5:43 PM Gian Merlino <gi...@apache.org>
> wrote:
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> It sounds like we will need a redirect server that issues 301s from
> >>> each
> >>>>>> druid.io page to the corresponding druid.apache.org page. Charles
> >> and
> >>> I
> >>>>>> spoke offline and thought that something like Jon's original
> proposal
> >>> is
> >>>>>> the best way to go. I am going to suggest we get started on this, as
> >>> it's
> >>>>>> the last major piece of infra to move to ASF.
> >>>>>>
> >>>>>> 1) Set up a redirect server to perform 301 redirects to
> >>> druid.apache.org
> >>>>>> 2) Post all druid.io content on druid.apache.org
> >>>>>> 3) Update druid.io DNS to point to the redirect server
> >>>>>> 4) Shut down GitHub pages hosting for druid.io
> >>>>>>
> >>>>>> Steps (2) and (3) should be done as close in time as possible so
> >> there
> >>>>>> is no confusion as to which version of the pages is canonical.
> >>>>>>
> >>>>>> For the redirect server, two viable options are an nginx server or
> an
> >>> S3
> >>>>>> webpage redirect (
> >>
> https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html
> >>> ).
> >>>>>> Just like we did with the HTML-level redirect, I suggest we test
> this
> >>> first
> >>>>>> on a single page. We can do that by having the redirect server
> >>> initially
> >>>>>> start off by hosting all druid.io content (so it's
> indistinguishable
> >>>>>> from the GitHub-pages-based site) except for a single page, which it
> >>>>>> redirects using HTTP 301 to druid.apache.org.
> >>>>>>
> >>>>>> I'm planning to start looking into this, so anyone around please
> >> speak
> >>>>>> up if you have any advice or alternative approaches to suggest.
> >>>>>>
> >>>>>> On Mon, Apr 29, 2019 at 4:01 PM Jonathan Wei <jo...@apache.org>
> >>> wrote:
> >>>>>>
> >>>>>>> Thanks for checking the SEO state, that's somewhat disappointing.
> >>>>>>>
> >>>>>>> For Bing, it sounds like they really want you to use 301s (
> >>>>>>> https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a
> ):
> >>>>>>>
> >>>>>>>> Bing prefers you use a 301 permanent redirect when moving content,
> >>>>>>> should
> >>>>>>> the move be permanent.  If the move is temporary, then a 302
> >> temporary
> >>>>>>> redirect will work fine.  Do not use the rel=canonical tag in place
> >>> of a
> >>>>>>> proper redirect.
> >>>>>>>
> >>>>>>> I wasn't able to find similar guidance re: this issue for
> >> DuckDuckGo.
> >>>>>>>
> >>>>>>> On Mon, Apr 29, 2019 at 10:42 AM Gian Merlino <gi...@apache.org>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Another update: SEO is not looking great after another day passed.
> >>>>>>> For a
> >>>>>>>> search for "druid community", both http://druid.io/community and
> >>>>>>>> https://druid.apache.org/community/ have dropped off the front
> >> page
> >>>>>>> of
> >>>>>>>> Bing
> >>>>>>>> completely. On Google, the legacy version is gone (as expected)
> >> but
> >>>>>>> the
> >>>>>>>> Apache version has dropped to the #3 spot (down from #2 yesterday;
> >>>>>>> and down
> >>>>>>>> from where the legacy page was pre-migration, which was #1).
> >>>>>>>>
> >>>>>>>> I think this means we do need to try to get 301s figured out.
> >>>>>>>>
> >>>>>>>> On Sun, Apr 28, 2019 at 3:06 PM Gian Merlino <gi...@apache.org>
> >>> wrote:
> >>>>>>>>
> >>>>>>>>> Google has picked up the new URL as of today but Bing hasn't.
> >>>>>>> Neither has
> >>>>>>>>> DuckDuckGo for that matter.
> >>>>>>>>>
> >>>>>>>>> Currently, Google is showing
> >> https://druid.apache.org/community/
> >>>>>>> in the
> >>>>>>>>> #2 spot and Bing/DDG are showing http://druid.io/community in
> >> the
> >>>>>>> top
> >>>>>>>>> spot. Ominously, the latter two _have_ picked up a page title
> >>>>>>> change to
> >>>>>>>>> "Redirecting..."
> >>>>>>>>>
> >>>>>>>>> On Fri, Apr 26, 2019 at 11:00 AM Gian Merlino <gi...@apache.org>
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> An update: this is done now since a couple of days ago, but
> >>> Google
> >>>>>>> and
> >>>>>>>>>> Bing are still showing http://druid.io/community for a search
> >>> for
> >>>>>>>> "druid
> >>>>>>>>>> community" or even "apache druid community":
> >>>>>>>>>>
> >>>>>>>>>> - https://www.google.com/search?q=druid+community
> >>>>>>>>>> - https://www.bing.com/search?q=druid+community
> >>>>>>>>>>
> >>>>>>>>>> I suggest we keep an eye on the search engines and make sure
> >> they
> >>>>>>> can
> >>>>>>>>>> figure out that the site has changed (I'm not sure how often
> >> they
> >>>>>>>> crawl).
> >>>>>>>>>> If they can then it would make sense to me to move forward with
> >>>>>>>> migrating
> >>>>>>>>>> the entire web site.
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Apr 22, 2019 at 7:49 PM Jonathan Wei <
> >> jonwei@apache.org>
> >>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Correction: Xavier was suggesting we use
> >>
> https://github.com/druid-io/druid-io.github.io/blob/src/_layouts/redirect_page.html
> >>>>>>>>>>> ,
> >>>>>>>>>>> the existing redirect system used by the Druid website.
> >>>>>>>>>>>
> >>>>>>>>>>> I've opened PRs to do the community page migration test:
> >>>>>>>>>>> https://github.com/apache/incubator-druid-website/pull/3
> >>>>>>>>>>> https://github.com/druid-io/druid-io.github.io/pull/591
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Apr 22, 2019 at 3:04 PM Gian Merlino <gian@apache.org
> >>>
> >>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> That sounds good to me. I would also consider adding
> >> canonical
> >>>>>>> tags
> >>>>>>>> to
> >>>>>>>>>>> all
> >>>>>>>>>>>> druid.apache.org pages so we don't have
> >>>>>>> druid.incubator.apache.org
> >>>>>>>> and
> >>>>>>>>>>>> druid.apache.org both floating around (not to mention
> >>>>>>> http/https
> >>>>>>>>>>> version
> >>>>>>>>>>>> of
> >>>>>>>>>>>> both).
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Apr 22, 2019 at 2:59 PM Jonathan Wei <
> >>> jonwei@apache.org
> >>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> For redirects, Xavier has suggested using
> >>> https://help.github.com/en/articles/redirects-on-github-pages
> >>>>>>> to
> >>>>>>>>>>>> redirect
> >>>>>>>>>>>>> to druid.apache.org as a way to transition before the
> >>> domain
> >>>>>>>>>>> migration
> >>>>>>>>>>>>> occurs, and believes that it would have the same SEO
> >> effects
> >>>>>>> as a
> >>>>>>>> 301
> >>>>>>>>>>>>> redirect after the new pages are indexed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think we could try migrating the current Community page
> >> to
> >>>>>>>>>>>>> druid.apache.org with Github redirects and canonical
> >> links
> >>>>>>>> pointing
> >>>>>>>>>>> to
> >>>>>>>>>>>> the
> >>>>>>>>>>>>> https://druid.apache.org version. If that goes well, we
> >>> could
> >>>>>>>>>>> continue
> >>>>>>>>>>>>> migrating more pages.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What are the community's thoughts on that?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Jon
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Mar 12, 2019 at 7:19 PM Gian Merlino <
> >>> gian@apache.org
> >>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> OpenOffice and Groovy both chose to sort of "meld" their
> >>>>>>> classic
> >>>>>>>>>>> and
> >>>>>>>>>>>>> Apache
> >>>>>>>>>>>>>> sites together: https://www.openoffice.org/,
> >>>>>>>>>>> http://groovy-lang.org/.
> >>>>>>>>>>>>> Note
> >>>>>>>>>>>>>> how when you click around, you get shuttled between the
> >>>>>>> classic
> >>>>>>>>>>> domain
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>> the Apache domain. Some pages are available on both
> >> sites,
> >>>>>>> like
> >>>>>>>>>>>>>> http://groovy-lang.org/download.html and
> >>>>>>>>>>>>>> https://groovy.apache.org/download.html (which don't
> >> use
> >>>>>>>> canonical
> >>>>>>>>>>>> link
> >>>>>>>>>>>>>> tags -- does not seem like a good example to follow!).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> NetBeans (still incubating) also has a "melded" site at
> >>>>>>>>>>>>>> https://netbeans.org/ but doesn't seem to consider
> >> itself
> >>>>>>> done
> >>>>>>>>>>> yet.
> >>>>>>>>>>>> They
> >>>>>>>>>>>>>> are discussing plans on their lists & wiki to do
> >> redirects
> >>>>>>> from
> >>>>>>>>>>>>>> netbeans.org
> >>>>>>>>>>>>>> to netbeans.apache.org:
> >>
> https://cwiki.apache.org/confluence/display/NETBEANS/netbeans.org+Transition+Process
> >>>>>>>>>>>>>> ,
> >>
> https://lists.apache.org/thread.html/ad10fb9d4c8fee571a2f6232b268a3b835f7b823d3a0983b84aeb18a@%3Cdev.netbeans.apache.org%3E
> >>>>>>>>>>>>>> .
> >>>>>>>>>>>>>> As of today the domain has been donated to ASF, but the
> >>>>>>> server is
> >>>>>>>>>>> still
> >>>>>>>>>>>>> run
> >>>>>>>>>>>>>> by Oracle, so the plan doesn't seem to be finished yet.
> >>>>>>> (WHOIS
> >>>>>>>> for
> >>>>>>>>>>>>>> netbeans.org shows ASF as the registrant; netbeans.org
> >>>>>>> resolves
> >>>>>>>> to
> >>>>>>>>>>>>>> lb-netbeans-cms-adc.oracle.com.)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The melded sites don't really seem better to me than
> >>>>>>> redirecting
> >>>>>>>>>>> all
> >>>>>>>>>>>> urls
> >>>>>>>>>>>>>> on the domain. I guess it depends on if we want to keep
> >>>>>>> druid.io
> >>>>>>>>>>> as
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> official domain forever, or if we think
> >> druid.apache.org
> >>> is
> >>>>>>>>>>> cooler. I
> >>>>>>>>>>>>>> definitely think druid.apache.org is cooler so my vote
> >> is
> >>>>>>> there
> >>>>>>>>>>> :).
> >>>>>>>>>>>> It's
> >>>>>>>>>>>>>> also nice that it supports https. (druid.io does not
> >>> today,
> >>>>>>>> since
> >>>>>>>>>>> it's
> >>>>>>>>>>>>> on
> >>>>>>>>>>>>>> GitHub pages, which doesn't support https for custom
> >>>>>>> domains.)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Mar 12, 2019 at 7:47 PM Charles Allen
> >>>>>>>>>>>>>> <ch...@snap.com.invalid> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Are there other projects who have transitioned an
> >>>>>>> independently
> >>>>>>>>>>>>>> successful
> >>>>>>>>>>>>>>> domain name to an apache one?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 2:13 PM David Lim <
> >>>>>>> davidlim@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Who has control over the druid.io domain? Charles
> >>>>>>> would that
> >>>>>>>>>>> be
> >>>>>>>>>>>> you?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> We'd need support from them for the DNS redirect.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 2:04 PM Jonathan Wei <
> >>>>>>>> jonwei@apache.org
> >>>>>>>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> We still need to complete the website migration to Apache
> >>>>>>>>>>>>>>>>> infrastructure.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'll propose the following plan:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Proposed Apache Druid website migration plan
> >>>>>>>>>>>>>>>>> ========================================
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> These links have some previous discussion on the website migration:
> >>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/7cae100b684e0b33e0adda993efea3d6088978700988a0ae632fdd80@%3Cdev.druid.apache.org%3E
> >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/INFRA-17340
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> From the discussions above, the recommendation is to have 2 separate
> >>>>>>>>>>>>>>>>> repos for the website: one for source and another for built content
> >>>>>>>>>>>>>>>>> that will be served.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Generating site files
> >>>>>>>>>>>>>>>>> =======================
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The Apache site update process will be similar to our current process.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Current process:
> >>>>>>>>>>>>>>>>> 1. Push changes to https://github.com/druid-io/druid-io.github.io/tree/src
> >>>>>>>>>>>>>>>>> 2. metamx bot picks up changes, builds, and commits to
> >>>>>>>>>>>>>>>>> https://github.com/druid-io/druid-io.github.io/tree/master
> >>>>>>>>>>>>>>>>> 3. https://github.com/druid-io/druid-io.github.io/tree/master is served
> >>>>>>>>>>>>>>>>> by github pages
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Apache process:
> >>>>>>>>>>>>>>>>> 1. Push changes to https://github.com/apache/incubator-druid-website-src
> >>>>>>>>>>>>>>>>> 2. Jenkins bot from Apache will build the website from source repo, commit
> >>>>>>>>>>>>>>>>> to https://github.com/apache/incubator-druid-website
> >>>>>>>>>>>>>>>>> 3. Apache Druid website will be served from the content in
> >>>>>>>>>>>>>>>>> https://github.com/apache/incubator-druid-website (asf-site branch)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hosting and SEO
> >>>>>>>>>>>>>>>>> ================
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The Apache site will be hosted at druid.apache.org on Apache
> >>>>>>>>>>>>>>>>> infrastructure: http://www.apache.org/dev/project-site.html
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> To preserve our search rankings, we can set up 301 redirects from the old
> >>>>>>>>>>>>>>>>> druid.io site to the corresponding pages on the druid.apache.org site.
> >>>>>>>>>>>>>>>>> (https://moz.com/learn/seo/redirection)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> However, Github pages (which currently hosts the druid.io site) does not
> >>>>>>>>>>>>>>>>> support 301 redirects, so we propose the following:
> >>>>>>>>>>>>>>>>> - Set up a new Nginx server that will perform 301 redirects from
> >>>>>>>>>>>>>>>>> druid.io to druid.apache.org. Imply can host this if needed.
> >>>>>>>>>>>>>>>>> - Update the druid.io DNS entry to point to this new Nginx server
> >>>>>>>>>>>>>>>>> - Shut down Github pages hosting for druid.io
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> In addition, we can also set canonical tags on our pages:
> >>>>>>>>>>>>>>>>> https://moz.com/learn/seo/canonicalization
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Action items
> >>>>>>>>>>>>>>>>> ===============
> >>>>>>>>>>>>>>>>> - Set up a Jenkins bot that builds the Apache website content from source
> >>>>>>>>>>>>>>>>> - Get the Apache website up
> >>>>>>>>>>>>>>>>> - Set up Nginx redirect server for druid.io
> >>>>>>>>>>>>>>>>> - Shut down github pages and redirect DNS for druid.io to Nginx redirect
> >>>>>>>>>>>>>>>>> server
> >>>>>>>>>>>>>>>>> - Add canonical tags to pages
> >>
>

Re: Proposed website migration plan

Posted by Julian Hyde <jh...@gmail.com>.
Looks marvelous! Thanks for making it happen. 

I noticed at least one reference to https://github.com/druid-io on the site. Should it be changed to https://github.com/apache/incubator-druid?
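
For anyone wanting to audit leftover links like this, here is a rough Python sketch. It is not something from the thread: the function names and the idea of scanning a local checkout of the built site are assumptions for illustration. Whether a given link should actually change is still a manual call, since some may intentionally point at the old druid-io repositories.

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches links into the old druid-io GitHub organization, with or without
# a repository path after the organization name.
OLD_ORG_LINK = re.compile(r"https?://github\.com/druid-io(?:/[^\s\"'<>)]*)?")


def find_old_org_links(text: str) -> List[str]:
    """Return the distinct old-organization links found in a piece of text."""
    return sorted(set(OLD_ORG_LINK.findall(text)))


def scan_site(root: str) -> Dict[str, List[str]]:
    """Scan every .html file under root (a hypothetical local checkout of the
    built site) and report the pages that still contain old links."""
    hits: Dict[str, List[str]] = {}
    for page in Path(root).rglob("*.html"):
        links = find_old_org_links(
            page.read_text(encoding="utf-8", errors="replace")
        )
        if links:
            hits[str(page)] = links
    return hits
```

Running scan_site on a checkout of the asf-site branch would give a per-page list to review by hand.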

> On Jun 11, 2019, at 9:44 PM, Gian Merlino <gi...@apache.org> wrote:
> 
> This is now done: druid.io is redirecting to druid.apache.org!!
> 
> Next, we'll add the stuff required by
> https://whimsy.apache.org/pods/project/druid. Then, we should be good to go
> on the website migration. (Behind the scenes, Vadim Ogievetsky has been
> helping tons with this -- thanks a lot!)
> 
>> On Mon, Jun 10, 2019 at 9:00 AM David Lim <da...@apache.org> wrote:
>> 
>> No objections from me - thank you for testing this out.
>> 
>>>> On Mon, Jun 10, 2019 at 7:48 AM Gian Merlino <gi...@apache.org> wrote:
>>> 
>>> It looks like Google has picked up the 301 and [druid use cases] #1
>> result
>>> is https://druid.apache.org/use-cases now. For [what is druid used for]
>>> it's now #4 instead of #2. I think this is the best we are likely to
>> get. I
>>> am ready to flip the switch if there aren't any objections.
>>> 
>>> On Fri, Jun 7, 2019 at 9:15 PM Gian Merlino <gi...@apache.org> wrote:
>>> 
>>>> Another update: as of
>>>> https://github.com/apache/incubator-druid-website-src/pull/1 and
>>>> https://github.com/apache/incubator-druid-website/pull/7, the
>>>> https://druid.apache.org/ site is now serving almost all pages from
>>>> druid.io, except:
>>>> 
>>>> - the index page (it still has a placeholder until we flip the switch)
>>>> - the download page (it has a differently-designed download page: compare
>>>> http://druid.io/downloads.html with http://druid.apache.org/downloads.html)
>>>> - any docs older than 0.13.0 (they aren't Apache releases)
>>>> 
>>>> If you navigate to https://druid.apache.org/ + any other path from
>>>> druid.io, you should see the page.
>>>> 
>>>> I'm hoping to confirm that search engines pick up the 301 for
>>>> http://druid.io/use-cases before flipping the switch. Hopefully that
>>>> doesn't take much longer. If it does we should talk about how we want
>> to
>>>> proceed.
>>>> 
>>>>> On Tue, Jun 4, 2019 at 1:48 AM Gian Merlino <gi...@apache.org> wrote:
>>>>> 
>>>>> An update: we do have a redirect server set up on druid.io now: note
>>>>> that http://druid.io/community/ and http://druid.io/use-cases both
>>>>> redirect to https://druid.apache.org. I just set up the latter
>> redirect
>>>>> (on /use-cases) as part of 'test this first on a single page'. All
>> other
>>>>> druid.io URLs are still being hosted using the content from GitHub
>>> pages
>>>>> at https://github.com/druid-io/druid-io.github.io.
>>>>> 
>>>>> Search engine watch: currently, http://druid.io is the #1 link for
>>>>> [druid use cases] on Google, Bing, and DuckDuckGo (and has a cool
>>> looking
>>>>> infobox on Google & Bing). For [what is druid used for], it's #2 on
>>> Google,
>>>>> and not ranked on the first page on Bing & DDG. Will monitor this over
>>> the
>>>>> next few days.
>>>>> 
>>>>>> On Mon, May 6, 2019 at 5:43 PM Gian Merlino <gi...@apache.org> wrote:
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> It sounds like we will need a redirect server that issues 301s from
>>> each
>>>>>> druid.io page to the corresponding druid.apache.org page. Charles
>> and
>>> I
>>>>>> spoke offline and thought that something like Jon's original proposal
>>> is
>>>>>> the best way to go. I am going to suggest we get started on this, as
>>> it's
>>>>>> the last major piece of infra to move to ASF.
>>>>>> 
>>>>>> 1) Set up a redirect server to perform 301 redirects to
>>> druid.apache.org
>>>>>> 2) Post all druid.io content on druid.apache.org
>>>>>> 3) Update druid.io DNS to point to the redirect server
>>>>>> 4) Shut down GitHub pages hosting for druid.io
>>>>>> 
>>>>>> Steps (2) and (3) should be done as close in time as possible so
>> there
>>>>>> is no confusion as to which version of the pages is canonical.
>>>>>> 
>>>>>> For the redirect server, two viable options are an nginx server or an
>>> S3
>>>>>> webpage redirect (
>> https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html
>>> ).
>>>>>> Just like we did with the HTML-level redirect, I suggest we test this
>>> first
>>>>>> on a single page. We can do that by having the redirect server
>>> initially
>>>>>> start off by hosting all druid.io content (so it's indistinguishable
>>>>>> from the GitHub-pages-based site) except for a single page, which it
>>>>>> redirects using HTTP 301 to druid.apache.org.
>>>>>> 
>>>>>> I'm planning to start looking into this, so anyone around please
>> speak
>>>>>> up if you have any advice or alternative approaches to suggest.
>>>>>> 
>>>>>> On Mon, Apr 29, 2019 at 4:01 PM Jonathan Wei <jo...@apache.org>
>>> wrote:
>>>>>> 
>>>>>>> Thanks for checking the SEO state, that's somewhat disappointing.
>>>>>>> 
>>>>>>> For Bing, it sounds like they really want you to use 301s (
>>>>>>> https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a):
>>>>>>> 
>>>>>>>> Bing prefers you use a 301 permanent redirect when moving content,
>>>>>>> should
>>>>>>> the move be permanent.  If the move is temporary, then a 302
>> temporary
>>>>>>> redirect will work fine.  Do not use the rel=canonical tag in place
>>> of a
>>>>>>> proper redirect.
>>>>>>> 
>>>>>>> I wasn't able to find similar guidance re: this issue for
>> DuckDuckGo.
>>>>>>> 
>>>>>>> On Mon, Apr 29, 2019 at 10:42 AM Gian Merlino <gi...@apache.org>
>>> wrote:
>>>>>>> 
>>>>>>>> Another update: SEO is not looking great after another day passed.
>>>>>>> For a
>>>>>>>> search for "druid community", both http://druid.io/community and
>>>>>>>> https://druid.apache.org/community/ have dropped off the front
>> page
>>>>>>> of
>>>>>>>> Bing
>>>>>>>> completely. On Google, the legacy version is gone (as expected)
>> but
>>>>>>> the
>>>>>>>> Apache version has dropped to the #3 spot (down from #2 yesterday;
>>>>>>> and down
>>>>>>>> from where the legacy page was pre-migration, which was #1).
>>>>>>>> 
>>>>>>>> I think this means we do need to try to get 301s figured out.
>>>>>>>> 
>>>>>>>> On Sun, Apr 28, 2019 at 3:06 PM Gian Merlino <gi...@apache.org>
>>> wrote:
>>>>>>>> 
>>>>>>>>> Google has picked up the new URL as of today but Bing hasn't.
>>>>>>> Neither has
>>>>>>>>> DuckDuckGo for that matter.
>>>>>>>>> 
>>>>>>>>> Currently, Google is showing
>> https://druid.apache.org/community/
>>>>>>> in the
>>>>>>>>> #2 spot and Bing/DDG are showing http://druid.io/community in
>> the
>>>>>>> top
>>>>>>>>> spot. Ominously, the latter two _have_ picked up a page title
>>>>>>> change to
>>>>>>>>> "Redirecting..."
>>>>>>>>> 
>>>>>>>>> On Fri, Apr 26, 2019 at 11:00 AM Gian Merlino <gi...@apache.org>
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> An update: this is done now since a couple of days ago, but
>>> Google
>>>>>>> and
>>>>>>>>>> Bing are still showing http://druid.io/community for a search
>>> for
>>>>>>>> "druid
>>>>>>>>>> community" or even "apache druid community":
>>>>>>>>>> 
>>>>>>>>>> - https://www.google.com/search?q=druid+community
>>>>>>>>>> - https://www.bing.com/search?q=druid+community
>>>>>>>>>> 
>>>>>>>>>> I suggest we keep an eye on the search engines and make sure
>> they
>>>>>>> can
>>>>>>>>>> figure out that the site has changed (I'm not sure how often
>> they
>>>>>>>> crawl).
>>>>>>>>>> If they can then it would make sense to me to move forward with
>>>>>>>> migrating
>>>>>>>>>> the entire web site.
>>>>>>>>>> 
>>>>>>>>>> On Mon, Apr 22, 2019 at 7:49 PM Jonathan Wei <
>> jonwei@apache.org>
>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Correction: Xavier was suggesting we use
>> https://github.com/druid-io/druid-io.github.io/blob/src/_layouts/redirect_page.html
>>>>>>>>>>> ,
>>>>>>>>>>> the existing redirect system used by the Druid website.
>>>>>>>>>>> 
>>>>>>>>>>> I've opened PRs to do the community page migration test:
>>>>>>>>>>> https://github.com/apache/incubator-druid-website/pull/3
>>>>>>>>>>> https://github.com/druid-io/druid-io.github.io/pull/591
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Apr 22, 2019 at 3:04 PM Gian Merlino <gian@apache.org
>>> 
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> That sounds good to me. I would also consider adding
>> canonical
>>>>>>> tags
>>>>>>>> to
>>>>>>>>>>> all
>>>>>>>>>>>> druid.apache.org pages so we don't have
>>>>>>> druid.incubator.apache.org
>>>>>>>> and
>>>>>>>>>>>> druid.apache.org both floating around (not to mention
>>>>>>> http/https
>>>>>>>>>>> version
>>>>>>>>>>>> of
>>>>>>>>>>>> both).
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Apr 22, 2019 at 2:59 PM Jonathan Wei <
>>> jonwei@apache.org
>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> For redirects, Xavier has suggested using
>>> https://help.github.com/en/articles/redirects-on-github-pages
>>>>>>> to
>>>>>>>>>>>> redirect
>>>>>>>>>>>>> to druid.apache.org as a way to transition before the
>>> domain
>>>>>>>>>>> migration
>>>>>>>>>>>>> occurs, and believes that it would have the same SEO
>> effects
>>>>>>> as a
>>>>>>>> 301
>>>>>>>>>>>>> redirect after the new pages are indexed.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think we could try migrating the current Community page
>> to
>>>>>>>>>>>>> druid.apache.org with Github redirects and canonical
>> links
>>>>>>>> pointing
>>>>>>>>>>> to
>>>>>>>>>>>> the
>>>>>>>>>>>>> https://druid.apache.org version. If that goes well, we
>>> could
>>>>>>>>>>> continue
>>>>>>>>>>>>> migrating more pages.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What are the community's thoughts on that?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Jon
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Mar 12, 2019 at 7:19 PM Gian Merlino <
>>> gian@apache.org
>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> OpenOffice and Groovy both chose to sort of "meld" their
>>>>>>> classic
>>>>>>>>>>> and
>>>>>>>>>>>>> Apache
>>>>>>>>>>>>>> sites together: https://www.openoffice.org/,
>>>>>>>>>>> http://groovy-lang.org/.
>>>>>>>>>>>>> Note
>>>>>>>>>>>>>> how when you click around, you get shuttled between the
>>>>>>> classic
>>>>>>>>>>> domain
>>>>>>>>>>>>> and
>>>>>>>>>>>>>> the Apache domain. Some pages are available on both
>> sites,
>>>>>>> like
>>>>>>>>>>>>>> http://groovy-lang.org/download.html and
>>>>>>>>>>>>>> https://groovy.apache.org/download.html (which don't
>> use
>>>>>>>> canonical
>>>>>>>>>>>> link
>>>>>>>>>>>>>> tags -- does not seem like a good example to follow!).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> NetBeans (still incubating) also has a "melded" site at
>>>>>>>>>>>>>> https://netbeans.org/ but doesn't seem to consider
>> itself
>>>>>>> done
>>>>>>>>>>> yet.
>>>>>>>>>>>> They
>>>>>>>>>>>>>> are discussing plans on their lists & wiki to do
>> redirects
>>>>>>> from
>>>>>>>>>>>>>> netbeans.org
>>>>>>>>>>>>>> to netbeans.apache.org:
>> https://cwiki.apache.org/confluence/display/NETBEANS/netbeans.org+Transition+Process
>>>>>>>>>>>>>> ,
>> https://lists.apache.org/thread.html/ad10fb9d4c8fee571a2f6232b268a3b835f7b823d3a0983b84aeb18a@%3Cdev.netbeans.apache.org%3E
>>>>>>>>>>>>>> .
>>>>>>>>>>>>>> As of today the domain has been donated to ASF, but the
>>>>>>> server is
>>>>>>>>>>> still
>>>>>>>>>>>>> run
>>>>>>>>>>>>>> by Oracle, so the plan doesn't seem to be finished yet.
>>>>>>> (WHOIS
>>>>>>>> for
>>>>>>>>>>>>>> netbeans.org shows ASF as the registrant; netbeans.org
>>>>>>> resolves
>>>>>>>> to
>>>>>>>>>>>>>> lb-netbeans-cms-adc.oracle.com.)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The melded sites don't really seem better to me than
>>>>>>> redirecting
>>>>>>>>>>> all
>>>>>>>>>>>> urls
>>>>>>>>>>>>>> on the domain. I guess it depends on if we want to keep
>>>>>>> druid.io
>>>>>>>>>>> as
>>>>>>>>>>>> the
>>>>>>>>>>>>>> official domain forever, or if we think
>> druid.apache.org
>>> is
>>>>>>>>>>> cooler. I
>>>>>>>>>>>>>> definitely think druid.apache.org is cooler so my vote
>> is
>>>>>>> there
>>>>>>>>>>> :).
>>>>>>>>>>>> It's
>>>>>>>>>>>>>> also nice that it supports https. (druid.io does not
>>> today,
>>>>>>>> since
>>>>>>>>>>> it's
>>>>>>>>>>>>> on
>>>>>>>>>>>>>> GitHub pages, which doesn't support https for custom
>>>>>>> domains.)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Mar 12, 2019 at 7:47 PM Charles Allen
>>>>>>>>>>>>>> <ch...@snap.com.invalid> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Are there other projects who have transitioned an
>>>>>>> independently
>>>>>>>>>>>>>> successful
>>>>>>>>>>>>>>> domain name to an apache one?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 2:13 PM David Lim <
>>>>>>> davidlim@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Who has control over the druid.io domain? Charles
>>>>>>> would that
>>>>>>>>>>> be
>>>>>>>>>>>> you?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> We'd need support from them for the DNS redirect.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 2:04 PM Jonathan Wei <
>>>>>>>> jonwei@apache.org
>>>>>>>>>>>> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> We still need to complete the website migration to Apache
>>>>>>>>>>>>>>>>> infrastructure.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'll propose the following plan:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Proposed Apache Druid website migration plan
>>>>>>>>>>>>>>>>> ========================================
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> These links have some previous discussion on the website migration:
>>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/7cae100b684e0b33e0adda993efea3d6088978700988a0ae632fdd80@%3Cdev.druid.apache.org%3E
>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/INFRA-17340
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> From the discussions above, the recommendation is to have 2 separate
>>>>>>>>>>>>>>>>> repos for the website: one for source and another for built content
>>>>>>>>>>>>>>>>> that will be served.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Generating site files
>>>>>>>>>>>>>>>>> =======================
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The Apache site update process will be similar to our current process.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Current process:
>>>>>>>>>>>>>>>>> 1. Push changes to https://github.com/druid-io/druid-io.github.io/tree/src
>>>>>>>>>>>>>>>>> 2. metamx bot picks up changes, builds, and commits to
>>>>>>>>>>>>>>>>> https://github.com/druid-io/druid-io.github.io/tree/master
>>>>>>>>>>>>>>>>> 3. https://github.com/druid-io/druid-io.github.io/tree/master is served
>>>>>>>>>>>>>>>>> by github pages
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Apache process:
>>>>>>>>>>>>>>>>> 1. Push changes to https://github.com/apache/incubator-druid-website-src
>>>>>>>>>>>>>>>>> 2. Jenkins bot from Apache will build the website from source repo, commit
>>>>>>>>>>>>>>>>> to https://github.com/apache/incubator-druid-website
>>>>>>>>>>>>>>>>> 3. Apache Druid website will be served from the content in
>>>>>>>>>>>>>>>>> https://github.com/apache/incubator-druid-website (asf-site branch)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hosting and SEO
>>>>>>>>>>>>>>>>> ================
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The Apache site will be hosted at druid.apache.org on Apache
>>>>>>>>>>>>>>>>> infrastructure: http://www.apache.org/dev/project-site.html
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To preserve our search rankings, we can set up 301 redirects from the old
>>>>>>>>>>>>>>>>> druid.io site to the corresponding pages on the druid.apache.org site.
>>>>>>>>>>>>>>>>> (https://moz.com/learn/seo/redirection)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> However, Github pages (which currently hosts the druid.io site) does not
>>>>>>>>>>>>>>>>> support 301 redirects, so we propose the following:
>>>>>>>>>>>>>>>>> - Set up a new Nginx server that will perform 301 redirects from
>>>>>>>>>>>>>>>>> druid.io to druid.apache.org. Imply can host this if needed.
>>>>>>>>>>>>>>>>> - Update the druid.io DNS entry to point to this new Nginx server
>>>>>>>>>>>>>>>>> - Shut down Github pages hosting for druid.io
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In addition, we can also set canonical tags on our pages:
>>>>>>>>>>>>>>>>> https://moz.com/learn/seo/canonicalization
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Action items
>>>>>>>>>>>>>>>>> ===============
>>>>>>>>>>>>>>>>> - Set up a Jenkins bot that builds the Apache website content from source
>>>>>>>>>>>>>>>>> - Get the Apache website up
>>>>>>>>>>>>>>>>> - Set up Nginx redirect server for druid.io
>>>>>>>>>>>>>>>>> - Shut down github pages and redirect DNS for druid.io to Nginx redirect
>>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>>> - Add canonical tags to pages
>> 

Re: Proposed website migration plan

Posted by Gian Merlino <gi...@apache.org>.
This is now done: druid.io is redirecting to druid.apache.org!!

Next, we'll add the stuff required by
https://whimsy.apache.org/pods/project/druid. Then, we should be good to go
on the website migration. (Behind the scenes, Vadim Ogievetsky has been
helping tons with this -- thanks a lot!)
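
For anyone who wants to spot-check the redirects, here is a minimal sketch in Python. It assumes the redirect server keeps the path unchanged and always answers with an https://druid.apache.org Location; the thread does not spell out the exact server rules, so the strict comparison (including trailing slashes) is an assumption, and the helper names expected_target and check_redirect are hypothetical.

```python
import http.client
from urllib.parse import urlsplit, urlunsplit

NEW_HOST = "druid.apache.org"  # target host named in this thread


def expected_target(old_url: str) -> str:
    """Map a druid.io URL to the same path on https://druid.apache.org.

    Assumption: the redirect is path-preserving; the real server may differ.
    """
    parts = urlsplit(old_url)
    path = parts.path or "/"
    return urlunsplit(("https", NEW_HOST, path, parts.query, ""))


def check_redirect(old_url: str, timeout: float = 10.0) -> bool:
    """Return True if old_url answers with HTTP 301 and the expected Location.

    http.client does not follow redirects, so the first response is inspected.
    """
    parts = urlsplit(old_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return (resp.status == 301
                and resp.getheader("Location") == expected_target(old_url))
    finally:
        conn.close()
```

Calling check_redirect("http://druid.io/use-cases") needs network access; expected_target can be used offline to build the list of URLs to check.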

On Mon, Jun 10, 2019 at 9:00 AM David Lim <da...@apache.org> wrote:

> No objections from me - thank you for testing this out.
>
> On Mon, Jun 10, 2019 at 7:48 AM Gian Merlino <gi...@apache.org> wrote:
>
> > It looks like Google has picked up the 301 and [druid use cases] #1 result
> > is https://druid.apache.org/use-cases now. For [what is druid used for]
> > it's now #4 instead of #2. I think this is the best we are likely to get. I
> > am ready to flip the switch if there aren't any objections.
> >
> > On Fri, Jun 7, 2019 at 9:15 PM Gian Merlino <gi...@apache.org> wrote:
> >
> > > Another update: as of
> > > https://github.com/apache/incubator-druid-website-src/pull/1 and
> > > https://github.com/apache/incubator-druid-website/pull/7, the
> > > https://druid.apache.org/ site is now serving almost all pages from
> > > druid.io, except:
> > >
> > > - the index page (it still has a placeholder until we flip the switch)
> > > - the download page (it has a differently-designed download page: compare
> > > http://druid.io/downloads.html with http://druid.apache.org/downloads.html)
> > > - any docs older than 0.13.0 (they aren't Apache releases)
> > >
> > > If you navigate to https://druid.apache.org/ + any other path from
> > > druid.io, you should see the page.
> > >
> > > I'm hoping to confirm that search engines pick up the 301 for
> > > http://druid.io/use-cases before flipping the switch. Hopefully that
> > > doesn't take much longer. If it does we should talk about how we want to
> > > proceed.
> > >
> > > On Tue, Jun 4, 2019 at 1:48 AM Gian Merlino <gi...@apache.org> wrote:
> > >
> > >> An update: we do have a redirect server set up on druid.io now: note
> > >> that http://druid.io/community/ and http://druid.io/use-cases both
> > >> redirect to https://druid.apache.org. I just set up the latter redirect
> > >> (on /use-cases) as part of 'test this first on a single page'. All other
> > >> druid.io URLs are still being hosted using the content from GitHub pages
> > >> at https://github.com/druid-io/druid-io.github.io.
> > >>
> > >> Search engine watch: currently, http://druid.io is the #1 link for
> > >> [druid use cases] on Google, Bing, and DuckDuckGo (and has a cool looking
> > >> infobox on Google & Bing). For [what is druid used for], it's #2 on Google,
> > >> and not ranked on the first page on Bing & DDG. Will monitor this over the
> > >> next few days.
> > >>
> > >> On Mon, May 6, 2019 at 5:43 PM Gian Merlino <gi...@apache.org> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> It sounds like we will need a redirect server that issues 301s from each
> > >>> druid.io page to the corresponding druid.apache.org page. Charles and I
> > >>> spoke offline and thought that something like Jon's original proposal is
> > >>> the best way to go. I am going to suggest we get started on this, as it's
> > >>> the last major piece of infra to move to ASF.
> > >>>
> > >>> 1) Set up a redirect server to perform 301 redirects to druid.apache.org
> > >>> 2) Post all druid.io content on druid.apache.org
> > >>> 3) Update druid.io DNS to point to the redirect server
> > >>> 4) Shut down GitHub pages hosting for druid.io
> > >>>
> > >>> Steps (2) and (3) should be done as close in time as possible so there
> > >>> is no confusion as to which version of the pages is canonical.
> > >>>
> > >>> For the redirect server, two viable options are an nginx server or an S3
> > >>> webpage redirect
> > >>> (https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html).
> > >>> Just like we did with the HTML-level redirect, I suggest we test this first
> > >>> on a single page. We can do that by having the redirect server initially
> > >>> start off by hosting all druid.io content (so it's indistinguishable
> > >>> from the GitHub-pages-based site) except for a single page, which it
> > >>> redirects using HTTP 301 to druid.apache.org.
> > >>>
> > >>> I'm planning to start looking into this, so anyone around please speak
> > >>> up if you have any advice or alternative approaches to suggest.
> > >>>
> > >>> On Mon, Apr 29, 2019 at 4:01 PM Jonathan Wei <jo...@apache.org>
> > wrote:
> > >>>
> > >>>> Thanks for checking the SEO state, that's somewhat disappointing.
> > >>>>
> > >>>> For Bing, it sounds like they really want you to use 301s (
> > >>>> https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a):
> > >>>>
> > >>>> > Bing prefers you use a 301 permanent redirect when moving content,
> > >>>> should
> > >>>> the move be permanent.  If the move is temporary, then a 302
> temporary
> > >>>> redirect will work fine.  Do not use the rel=canonical tag in place
> > of a
> > >>>> proper redirect.
> > >>>>
> > >>>> I wasn't able to find similar guidance re: this issue for
> DuckDuckGo.
> > >>>>
> > >>>> On Mon, Apr 29, 2019 at 10:42 AM Gian Merlino <gi...@apache.org>
> > wrote:
> > >>>>
> > >>>> > Another update: SEO is not looking great after another day passed.
> > >>>> For a
> > >>>> > search for "druid community", both http://druid.io/community and
> > >>>> > https://druid.apache.org/community/ have dropped off the front
> page
> > >>>> of
> > >>>> > Bing
> > >>>> > completely. On Google, the legacy version is gone (as expected)
> but
> > >>>> the
> > >>>> > Apache version has dropped to the #3 spot (down from #2 yesterday;
> > >>>> and down
> > >>>> > from where the legacy page was pre-migration, which was #1).
> > >>>> >
> > >>>> > I think this means we do need to try to get 301s figured out.
> > >>>> >
> > >>>> > On Sun, Apr 28, 2019 at 3:06 PM Gian Merlino <gi...@apache.org>
> > wrote:
> > >>>> >
> > >>>> > > Google has picked up the new URL as of today but Bing hasn't.
> > >>>> Neither has
> > >>>> > > DuckDuckGo for that matter.
> > >>>> > >
> > >>>> > > Currently, Google is showing
> https://druid.apache.org/community/
> > >>>> in the
> > >>>> > > #2 spot and Bing/DDG are showing http://druid.io/community in
> the
> > >>>> top
> > >>>> > > spot. Ominously, the latter two _have_ picked up a page title
> > >>>> change to
> > >>>> > > "Redirecting..."
> > >>>> > >
> > >>>> > > On Fri, Apr 26, 2019 at 11:00 AM Gian Merlino <gi...@apache.org>
> > >>>> wrote:
> > >>>> > >
> > >>>> > >> An update: this is done now since a couple of days ago, but
> > Google
> > >>>> and
> > >>>> > >> Bing are still showing http://druid.io/community for a search
> > for
> > >>>> > "druid
> > >>>> > >> community" or even "apache druid community":
> > >>>> > >>
> > >>>> > >> - https://www.google.com/search?q=druid+community
> > >>>> > >> - https://www.bing.com/search?q=druid+community
> > >>>> > >>
> > >>>> > >> I suggest we keep an eye on the search engines and make sure
> they
> > >>>> can
> > >>>> > >> figure out that the site has changed (I'm not sure how often
> they
> > >>>> > crawl).
> > >>>> > >> If they can then it would make sense to me to move forward with
> > >>>> > migrating
> > >>>> > >> the entire web site.
> > >>>> > >>
> > >>>> > >> On Mon, Apr 22, 2019 at 7:49 PM Jonathan Wei <
> jonwei@apache.org>
> > >>>> wrote:
> > >>>> > >>
> > >>>> > >>> Correction: Xavier was suggesting we use
> > >>>> > >>> https://github.com/druid-io/druid-io.github.io/blob/src/_layouts/redirect_page.html,
> > >>>> > >>> the existing redirect system used by the Druid website.
> > >>>> > >>>
> > >>>> > >>> I've opened PRs to do the community page migration test:
> > >>>> > >>> https://github.com/apache/incubator-druid-website/pull/3
> > >>>> > >>> https://github.com/druid-io/druid-io.github.io/pull/591
> > >>>> > >>>
> > >>>> > >>> On Mon, Apr 22, 2019 at 3:04 PM Gian Merlino <gian@apache.org
> >
> > >>>> wrote:
> > >>>> > >>>
> > >>>> > >>> > That sounds good to me. I would also consider adding
> canonical
> > >>>> tags
> > >>>> > to
> > >>>> > >>> all
> > >>>> > >>> > druid.apache.org pages so we don't have
> > >>>> druid.incubator.apache.org
> > >>>> > and
> > >>>> > >>> > druid.apache.org both floating around (not to mention
> > >>>> http/https
> > >>>> > >>> version
> > >>>> > >>> > of
> > >>>> > >>> > both).
> > >>>> > >>> >
> > >>>> > >>> > On Mon, Apr 22, 2019 at 2:59 PM Jonathan Wei <
> > jonwei@apache.org
> > >>>> >
> > >>>> > >>> wrote:
> > >>>> > >>> >
> > >>>> > >>> > > For redirects, Xavier has suggested using
> > >>>> > >>> > >
> > https://help.github.com/en/articles/redirects-on-github-pages
> > >>>> to
> > >>>> > >>> > redirect
> > >>>> > >>> > > to druid.apache.org as a way to transition before the
> > domain
> > >>>> > >>> migration
> > >>>> > >>> > > occurs, and believes that it would have the same SEO
> effects
> > >>>> as a
> > >>>> > 301
> > >>>> > >>> > > redirect after the new pages are indexed.
> > >>>> > >>> > >
> > >>>> > >>> > > I think we could try migrating the current Community page
> to
> > >>>> > >>> > > druid.apache.org with Github redirects and canonical
> links
> > >>>> > pointing
> > >>>> > >>> to
> > >>>> > >>> > the
> > >>>> > >>> > > https://druid.apache.org version. If that goes well, we
> > could
> > >>>> > >>> continue
> > >>>> > >>> > > migrating more pages.
> > >>>> > >>> > >
> > >>>> > >>> > > What are the community's thoughts on that?
> > >>>> > >>> > >
> > >>>> > >>> > > Thanks,
> > >>>> > >>> > > Jon
> > >>>> > >>> > >
> > >>>> > >>> > > On Tue, Mar 12, 2019 at 7:19 PM Gian Merlino <
> > gian@apache.org
> > >>>> >
> > >>>> > >>> wrote:
> > >>>> > >>> > >
> > >>>> > >>> > > > OpenOffice and Groovy both chose to sort of "meld" their
> > >>>> classic
> > >>>> > >>> and
> > >>>> > >>> > > Apache
> > >>>> > >>> > > > sites together: https://www.openoffice.org/,
> > >>>> > >>> http://groovy-lang.org/.
> > >>>> > >>> > > Note
> > >>>> > >>> > > > how when you click around, you get shuttled between the
> > >>>> classic
> > >>>> > >>> domain
> > >>>> > >>> > > and
> > >>>> > >>> > > > the Apache domain. Some pages are available on both
> sites,
> > >>>> like
> > >>>> > >>> > > > http://groovy-lang.org/download.html and
> > >>>> > >>> > > > https://groovy.apache.org/download.html (which don't
> use
> > >>>> > canonical
> > >>>> > >>> > link
> > >>>> > >>> > > > tags -- does not seem like a good example to follow!).
> > >>>> > >>> > > >
> > >>>> > >>> > > > NetBeans (still incubating) also has a "melded" site at
> > >>>> > >>> > > > https://netbeans.org/ but doesn't seem to consider itself done yet. They
> > >>>> > >>> > > > are discussing plans on their lists & wiki to do redirects from
> > >>>> > >>> > > > netbeans.org to netbeans.apache.org:
> > >>>> > >>> > > > https://cwiki.apache.org/confluence/display/NETBEANS/netbeans.org+Transition+Process,
> > >>>> > >>> > > > https://lists.apache.org/thread.html/ad10fb9d4c8fee571a2f6232b268a3b835f7b823d3a0983b84aeb18a@%3Cdev.netbeans.apache.org%3E.
> > >>>> > >>> > > > As of today the domain has been donated to ASF, but the server is still run
> > >>>> > >>> > > > by Oracle, so the plan doesn't seem to be finished yet. (WHOIS for
> > >>>> > >>> > > > netbeans.org shows ASF as the registrant; netbeans.org resolves to
> > >>>> > >>> > > > lb-netbeans-cms-adc.oracle.com.)
> > >>>> > >>> > > >
> > >>>> > >>> > > > The melded sites don't really seem better to me than
> > >>>> redirecting
> > >>>> > >>> all
> > >>>> > >>> > urls
> > >>>> > >>> > > > on the domain. I guess it depends on if we want to keep
> > >>>> druid.io
> > >>>> > >>> as
> > >>>> > >>> > the
> > >>>> > >>> > > > official domain forever, or if we think
> druid.apache.org
> > is
> > >>>> > >>> cooler. I
> > >>>> > >>> > > > definitely think druid.apache.org is cooler so my vote
> is
> > >>>> there
> > >>>> > >>> :).
> > >>>> > >>> > It's
> > >>>> > >>> > > > also nice that it supports https. (druid.io does not
> > today,
> > >>>> > since
> > >>>> > >>> it's
> > >>>> > >>> > > on
> > >>>> > >>> > > > GitHub pages, which doesn't support https for custom
> > >>>> domains.)
> > >>>> > >>> > > >
> > >>>> > >>> > > > On Tue, Mar 12, 2019 at 7:47 PM Charles Allen
> > >>>> > >>> > > > <ch...@snap.com.invalid> wrote:
> > >>>> > >>> > > >
> > >>>> > >>> > > > > Are there other projects who have transitioned an
> > >>>> independently
> > >>>> > >>> > > > successful
> > >>>> > >>> > > > > domain name to an apache one?
> > >>>> > >>> > > > >
> > >>>> > >>> > > > > On Tue, Mar 5, 2019 at 2:13 PM David Lim <
> > >>>> davidlim@apache.org>
> > >>>> > >>> > wrote:
> > >>>> > >>> > > > >
> > >>>> > >>> > > > > > Who has control over the druid.io domain? Charles
> > >>>> would that
> > >>>> > >>> be
> > >>>> > >>> > you?
> > >>>> > >>> > > > > >
> > >>>> > >>> > > > > > We'd need support from them for the DNS redirect.
> > >>>> > >>> > > > > >
> > >>>> > >>> > > > > > On Tue, Mar 5, 2019 at 2:04 PM Jonathan Wei <
> > >>>> > jonwei@apache.org
> > >>>> > >>> >
> > >>>> > >>> > > wrote:
> > >>>> > >>> > > > > >
> > >>>> > >>> > > > > > > We still need to complete the website migration to
> > >>>> Apache
> > >>>> > >>> > > > > infrastructure.
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > I'll propose the following plan:
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > Proposed Apache Druid website migration plan
> > >>>> > >>> > > > > > > ========================================
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > These links have some previous discussion on the
> > >>>> website
> > >>>> > >>> > migration:
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > https://lists.apache.org/thread.html/7cae100b684e0b33e0adda993efea3d6088978700988a0ae632fdd80@%3Cdev.druid.apache.org%3E
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > https://issues.apache.org/jira/browse/INFRA-17340
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > From the discussions above, the recommendation is
> to
> > >>>> have 2
> > >>>> > >>> > > separate
> > >>>> > >>> > > > > > repos
> > >>>> > >>> > > > > > > for the website: one for source and another for
> > built
> > >>>> > content
> > >>>> > >>> > that
> > >>>> > >>> > > > will
> > >>>> > >>> > > > > > be
> > >>>> > >>> > > > > > > served.
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > Generating site files
> > >>>> > >>> > > > > > > =======================
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > The Apache site update process will be similar to
> > our
> > >>>> > current
> > >>>> > >>> > > > process.
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > Current process:
> > >>>> > >>> > > > > > > 1. Push changes to
> > >>>> > >>> > > > > >
> > https://github.com/druid-io/druid-io.github.io/tree/src
> > >>>> > >>> > > > > > > 2. metamx bot picks up changes, builds, and
> commits
> > to
> > >>>> > >>> > > > > > >
> > >>>> https://github.com/druid-io/druid-io.github.io/tree/master
> > >>>> > >>> > > > > > > 3.
> > >>>> > >>> https://github.com/druid-io/druid-io.github.io/tree/master is
> > >>>> > >>> > > > > served
> > >>>> > >>> > > > > > by
> > >>>> > >>> > > > > > > github pages
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > Apache process:
> > >>>> > >>> > > > > > > 1. Push changes to
> > >>>> > >>> > > > > https://github.com/apache/incubator-druid-website-src
> > >>>> > >>> > > > > > > 2. Jenkins bot from Apache will build the website
> > from
> > >>>> > source
> > >>>> > >>> > repo,
> > >>>> > >>> > > > > > commit
> > >>>> > >>> > > > > > > to
> > https://github.com/apache/incubator-druid-website
> > >>>> > >>> > > > > > > 3. Apache Druid website will be served from the
> > >>>> content in
> > >>>> > >>> > > > > > > https://github.com/apache/incubator-druid-website
> > >>>> > (asf-site
> > >>>> > >>> > > branch)
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > Hosting and SEO
> > >>>> > >>> > > > > > > ================
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > The Apache site will be hosted at druid.apache.org on Apache
> > >>>> > >>> > > > > > > infrastructure: http://www.apache.org/dev/project-site.html
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > To preserve our search rankings, we can setup 301 redirects from the old
> > >>>> > >>> > > > > > > druid.io site to the corresponding pages on the druid.apache.org site.
> > >>>> > >>> > > > > > > (https://moz.com/learn/seo/redirection)
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > However, Github pages (which currently hosts the druid.io site) does not
> > >>>> > >>> > > > > > > support 301 redirects, so we propose the following:
> > >>>> > >>> > > > > > > - Setup a new Nginx server that will perform 301 redirects to
> > >>>> > >>> > > > > > > druid.apache.org for the druid.io. Imply can host this if needed.
> > >>>> > >>> > > > > > > - Update the druid.io DNS entry to point to this new Nginx server
> > >>>> > >>> > > > > > > - Shut down Github pages hosting for druid.io
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > In addition, we can also set canonical tags on our pages:
> > >>>> > >>> > > > > > > https://moz.com/learn/seo/canonicalization
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > > > Action items
> > >>>> > >>> > > > > > > ===============
> > >>>> > >>> > > > > > > - Setup a Jenkins bot that builds the Apache website content from source
> > >>>> > >>> > > > > > > - Get the Apache website up
> > >>>> > >>> > > > > > > - Setup Nginx redirect server for druid.io
> > >>>> > >>> > > > > > > - Shutdown github pages and redirect DNS for druid.io to Nginx redirect
> > >>>> > >>> > > > > > > server
> > >>>> > >>> > > > > > > - Add canonical tags to pages
> > >>>> > >>> > > > > > >
> > >>>> > >>> > > > > >
> > >>>> > >>> > > > >
> > >>>> > >>> > > >
> > >>>> > >>> > >
> > >>>> > >>> >
> > >>>> > >>>
> > >>>> > >>
> > >>>> >
> > >>>>
> > >>>
> >
>

Re: Proposed website migration plan

Posted by David Lim <da...@apache.org>.
No objections from me - thank you for testing this out.

On Mon, Jun 10, 2019 at 7:48 AM Gian Merlino <gi...@apache.org> wrote:

> It looks like Google has picked up the 301 and [druid use cases] #1 result
> is https://druid.apache.org/use-cases now. For [what is druid used for]
> it's now #4 instead of #2. I think this is the best we are likely to get. I
> am ready to flip the switch if there aren't any objections.
>
> On Fri, Jun 7, 2019 at 9:15 PM Gian Merlino <gi...@apache.org> wrote:
>
> > Another update: as of
> > https://github.com/apache/incubator-druid-website-src/pull/1 and
> > https://github.com/apache/incubator-druid-website/pull/7, the
> > https://druid.apache.org/ site is now serving almost all pages from
> > druid.io, except:
> >
> > - the index page (it still has a placeholder until we flip the switch)
> > - the download page (it has a differently-designed download page: compare
> > http://druid.io/downloads.html with http://druid.apache.org/downloads.html)
> > - any docs older than 0.13.0 (they aren't Apache releases)
> >
> > If you navigate to https://druid.apache.org/ + any other path from
> > druid.io, you should see the page.
> >
> > I'm hoping to confirm that search engines pick up the 301 for
> > http://druid.io/use-cases before flipping the switch. Hopefully that
> > doesn't take much longer. If it does we should talk about how we want to
> > proceed.
> >
> > On Tue, Jun 4, 2019 at 1:48 AM Gian Merlino <gi...@apache.org> wrote:
> >
> >> An update: we do have a redirect server set up on druid.io now: note
> >> that http://druid.io/community/ and http://druid.io/use-cases both
> >> redirect to https://druid.apache.org. I just set up the latter redirect
> >> (on /use-cases) as part of 'test this first on a single page'. All other
> >> druid.io URLs are still being hosted using the content from GitHub pages
> >> at https://github.com/druid-io/druid-io.github.io.
> >>
> >> Search engine watch: currently, http://druid.io is the #1 link for
> >> [druid use cases] on Google, Bing, and DuckDuckGo (and has a cool looking
> >> infobox on Google & Bing). For [what is druid used for], it's #2 on Google,
> >> and not ranked on the first page on Bing & DDG. Will monitor this over the
> >> next few days.
> >>
> >> On Mon, May 6, 2019 at 5:43 PM Gian Merlino <gi...@apache.org> wrote:
> >>
> >>> Hi all,
> >>>
> >>> It sounds like we will need a redirect server that issues 301s from each
> >>> druid.io page to the corresponding druid.apache.org page. Charles and I
> >>> spoke offline and thought that something like Jon's original proposal is
> >>> the best way to go. I am going to suggest we get started on this, as it's
> >>> the last major piece of infra to move to ASF.
> >>>
> >>> 1) Set up a redirect server to perform 301 redirects to druid.apache.org
> >>> 2) Post all druid.io content on druid.apache.org
> >>> 3) Update druid.io DNS to point to the redirect server
> >>> 4) Shut down GitHub pages hosting for druid.io
> >>>
> >>> Steps (2) and (3) should be done as close in time as possible so there
> >>> is no confusion as to which version of the pages is canonical.
> >>>
> >>> For the redirect server, two viable options are an nginx server or an S3
> >>> webpage redirect
> >>> (https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html).
> >>> Just like we did with the HTML-level redirect, I suggest we test this first
> >>> on a single page. We can do that by having the redirect server initially
> >>> start off by hosting all druid.io content (so it's indistinguishable
> >>> from the GitHub-pages-based site) except for a single page, which it
> >>> redirects using HTTP 301 to druid.apache.org.
> >>>
> >>> I'm planning to start looking into this, so anyone around please speak
> >>> up if you have any advice or alternative approaches to suggest.
> >>>
> >>> On Mon, Apr 29, 2019 at 4:01 PM Jonathan Wei <jo...@apache.org>
> wrote:
> >>>
> >>>> Thanks for checking the SEO state, that's somewhat disappointing.
> >>>>
> >>>> For Bing, it sounds like they really want you to use 301s (
> >>>> https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a):
> >>>>
> >>>> > Bing prefers you use a 301 permanent redirect when moving content,
> >>>> should
> >>>> the move be permanent.  If the move is temporary, then a 302 temporary
> >>>> redirect will work fine.  Do not use the rel=canonical tag in place
> of a
> >>>> proper redirect.
> >>>>
> >>>> I wasn't able to find similar guidance re: this issue for DuckDuckGo.
> >>>>
> >>>> On Mon, Apr 29, 2019 at 10:42 AM Gian Merlino <gi...@apache.org>
> wrote:
> >>>>
> >>>> > Another update: SEO is not looking great after another day passed.
> >>>> For a
> >>>> > search for "druid community", both http://druid.io/community and
> >>>> > https://druid.apache.org/community/ have dropped off the front page
> >>>> of
> >>>> > Bing
> >>>> > completely. On Google, the legacy version is gone (as expected) but
> >>>> the
> >>>> > Apache version has dropped to the #3 spot (down from #2 yesterday;
> >>>> and down
> >>>> > from where the legacy page was pre-migration, which was #1).
> >>>> >
> >>>> > I think this means we do need to try to get 301s figured out.
> >>>> >
> >>>> > On Sun, Apr 28, 2019 at 3:06 PM Gian Merlino <gi...@apache.org>
> wrote:
> >>>> >
> >>>> > > Google has picked up the new URL as of today but Bing hasn't.
> >>>> Neither has
> >>>> > > DuckDuckGo for that matter.
> >>>> > >
> >>>> > > Currently, Google is showing https://druid.apache.org/community/
> >>>> in the
> >>>> > > #2 spot and Bing/DDG are showing http://druid.io/community in the
> >>>> top
> >>>> > > spot. Ominously, the latter two _have_ picked up a page title
> >>>> change to
> >>>> > > "Redirecting..."
> >>>> > >
> >>>> > > On Fri, Apr 26, 2019 at 11:00 AM Gian Merlino <gi...@apache.org>
> >>>> wrote:
> >>>> > >
> >>>> > >> An update: this is done now since a couple of days ago, but
> Google
> >>>> and
> >>>> > >> Bing are still showing http://druid.io/community for a search
> for
> >>>> > "druid
> >>>> > >> community" or even "apache druid community":
> >>>> > >>
> >>>> > >> - https://www.google.com/search?q=druid+community
> >>>> > >> - https://www.bing.com/search?q=druid+community
> >>>> > >>
> >>>> > >> I suggest we keep an eye on the search engines and make sure they
> >>>> can
> >>>> > >> figure out that the site has changed (I'm not sure how often they
> >>>> > crawl).
> >>>> > >> If they can then it would make sense to me to move forward with
> >>>> > migrating
> >>>> > >> the entire web site.
> >>>> > >>
> >>>> > >> On Mon, Apr 22, 2019 at 7:49 PM Jonathan Wei <jo...@apache.org>
> >>>> wrote:
> >>>> > >>
> >>>> > >>> Correction: Xavier was suggesting we use
> >>>> > >>> https://github.com/druid-io/druid-io.github.io/blob/src/_layouts/redirect_page.html,
> >>>> > >>> the existing redirect system used by the Druid website.
> >>>> > >>>
> >>>> > >>> I've opened PRs to do the community page migration test:
> >>>> > >>> https://github.com/apache/incubator-druid-website/pull/3
> >>>> > >>> https://github.com/druid-io/druid-io.github.io/pull/591
> >>>> > >>>
> >>>> > >>> On Mon, Apr 22, 2019 at 3:04 PM Gian Merlino <gi...@apache.org>
> >>>> wrote:
> >>>> > >>>
> >>>> > >>> > That sounds good to me. I would also consider adding canonical
> >>>> tags
> >>>> > to
> >>>> > >>> all
> >>>> > >>> > druid.apache.org pages so we don't have
> >>>> druid.incubator.apache.org
> >>>> > and
> >>>> > >>> > druid.apache.org both floating around (not to mention
> >>>> http/https
> >>>> > >>> version
> >>>> > >>> > of
> >>>> > >>> > both).
> >>>> > >>> >
> >>>> > >>> > On Mon, Apr 22, 2019 at 2:59 PM Jonathan Wei <
> jonwei@apache.org
> >>>> >
> >>>> > >>> wrote:
> >>>> > >>> >
> >>>> > >>> > > For redirects, Xavier has suggested using
> >>>> > >>> > >
> https://help.github.com/en/articles/redirects-on-github-pages
> >>>> to
> >>>> > >>> > redirect
> >>>> > >>> > > to druid.apache.org as a way to transition before the
> domain
> >>>> > >>> migration
> >>>> > >>> > > occurs, and believes that it would have the same SEO effects
> >>>> as a
> >>>> > 301
> >>>> > >>> > > redirect after the new pages are indexed.
> >>>> > >>> > >
> >>>> > >>> > > I think we could try migrating the current Community page to
> >>>> > >>> > > druid.apache.org with Github redirects and canonical links
> >>>> > pointing
> >>>> > >>> to
> >>>> > >>> > the
> >>>> > >>> > > https://druid.apache.org version. If that goes well, we
> could
> >>>> > >>> continue
> >>>> > >>> > > migrating more pages.
> >>>> > >>> > >
> >>>> > >>> > > What are the community's thoughts on that?
> >>>> > >>> > >
> >>>> > >>> > > Thanks,
> >>>> > >>> > > Jon
> >>>> > >>> > >
> >>>> > >>> > > On Tue, Mar 12, 2019 at 7:19 PM Gian Merlino <
> gian@apache.org
> >>>> >
> >>>> > >>> wrote:
> >>>> > >>> > >
> >>>> > >>> > > > OpenOffice and Groovy both chose to sort of "meld" their
> >>>> classic
> >>>> > >>> and
> >>>> > >>> > > Apache
> >>>> > >>> > > > sites together: https://www.openoffice.org/,
> >>>> > >>> http://groovy-lang.org/.
> >>>> > >>> > > Note
> >>>> > >>> > > > how when you click around, you get shuttled between the
> >>>> classic
> >>>> > >>> domain
> >>>> > >>> > > and
> >>>> > >>> > > > the Apache domain. Some pages are available on both sites,
> >>>> like
> >>>> > >>> > > > http://groovy-lang.org/download.html and
> >>>> > >>> > > > https://groovy.apache.org/download.html (which don't use
> >>>> > canonical
> >>>> > >>> > link
> >>>> > >>> > > > tags -- does not seem like a good example to follow!).
> >>>> > >>> > > >
> >>>> > >>> > > > NetBeans (still incubating) also has a "melded" site at
> >>>> > >>> > > > https://netbeans.org/ but doesn't seem to consider itself
> >>>> done
> >>>> > >>> yet.
> >>>> > >>> > They
> >>>> > >>> > > > are discussing plans on their lists & wiki to do redirects
> >>>> from
> >>>> > >>> > > > netbeans.org
> >>>> > >>> > > > to netbeans.apache.org:
> >>>> > >>> > > >
> >>>> > >>> > > >
> >>>> > >>> > >
> >>>> > >>> >
> >>>> > >>>
> >>>> >
> >>>>
> https://cwiki.apache.org/confluence/display/NETBEANS/netbeans.org+Transition+Process
> >>>> > >>> > > > ,
> >>>> > >>> > > >
> >>>> > >>> > > >
> >>>> > >>> > >
> >>>> > >>> >
> >>>> > >>>
> >>>> >
> >>>>
> https://lists.apache.org/thread.html/ad10fb9d4c8fee571a2f6232b268a3b835f7b823d3a0983b84aeb18a@%3Cdev.netbeans.apache.org%3E
> >>>> > >>> > > > .
> >>>> > >>> > > > As of today the domain has been donated to ASF, but the
> >>>> server is
> >>>> > >>> still
> >>>> > >>> > > run
> >>>> > >>> > > > by Oracle, so the plan doesn't seem to be finished yet.
> >>>> (WHOIS
> >>>> > for
> >>>> > >>> > > > netbeans.org shows ASF as the registrant; netbeans.org
> >>>> resolves
> >>>> > to
> >>>> > >>> > > > lb-netbeans-cms-adc.oracle.com.)
> >>>> > >>> > > >
> >>>> > >>> > > > The melded sites don't really seem better to me than
> >>>> redirecting
> >>>> > >>> all
> >>>> > >>> > urls
> >>>> > >>> > > > on the domain. I guess it depends on if we want to keep
> >>>> druid.io
> >>>> > >>> as
> >>>> > >>> > the
> >>>> > >>> > > > official domain forever, or if we think druid.apache.org
> is
> >>>> > >>> cooler. I
> >>>> > >>> > > > definitely think druid.apache.org is cooler so my vote is
> >>>> there
> >>>> > >>> :).
> >>>> > >>> > It's
> >>>> > >>> > > > also nice that it supports https. (druid.io does not
> today,
> >>>> > since
> >>>> > >>> it's
> >>>> > >>> > > on
> >>>> > >>> > > > GitHub pages, which doesn't support https for custom
> >>>> domains.)
> >>>> > >>> > > >
> >>>> > >>> > > > On Tue, Mar 12, 2019 at 7:47 PM Charles Allen
> >>>> > >>> > > > <ch...@snap.com.invalid> wrote:
> >>>> > >>> > > >
> >>>> > >>> > > > > Are there other projects who have transitioned an
> >>>> independently
> >>>> > >>> > > > successful
> >>>> > >>> > > > > domain name to an apache one?
> >>>> > >>> > > > >
> >>>> > >>> > > > > On Tue, Mar 5, 2019 at 2:13 PM David Lim <
> >>>> davidlim@apache.org>
> >>>> > >>> > wrote:
> >>>> > >>> > > > >
> >>>> > >>> > > > > > Who has control over the druid.io domain? Charles
> >>>> would that
> >>>> > >>> be
> >>>> > >>> > you?
> >>>> > >>> > > > > >
> >>>> > >>> > > > > > We'd need support from them for the DNS redirect.
> >>>> > >>> > > > > >
> >>>> > >>> > > > > > On Tue, Mar 5, 2019 at 2:04 PM Jonathan Wei <
> >>>> > jonwei@apache.org
> >>>> > >>> >
> >>>> > >>> > > wrote:
> >>>> > >>> > > > > >
> >>>> > >>> > > > > > > We still need to complete the website migration to
> >>>> Apache
> >>>> > >>> > > > > infrastructure.
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > I'll propose the following plan:
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > Proposed Apache Druid website migration plan
> >>>> > >>> > > > > > > ========================================
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > These links have some previous discussion on the
> >>>> website
> >>>> > >>> > migration:
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > >
> >>>> > >>> > > > >
> >>>> > >>> > > >
> >>>> > >>> > >
> >>>> > >>> >
> >>>> > >>>
> >>>> >
> >>>>
> https://lists.apache.org/thread.html/7cae100b684e0b33e0adda993efea3d6088978700988a0ae632fdd80@%3Cdev.druid.apache.org%3E
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > >
> >>>> > >>> > > > >
> >>>> > >>> > > >
> >>>> > >>> > >
> >>>> > >>> >
> >>>> > >>>
> >>>> >
> >>>>
> https://issues.apache.org/jira/browse/INFRA-17340
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > From the discussions above, the recommendation is to
> >>>> have 2
> >>>> > >>> > > separate
> >>>> > >>> > > > > > repos
> >>>> > >>> > > > > > > for the website: one for source and another for
> built
> >>>> > content
> >>>> > >>> > that
> >>>> > >>> > > > will
> >>>> > >>> > > > > > be
> >>>> > >>> > > > > > > served.
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > Generating site files
> >>>> > >>> > > > > > > =======================
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > The Apache site update process will be similar to
> our
> >>>> > current
> >>>> > >>> > > > process.
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > Current process:
> >>>> > >>> > > > > > > 1. Push changes to
> >>>> > >>> > > > > >
> https://github.com/druid-io/druid-io.github.io/tree/src
> >>>> > >>> > > > > > > 2. metamx bot picks up changes, builds, and commits
> to
> >>>> > >>> > > > > > >
> >>>> https://github.com/druid-io/druid-io.github.io/tree/master
> >>>> > >>> > > > > > > 3.
> >>>> > >>> https://github.com/druid-io/druid-io.github.io/tree/master is
> >>>> > >>> > > > > served
> >>>> > >>> > > > > > by
> >>>> > >>> > > > > > > github pages
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > Apache process:
> >>>> > >>> > > > > > > 1. Push changes to
> >>>> > >>> > > > > https://github.com/apache/incubator-druid-website-src
> >>>> > >>> > > > > > > 2. Jenkins bot from Apache will build the website
> from
> >>>> > source
> >>>> > >>> > repo,
> >>>> > >>> > > > > > commit
> >>>> > >>> > > > > > > to
> https://github.com/apache/incubator-druid-website
> >>>> > >>> > > > > > > 3. Apache Druid website will be served from the
> >>>> content in
> >>>> > >>> > > > > > > https://github.com/apache/incubator-druid-website
> >>>> > (asf-site
> >>>> > >>> > > branch)
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > Hosting and SEO
> >>>> > >>> > > > > > > ================
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > The Apache site will be hosted at druid.apache.org
> on
> >>>> > Apache
> >>>> > >>> > > > > > > infrastructure:
> >>>> > >>> > > > > >
> >>>> > >>> > > > >
> >>>> > >>> > > >
> >>>> > >>> > >
> >>>> > >>> >
> >>>> > >>>
> >>>> >
> >>>>
> http://www.apache.org/dev/project-site.html
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > To preserve our search rankings, we can setup 301
> >>>> redirects
> >>>> > >>> from
> >>>> > >>> > > the
> >>>> > >>> > > > > old
> >>>> > >>> > > > > > > druid.io site to the corresponding pages on the
> >>>> > >>> druid.apache.org
> >>>> > >>> > > > > site. (
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > >
> >>>> > >>> > > > >
> >>>> > >>> > > >
> >>>> > >>> > >
> >>>> > >>> >
> >>>> > >>>
> >>>> >
> >>>>
> https://moz.com/learn/seo/redirection
> >>>> > >>> > > > > > )
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > However, Github pages (which currently hosts the
> >>>> druid.io
> >>>> > >>> site)
> >>>> > >>> > > does
> >>>> > >>> > > > > not
> >>>> > >>> > > > > > > support 301 redirects, so we propose the following:
> >>>> > >>> > > > > > > - Setup a new Nginx server that will perform 301
> >>>> redirects
> >>>> > to
> >>>> > >>> > > > > > > druid.apache.org for the druid.io. Imply can host
> >>>> this if
> >>>> > >>> > needed.
> >>>> > >>> > > > > > > - Update the druid.io DNS entry to point to this
> new
> >>>> Nginx
> >>>> > >>> > server
> >>>> > >>> > > > > > > - Shut down Github pages hosting for druid.io
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > In addition, we can also set canonical tags on our
> >>>> pages:
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > >
> >>>> > >>> > > > >
> >>>> > >>> > > >
> >>>> > >>> > >
> >>>> > >>> >
> >>>> > >>>
> >>>> >
> >>>>
> https://moz.com/learn/seo/canonicalization
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > > > Action items
> >>>> > >>> > > > > > > ===============
> >>>> > >>> > > > > > > - Setup a Jenkins bot that builds the Apache website
> >>>> > content
> >>>> > >>> from
> >>>> > >>> > > > > source
> >>>> > >>> > > > > > > - Get the Apache website up
> >>>> > >>> > > > > > > - Setup Nginx redirect server for druid.io
> >>>> > >>> > > > > > > - Shutdown github pages and redirect DNS for
> druid.io
> >>>> to
> >>>> > >>> Nginx
> >>>> > >>> > > > > redirect
> >>>> > >>> > > > > > > server
> >>>> > >>> > > > > > > - Add canonical tags to pages
> >>>> > >>> > > > > > >
> >>>> > >>> > > > > >
> >>>> > >>> > > > >
> >>>> > >>> > > >
> >>>> > >>> > >
> >>>> > >>> >
> >>>> > >>>
> >>>> > >>
> >>>> >
> >>>>
> >>>
>

Re: Proposed website migration plan

Posted by Gian Merlino <gi...@apache.org>.
It looks like Google has picked up the 301: the #1 result for [druid use
cases] is https://druid.apache.org/use-cases now. For [what is druid used
for] it's now #4 instead of #2. I think this is the best we are likely to
get. I am ready to flip the switch if there aren't any objections.
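
For anyone following along, the single-page test and the eventual switch
flip can be thought of as the same rule applied to a different set of paths.
Here is a minimal Python sketch of that rule. The helper name
`redirect_target`, the `flipped` flag, and the assumption that each test path
redirects to the same path on druid.apache.org are illustrative only; this
is not the actual redirect server configuration.

```python
from typing import Optional

# New canonical origin for the site.
NEW_ORIGIN = "https://druid.apache.org"

# Paths redirected during the single-page tests mentioned in this thread
# (assumed here to map to the same path on the new origin).
TEST_PATHS = {"/community", "/use-cases"}


def redirect_target(path: str, flipped: bool) -> Optional[str]:
    """Return the 301 Location for a druid.io path, or None to serve it locally.

    Before the switch is flipped, only the test paths redirect.
    After the flip, every path redirects to the same path on the new origin.
    """
    # Treat "/community/" and "/community" as the same page when matching.
    normalized = path.rstrip("/") or "/"
    if flipped or normalized in TEST_PATHS:
        return NEW_ORIGIN + path
    return None
```

With `flipped=False` only the two test pages redirect; with `flipped=True`
every druid.io path maps to its druid.apache.org counterpart.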

On Fri, Jun 7, 2019 at 9:15 PM Gian Merlino <gi...@apache.org> wrote:

> Another update: as of
> https://github.com/apache/incubator-druid-website-src/pull/1 and
> https://github.com/apache/incubator-druid-website/pull/7, the
> https://druid.apache.org/ site is now serving almost all pages from
> druid.io, except:
>
> - the index page (it still has a placeholder until we flip the switch)
> - the download page (it has a differently-designed download page: compare
> http://druid.io/downloads.html with http://druid.apache.org/downloads.html)
> - any docs older than 0.13.0 (they aren't Apache releases)
>
> If you navigate to https://druid.apache.org/ + any other path from
> druid.io, you should see the page.
>
> I'm hoping to confirm that search engines pick up the 301 for
> http://druid.io/use-cases before flipping the switch. Hopefully that
> doesn't take much longer. If it does we should talk about how we want to
> proceed.
>
> On Tue, Jun 4, 2019 at 1:48 AM Gian Merlino <gi...@apache.org> wrote:
>
>> An update: we do have a redirect server set up on druid.io now: note
>> that http://druid.io/community/ and http://druid.io/use-cases both
>> redirect to https://druid.apache.org. I just set up the latter redirect
>> (on /use-cases) as part of 'test this first on a single page'. All other
>> druid.io URLs are still being hosted using the content from GitHub pages
>> at https://github.com/druid-io/druid-io.github.io.
>>
>> Search engine watch: currently, http://druid.io is the #1 link for
>> [druid use cases] on Google, Bing, and DuckDuckGo (and has a cool looking
>> infobox on Google & Bing). For [what is druid used for], it's #2 on Google,
>> and not ranked on the first page on Bing & DDG. Will monitor this over the
>> next few days.
>>
>> On Mon, May 6, 2019 at 5:43 PM Gian Merlino <gi...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> It sounds like we will need a redirect server that issues 301s from each
>>> druid.io page to the corresponding druid.apache.org page. Charles and I
>>> spoke offline and thought that something like Jon's original proposal is
>>> the best way to go. I am going to suggest we get started on this, as it's
>>> the last major piece of infra to move to ASF.
>>>
>>> 1) Set up a redirect server to perform 301 redirects to druid.apache.org
>>> 2) Post all druid.io content on druid.apache.org
>>> 3) Update druid.io DNS to point to the redirect server
>>> 4) Shut down GitHub pages hosting for druid.io
>>>
>>> Steps (2) and (3) should be done as close in time as possible so there
>>> is no confusion as to which version of the pages is canonical.
>>>
>>> For the redirect server, two viable options are an nginx server or an S3
>>> webpage redirect (
>>> https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html).
>>> Just like we did with the HTML-level redirect, I suggest we test this first
>>> on a single page. We can do that by having the redirect server initially
>>> start off by hosting all druid.io content (so it's indistinguishable
>>> from the GitHub-pages-based site) except for a single page, which it
>>> redirects using HTTP 301 to druid.apache.org.
>>>
>>> I'm planning to start looking into this, so anyone around please speak
>>> up if you have any advice or alternative approaches to suggest.
>>>
>>> On Mon, Apr 29, 2019 at 4:01 PM Jonathan Wei <jo...@apache.org> wrote:
>>>
>>>> Thanks for checking the SEO state, that's somewhat disappointing.
>>>>
>>>> For Bing, it sounds like they really want you to use 301s (
>>>> https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a):
>>>>
>>>> > Bing prefers you use a 301 permanent redirect when moving content,
>>>> should
>>>> the move be permanent.  If the move is temporary, then a 302 temporary
>>>> redirect will work fine.  Do not use the rel=canonical tag in place of a
>>>> proper redirect.
>>>>
>>>> I wasn't able to find similar guidance re: this issue for DuckDuckGo.
>>>>
>>>> On Mon, Apr 29, 2019 at 10:42 AM Gian Merlino <gi...@apache.org> wrote:
>>>>
>>>> > Another update: SEO is not looking great after another day passed.
>>>> For a
>>>> > search for "druid community", both http://druid.io/community and
>>>> > https://druid.apache.org/community/ have dropped off the front page
>>>> of
>>>> > Bing
>>>> > completely. On Google, the legacy version is gone (as expected) but
>>>> the
>>>> > Apache version has dropped to the #3 spot (down from #2 yesterday;
>>>> and down
>>>> > from where the legacy page was pre-migration, which was #1).
>>>> >
>>>> > I think this means we do need to try to get 301s figured out.
>>>> >
>>>> > On Sun, Apr 28, 2019 at 3:06 PM Gian Merlino <gi...@apache.org> wrote:
>>>> >
>>>> > > Google has picked up the new URL as of today but Bing hasn't.
>>>> Neither has
>>>> > > DuckDuckGo for that matter.
>>>> > >
>>>> > > Currently, Google is showing https://druid.apache.org/community/
>>>> in the
>>>> > > #2 spot and Bing/DDG are showing http://druid.io/community in the
>>>> top
>>>> > > spot. Ominously, the latter two _have_ picked up a page title
>>>> change to
>>>> > > "Redirecting..."
>>>> > >
>>>> > > On Fri, Apr 26, 2019 at 11:00 AM Gian Merlino <gi...@apache.org>
>>>> wrote:
>>>> > >
>>>> > >> An update: this is done now since a couple of days ago, but Google
>>>> and
>>>> > >> Bing are still showing http://druid.io/community for a search for
>>>> > "druid
>>>> > >> community" or even "apache druid community":
>>>> > >>
>>>> > >> - https://www.google.com/search?q=druid+community
>>>> > >> - https://www.bing.com/search?q=druid+community
>>>> > >>
>>>> > >> I suggest we keep an eye on the search engines and make sure they
>>>> can
>>>> > >> figure out that the site has changed (I'm not sure how often they
>>>> > crawl).
>>>> > >> If they can then it would make sense to me to move forward with
>>>> > migrating
>>>> > >> the entire web site.
>>>> > >>
>>>> > >> On Mon, Apr 22, 2019 at 7:49 PM Jonathan Wei <jo...@apache.org>
>>>> wrote:
>>>> > >>
>>>> > >>> Correction: Xavier was suggesting we use
>>>> > >>>
>>>> > >>>
>>>> >
>>>> https://github.com/druid-io/druid-io.github.io/blob/src/_layouts/redirect_page.html
>>>> > >>> ,
>>>> > >>> the existing redirect system used by the Druid website.
>>>> > >>>
>>>> > >>> I've opened PRs to do the community page migration test:
>>>> > >>> https://github.com/apache/incubator-druid-website/pull/3
>>>> > >>> https://github.com/druid-io/druid-io.github.io/pull/591
>>>> > >>>
>>>> > >>> On Mon, Apr 22, 2019 at 3:04 PM Gian Merlino <gi...@apache.org>
>>>> wrote:
>>>> > >>>
>>>> > >>> > That sounds good to me. I would also consider adding canonical
>>>> tags
>>>> > to
>>>> > >>> all
>>>> > >>> > druid.apache.org pages so we don't have
>>>> druid.incubator.apache.org
>>>> > and
>>>> > >>> > druid.apache.org both floating around (not to mention
>>>> http/https
>>>> > >>> version
>>>> > >>> > of
>>>> > >>> > both).
>>>> > >>> >
>>>> > >>> > On Mon, Apr 22, 2019 at 2:59 PM Jonathan Wei <jonwei@apache.org
>>>> >
>>>> > >>> wrote:
>>>> > >>> >
>>>> > >>> > > For redirects, Xavier has suggested using
>>>> > >>> > > https://help.github.com/en/articles/redirects-on-github-pages
>>>> to
>>>> > >>> > redirect
>>>> > >>> > > to druid.apache.org as a way to transition before the domain
>>>> > >>> migration
>>>> > >>> > > occurs, and believes that it would have the same SEO effects
>>>> as a
>>>> > 301
>>>> > >>> > > redirect after the new pages are indexed.
>>>> > >>> > >
>>>> > >>> > > I think we could try migrating the current Community page to
>>>> > >>> > > druid.apache.org with Github redirects and canonical links
>>>> > pointing
>>>> > >>> to
>>>> > >>> > the
>>>> > >>> > > https://druid.apache.org version. If that goes well, we could
>>>> > >>> continue
>>>> > >>> > > migrating more pages.
>>>> > >>> > >
>>>> > >>> > > What are the community's thoughts on that?
>>>> > >>> > >
>>>> > >>> > > Thanks,
>>>> > >>> > > Jon
>>>> > >>> > >
>>>> > >>> > > On Tue, Mar 12, 2019 at 7:19 PM Gian Merlino <gian@apache.org
>>>> >
>>>> > >>> wrote:
>>>> > >>> > >
>>>> > >>> > > > OpenOffice and Groovy both chose to sort of "meld" their
>>>> classic
>>>> > >>> and
>>>> > >>> > > Apache
>>>> > >>> > > > sites together: https://www.openoffice.org/,
>>>> > >>> http://groovy-lang.org/.
>>>> > >>> > > Note
>>>> > >>> > > > how when you click around, you get shuttled between the
>>>> classic
>>>> > >>> domain
>>>> > >>> > > and
>>>> > >>> > > > the Apache domain. Some pages are available on both sites,
>>>> like
>>>> > >>> > > > http://groovy-lang.org/download.html and
>>>> > >>> > > > https://groovy.apache.org/download.html (which don't use
>>>> > canonical
>>>> > >>> > link
>>>> > >>> > > > tags -- does not seem like a good example to follow!).
>>>> > >>> > > >
>>>> > >>> > > > NetBeans (still incubating) also has a "melded" site at
>>>> > >>> > > > https://netbeans.org/ but doesn't seem to consider itself
>>>> done
>>>> > >>> yet.
>>>> > >>> > They
>>>> > >>> > > > are discussing plans on their lists & wiki to do redirects
>>>> from
>>>> > >>> > > > netbeans.org
>>>> > >>> > > > to netbeans.apache.org:
>>>> > >>> > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> >
>>>> https://cwiki.apache.org/confluence/display/NETBEANS/netbeans.org+Transition+Process
>>>> > >>> > > > ,
>>>> > >>> > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> >
>>>> https://lists.apache.org/thread.html/ad10fb9d4c8fee571a2f6232b268a3b835f7b823d3a0983b84aeb18a@%3Cdev.netbeans.apache.org%3E
>>>> > >>> > > > .
>>>> > >>> > > > As of today the domain has been donated to ASF, but the
>>>> server is
>>>> > >>> still
>>>> > >>> > > run
>>>> > >>> > > > by Oracle, so the plan doesn't seem to be finished yet.
>>>> (WHOIS
>>>> > for
>>>> > >>> > > > netbeans.org shows ASF as the registrant; netbeans.org
>>>> resolves
>>>> > to
>>>> > >>> > > > lb-netbeans-cms-adc.oracle.com.)
>>>> > >>> > > >
>>>> > >>> > > > The melded sites don't really seem better to me than
>>>> redirecting
>>>> > >>> all
>>>> > >>> > urls
>>>> > >>> > > > on the domain. I guess it depends on if we want to keep
>>>> druid.io
>>>> > >>> as
>>>> > >>> > the
>>>> > >>> > > > official domain forever, or if we think druid.apache.org is
>>>> > >>> cooler. I
>>>> > >>> > > > definitely think druid.apache.org is cooler so my vote is
>>>> there
>>>> > >>> :).
>>>> > >>> > It's
>>>> > >>> > > > also nice that it supports https. (druid.io does not today,
>>>> > since
>>>> > >>> it's
>>>> > >>> > > on
>>>> > >>> > > > GitHub pages, which doesn't support https for custom
>>>> domains.)
>>>> > >>> > > >
>>>> > >>> > > > On Tue, Mar 12, 2019 at 7:47 PM Charles Allen
>>>> > >>> > > > <ch...@snap.com.invalid> wrote:
>>>> > >>> > > >
>>>> > >>> > > > > Are there other projects who have transitioned an
>>>> independently
>>>> > >>> > > > successful
>>>> > >>> > > > > domain name to an apache one?
>>>> > >>> > > > >
>>>> > >>> > > > > On Tue, Mar 5, 2019 at 2:13 PM David Lim <
>>>> davidlim@apache.org>
>>>> > >>> > wrote:
>>>> > >>> > > > >
>>>> > >>> > > > > > Who has control over the druid.io domain? Charles
>>>> would that
>>>> > >>> be
>>>> > >>> > you?
>>>> > >>> > > > > >
>>>> > >>> > > > > > We'd need support from them for the DNS redirect.
>>>> > >>> > > > > >
>>>> > >>> > > > > > On Tue, Mar 5, 2019 at 2:04 PM Jonathan Wei <
>>>> > jonwei@apache.org
>>>> > >>> >
>>>> > >>> > > wrote:
>>>> > >>> > > > > >
>>>> > >>> > > > > > > We still need to complete the website migration to
>>>> Apache
>>>> > >>> > > > > infrastructure.
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > I'll propose the following plan:
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > Proposed Apache Druid website migration plan
>>>> > >>> > > > > > > ========================================
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > These links have some previous discussion on the
>>>> website
>>>> > >>> > migration:
>>>> > >>> > > > > > >
>>>> > >>> > > > > > >
>>>> > >>> > > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> >
>>>> https://lists.apache.org/thread.html/7cae100b684e0b33e0adda993efea3d6088978700988a0ae632fdd80@%3Cdev.druid.apache.org%3E
>>>> > >>> > > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> >
>>>> https://issues.apache.org/jira/browse/INFRA-17340
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > From the discussions above, the recommendation is to
>>>> have 2
>>>> > >>> > > separate
>>>> > >>> > > > > > repos
>>>> > >>> > > > > > > for the website: one for source and another for built
>>>> > content
>>>> > >>> > that
>>>> > >>> > > > will
>>>> > >>> > > > > > be
>>>> > >>> > > > > > > served.
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > Generating site files
>>>> > >>> > > > > > > =======================
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > The Apache site update process will be similar to our
>>>> > current
>>>> > >>> > > > process.
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > Current process:
>>>> > >>> > > > > > > 1. Push changes to
>>>> > >>> > > > > > https://github.com/druid-io/druid-io.github.io/tree/src
>>>> > >>> > > > > > > 2. metamx bot picks up changes, builds, and commits to
>>>> > >>> > > > > > >
>>>> https://github.com/druid-io/druid-io.github.io/tree/master
>>>> > >>> > > > > > > 3.
>>>> > >>> https://github.com/druid-io/druid-io.github.io/tree/master is
>>>> > >>> > > > > served
>>>> > >>> > > > > > by
>>>> > >>> > > > > > > github pages
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > Apache process:
>>>> > >>> > > > > > > 1. Push changes to
>>>> > >>> > > > > https://github.com/apache/incubator-druid-website-src
>>>> > >>> > > > > > > 2. Jenkins bot from Apache will build the website from
>>>> > source
>>>> > >>> > repo,
>>>> > >>> > > > > > commit
>>>> > >>> > > > > > > to https://github.com/apache/incubator-druid-website
>>>> > >>> > > > > > > 3. Apache Druid website will be served from the
>>>> content in
>>>> > >>> > > > > > > https://github.com/apache/incubator-druid-website
>>>> > (asf-site
>>>> > >>> > > branch)
>>>> > >>> > > > > > >
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > Hosting and SEO
>>>> > >>> > > > > > > ================
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > The Apache site will be hosted at druid.apache.org on
>>>> > Apache
>>>> > >>> > > > > > > infrastructure:
>>>> > >>> > > > > >
>>>> > >>> > > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> >
>>>> http://www.apache.org/dev/project-site.html
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > To preserve our search rankings, we can setup 301
>>>> redirects
>>>> > >>> from
>>>> > >>> > > the
>>>> > >>> > > > > old
>>>> > >>> > > > > > > druid.io site to the corresponding pages on the
>>>> > >>> druid.apache.org
>>>> > >>> > > > > site. (
>>>> > >>> > > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> >
>>>> https://moz.com/learn/seo/redirection
>>>> > >>> > > > > > )
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > However, Github pages (which currently hosts the
>>>> druid.io
>>>> > >>> site)
>>>> > >>> > > does
>>>> > >>> > > > > not
>>>> > >>> > > > > > > support 301 redirects, so we propose the following:
>>>> > >>> > > > > > > - Setup a new Nginx server that will perform 301
>>>> redirects
>>>> > to
>>>> > >>> > > > > > > druid.apache.org for the druid.io. Imply can host
>>>> this if
>>>> > >>> > needed.
>>>> > >>> > > > > > > - Update the druid.io DNS entry to point to this new
>>>> Nginx
>>>> > >>> > server
>>>> > >>> > > > > > > - Shut down Github pages hosting for druid.io
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > In addition, we can also set canonical tags on our
>>>> pages:
>>>> > >>> > > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> >
>>>> https://moz.com/learn/seo/canonicalization
>>>> > >>> > > > > > >
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > Action items
>>>> > >>> > > > > > > ===============
>>>> > >>> > > > > > > - Setup a Jenkins bot that builds the Apache website
>>>> > content
>>>> > >>> from
>>>> > >>> > > > > source
>>>> > >>> > > > > > > - Get the Apache website up
>>>> > >>> > > > > > > - Setup Nginx redirect server for druid.io
>>>> > >>> > > > > > > - Shutdown github pages and redirect DNS for druid.io
>>>> to
>>>> > >>> Nginx
>>>> > >>> > > > > redirect
>>>> > >>> > > > > > > server
>>>> > >>> > > > > > > - Add canonical tags to pages
>>>> > >>> > > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> > >>
>>>> >
>>>>
>>>

Re: Proposed website migration plan

Posted by Gian Merlino <gi...@apache.org>.
Another update: as of
https://github.com/apache/incubator-druid-website-src/pull/1 and
https://github.com/apache/incubator-druid-website/pull/7, the
https://druid.apache.org/ site is now serving almost all pages from druid.io,
except:

- the index page (it still has a placeholder until we flip the switch)
- the download page (it has a differently-designed download page: compare
http://druid.io/downloads.html with http://druid.apache.org/downloads.html)
- any docs older than 0.13.0 (they aren't Apache releases)

If you navigate to https://druid.apache.org/ + any other path from druid.io,
you should see the page.
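
As a rough illustration of the exceptions listed above, here is a small
Python sketch that decides whether a druid.io path is expected to be served
as-is from druid.apache.org. The helper name `is_mirrored` and the assumed
docs URL layout (`/docs/<version>/...`) are illustrative, so treat this as a
sketch under those assumptions rather than a description of the real site
build.

```python
import re

# Docs older than this are not Apache releases and are not mirrored.
MIN_APACHE_RELEASE = (0, 13, 0)

# Assumed layout: versioned docs live under /docs/<dotted version>/...
VERSIONED_DOCS = re.compile(r"^/docs/(\d+(?:\.\d+)*)")


def is_mirrored(path: str) -> bool:
    """Return True if the druid.io path should show the same page on druid.apache.org."""
    # The index page still has a placeholder until the switch is flipped.
    if path in ("", "/", "/index.html"):
        return False
    # The download page exists on both sites but is designed differently.
    if path == "/downloads.html":
        return False
    match = VERSIONED_DOCS.match(path)
    if match:
        version = tuple(int(part) for part in match.group(1).split("."))
        return version >= MIN_APACHE_RELEASE
    # Everything else, including unversioned docs paths, is assumed mirrored.
    return True
```

For example, `is_mirrored("/use-cases")` is true, while a path under
`/docs/0.12.3/` is not, since 0.12.3 predates the first Apache release.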

I'm hoping to confirm that search engines pick up the 301 for
http://druid.io/use-cases before flipping the switch. Hopefully that
doesn't take much longer. If it does we should talk about how we want to
proceed.
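
Separately from what the search engines show, the redirect itself can be
checked directly. Below is a minimal Python sketch using only the standard
library; the function names are made up for illustration, and the check
assumes that a correct redirect is an HTTP 301 whose Location header starts
with the new origin.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def fetch_status_and_location(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Issue a HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect_to(status: int, location: Optional[str], expected_prefix: str) -> bool:
    """True only for a 301 whose Location starts with the expected new origin."""
    return status == 301 and location is not None and location.startswith(expected_prefix)
```

Usage would look like
`is_permanent_redirect_to(*fetch_status_and_location("http://druid.io/use-cases"), "https://druid.apache.org/")`.
A 302 or an HTML-level redirect (which returns 200) would fail this check.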

On Tue, Jun 4, 2019 at 1:48 AM Gian Merlino <gi...@apache.org> wrote:

> An update: we do have a redirect server set up on druid.io now: note that
> http://druid.io/community/ and http://druid.io/use-cases both redirect to
> https://druid.apache.org. I just set up the latter redirect (on
> /use-cases) as part of 'test this first on a single page'. All other
> druid.io URLs are still being hosted using the content from GitHub pages
> at https://github.com/druid-io/druid-io.github.io.
>
> Search engine watch: currently, http://druid.io is the #1 link for [druid
> use cases] on Google, Bing, and DuckDuckGo (and has a cool looking infobox
> on Google & Bing). For [what is druid used for], it's #2 on Google, and not
> ranked on the first page on Bing & DDG. Will monitor this over the next few
> days.
>
> On Mon, May 6, 2019 at 5:43 PM Gian Merlino <gi...@apache.org> wrote:
>
>> Hi all,
>>
>> It sounds like we will need a redirect server that issues 301s from each
>> druid.io page to the corresponding druid.apache.org page. Charles and I
>> spoke offline and thought that something like Jon's original proposal is
>> the best way to go. I am going to suggest we get started on this, as it's
>> the last major piece of infra to move to ASF.
>>
>> 1) Set up a redirect server to perform 301 redirects to druid.apache.org
>> 2) Post all druid.io content on druid.apache.org
>> 3) Update druid.io DNS to point to the redirect server
>> 4) Shut down GitHub pages hosting for druid.io
>>
>> Steps (2) and (3) should be done as close in time as possible so there is
>> no confusion as to which version of the pages is canonical.
>>
>> For the redirect server, two viable options are an nginx server or an S3
>> webpage redirect (
>> https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html).
>> Just like we did with the HTML-level redirect, I suggest we test this first
>> on a single page. We can do that by having the redirect server initially
>> start off by hosting all druid.io content (so it's indistinguishable
>> from the GitHub-pages-based site) except for a single page, which it
>> redirects using HTTP 301 to druid.apache.org.
>>
>> I'm planning to start looking into this, so anyone around please speak up
>> if you have any advice or alternative approaches to suggest.
>>
>> On Mon, Apr 29, 2019 at 4:01 PM Jonathan Wei <jo...@apache.org> wrote:
>>
>>> Thanks for checking the SEO state, that's somewhat disappointing.
>>>
>>> For Bing, it sounds like they really want you to use 301s (
>>> https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a):
>>>
>>> > Bing prefers you use a 301 permanent redirect when moving content,
>>> should
>>> the move be permanent.  If the move is temporary, then a 302 temporary
>>> redirect will work fine.  Do not use the rel=canonical tag in place of a
>>> proper redirect.
>>>
>>> I wasn't able to find similar guidance re: this issue for DuckDuckGo.
>>>
>>> On Mon, Apr 29, 2019 at 10:42 AM Gian Merlino <gi...@apache.org> wrote:
>>>
>>> > Another update: SEO is not looking great after another day passed. For
>>> a
>>> > search for "druid community", both http://druid.io/community and
>>> > https://druid.apache.org/community/ have dropped off the front page of
>>> > Bing
>>> > completely. On Google, the legacy version is gone (as expected) but the
>>> > Apache version has dropped to the #3 spot (down from #2 yesterday; and
>>> down
>>> > from where the legacy page was pre-migration, which was #1).
>>> >
>>> > I think this means we do need to try to get 301s figured out.
>>> >
>>> > On Sun, Apr 28, 2019 at 3:06 PM Gian Merlino <gi...@apache.org> wrote:
>>> >
>>> > > Google has picked up the new URL as of today but Bing hasn't.
>>> Neither has
>>> > > DuckDuckGo for that matter.
>>> > >
>>> > > Currently, Google is showing https://druid.apache.org/community/ in
>>> the
>>> > > #2 spot and Bing/DDG are showing http://druid.io/community in the
>>> top
>>> > > spot. Ominously, the latter two _have_ picked up a page title change
>>> to
>>> > > "Redirecting..."
>>> > >
>>> > > On Fri, Apr 26, 2019 at 11:00 AM Gian Merlino <gi...@apache.org>
>>> wrote:
>>> > >
>>> > >> An update: this is done now since a couple of days ago, but Google
>>> and
>>> > >> Bing are still showing http://druid.io/community for a search for
>>> > "druid
>>> > >> community" or even "apache druid community":
>>> > >>
>>> > >> - https://www.google.com/search?q=druid+community
>>> > >> - https://www.bing.com/search?q=druid+community
>>> > >>
>>> > >> I suggest we keep an eye on the search engines and make sure they
>>> can
>>> > >> figure out that the site has changed (I'm not sure how often they
>>> > crawl).
>>> > >> If they can then it would make sense to me to move forward with
>>> > migrating
>>> > >> the entire web site.
>>> > >>
>>> > >> On Mon, Apr 22, 2019 at 7:49 PM Jonathan Wei <jo...@apache.org> wrote:
>>> > >>
>>> > >>> Correction: Xavier was suggesting we use
>>> > >>> https://github.com/druid-io/druid-io.github.io/blob/src/_layouts/redirect_page.html,
>>> > >>> the existing redirect system used by the Druid website.
>>> > >>>
>>> > >>> I've opened PRs to do the community page migration test:
>>> > >>> https://github.com/apache/incubator-druid-website/pull/3
>>> > >>> https://github.com/druid-io/druid-io.github.io/pull/591
>>> > >>>
>>> > >>> On Mon, Apr 22, 2019 at 3:04 PM Gian Merlino <gi...@apache.org> wrote:
>>> > >>>
>>> > >>> > That sounds good to me. I would also consider adding canonical tags to
>>> > >>> > all druid.apache.org pages so we don't have druid.incubator.apache.org
>>> > >>> > and druid.apache.org both floating around (not to mention http/https
>>> > >>> > versions of both).
>>> > >>> >
>>> > >>> > On Mon, Apr 22, 2019 at 2:59 PM Jonathan Wei <jo...@apache.org> wrote:
>>> > >>> >
>>> > >>> > > For redirects, Xavier has suggested using
>>> > >>> > > https://help.github.com/en/articles/redirects-on-github-pages to
>>> > >>> > > redirect to druid.apache.org as a way to transition before the domain
>>> > >>> > > migration occurs, and believes that it would have the same SEO effects
>>> > >>> > > as a 301 redirect after the new pages are indexed.
>>> > >>> > >
>>> > >>> > > I think we could try migrating the current Community page to
>>> > >>> > > druid.apache.org with GitHub redirects and canonical links pointing to
>>> > >>> > > the https://druid.apache.org version. If that goes well, we could
>>> > >>> > > continue migrating more pages.
>>> > >>> > >
>>> > >>> > > What are the community's thoughts on that?
>>> > >>> > >
>>> > >>> > > Thanks,
>>> > >>> > > Jon
>>> > >>> > >
>>> > >>> > > On Tue, Mar 12, 2019 at 7:19 PM Gian Merlino <gi...@apache.org> wrote:
>>> > >>> > >
>>> > >>> > > > OpenOffice and Groovy both chose to sort of "meld" their classic and
>>> > >>> > > > Apache sites together: https://www.openoffice.org/,
>>> > >>> > > > http://groovy-lang.org/. Note how when you click around, you get
>>> > >>> > > > shuttled between the classic domain and the Apache domain. Some pages
>>> > >>> > > > are available on both sites, like
>>> > >>> > > > http://groovy-lang.org/download.html and
>>> > >>> > > > https://groovy.apache.org/download.html (which don't use canonical
>>> > >>> > > > link tags -- does not seem like a good example to follow!).
>>> > >>> > > >
>>> > >>> > > > NetBeans (still incubating) also has a "melded" site at
>>> > >>> > > > https://netbeans.org/ but doesn't seem to consider itself done yet.
>>> > >>> > > > They are discussing plans on their lists & wiki to do redirects from
>>> > >>> > > > netbeans.org to netbeans.apache.org:
>>> > >>> > > > https://cwiki.apache.org/confluence/display/NETBEANS/netbeans.org+Transition+Process,
>>> > >>> > > > https://lists.apache.org/thread.html/ad10fb9d4c8fee571a2f6232b268a3b835f7b823d3a0983b84aeb18a@%3Cdev.netbeans.apache.org%3E.
>>> > >>> > > > As of today the domain has been donated to ASF, but the server is still
>>> > >>> > > > run by Oracle, so the plan doesn't seem to be finished yet. (WHOIS for
>>> > >>> > > > netbeans.org shows ASF as the registrant; netbeans.org resolves to
>>> > >>> > > > lb-netbeans-cms-adc.oracle.com.)
>>> > >>> > > >
>>> > >>> > > > The melded sites don't really seem better to me than redirecting all
>>> > >>> > > > urls on the domain. I guess it depends on if we want to keep druid.io
>>> > >>> > > > as the official domain forever, or if we think druid.apache.org is
>>> > >>> > > > cooler. I definitely think druid.apache.org is cooler so my vote is
>>> > >>> > > > there :). It's also nice that it supports https. (druid.io does not
>>> > >>> > > > today, since it's on GitHub pages, which doesn't support https for
>>> > >>> > > > custom domains.)
>>> > >>> > > >
>>> > >>> > > > On Tue, Mar 12, 2019 at 7:47 PM Charles Allen
>>> > >>> > > > <ch...@snap.com.invalid> wrote:
>>> > >>> > > >
>>> > >>> > > > > Are there other projects that have transitioned an independently
>>> > >>> > > > > successful domain name to an Apache one?
>>> > >>> > > > >
>>> > >>> > > > > On Tue, Mar 5, 2019 at 2:13 PM David Lim <davidlim@apache.org> wrote:
>>> > >>> > > > >
>>> > >>> > > > > > Who has control over the druid.io domain? Charles, would that be you?
>>> > >>> > > > > >
>>> > >>> > > > > > We'd need support from them for the DNS redirect.
>>> > >>> > > > > >
>>> > >>> > > > > > On Tue, Mar 5, 2019 at 2:04 PM Jonathan Wei <jonwei@apache.org> wrote:
>>> > >>> > > > > >
>>> > >>> > > > > > > We still need to complete the website migration to Apache
>>> > >>> > > > > > > infrastructure.
>>> > >>> > > > > > >
>>> > >>> > > > > > > I'll propose the following plan:
>>> > >>> > > > > > >
>>> > >>> > > > > > > Proposed Apache Druid website migration plan
>>> > >>> > > > > > > ========================================
>>> > >>> > > > > > >
>>> > >>> > > > > > > These links have some previous discussion on the website migration:
>>> > >>> > > > > > >
>>> > >>> > > > > > > https://lists.apache.org/thread.html/7cae100b684e0b33e0adda993efea3d6088978700988a0ae632fdd80@%3Cdev.druid.apache.org%3E
>>> > >>> > > > > > > https://issues.apache.org/jira/browse/INFRA-17340
>>> > >>> > > > > > >
>>> > >>> > > > > > > From the discussions above, the recommendation is to have 2 separate
>>> > >>> > > > > > > repos for the website: one for source and another for built content
>>> > >>> > > > > > > that will be served.
>>> > >>> > > > > > >
>>> > >>> > > > > > > Generating site files
>>> > >>> > > > > > > =======================
>>> > >>> > > > > > >
>>> > >>> > > > > > > The Apache site update process will be similar to our current
>>> > >>> > > > > > > process.
>>> > >>> > > > > > >
>>> > >>> > > > > > > Current process:
>>> > >>> > > > > > > 1. Push changes to
>>> > >>> > > > > > >    https://github.com/druid-io/druid-io.github.io/tree/src
>>> > >>> > > > > > > 2. metamx bot picks up changes, builds, and commits to
>>> > >>> > > > > > >    https://github.com/druid-io/druid-io.github.io/tree/master
>>> > >>> > > > > > > 3. https://github.com/druid-io/druid-io.github.io/tree/master is
>>> > >>> > > > > > >    served by GitHub Pages
>>> > >>> > > > > > >
>>> > >>> > > > > > > Apache process:
>>> > >>> > > > > > > 1. Push changes to
>>> > >>> > > > > > >    https://github.com/apache/incubator-druid-website-src
>>> > >>> > > > > > > 2. Jenkins bot from Apache will build the website from the source
>>> > >>> > > > > > >    repo and commit to
>>> > >>> > > > > > >    https://github.com/apache/incubator-druid-website
>>> > >>> > > > > > > 3. Apache Druid website will be served from the content in
>>> > >>> > > > > > >    https://github.com/apache/incubator-druid-website (asf-site branch)
>>> > >>> > > > > > >
>>> > >>> > > > > > >
>>> > >>> > > > > > > Hosting and SEO
>>> > >>> > > > > > > ================
>>> > >>> > > > > > >
>>> > >>> > > > > > > The Apache site will be hosted at druid.apache.org on Apache
>>> > >>> > > > > > > infrastructure:
>>> > >>> > > > > > > http://www.apache.org/dev/project-site.html
>>> > >>> > > > > > >
>>> > >>> > > > > > > To preserve our search rankings, we can set up 301 redirects from the
>>> > >>> > > > > > > old druid.io site to the corresponding pages on the druid.apache.org
>>> > >>> > > > > > > site. (https://moz.com/learn/seo/redirection)
>>> > >>> > > > > > >
>>> > >>> > > > > > > However, GitHub Pages (which currently hosts the druid.io site) does
>>> > >>> > > > > > > not support 301 redirects, so we propose the following:
>>> > >>> > > > > > > - Set up a new Nginx server that will perform 301 redirects from
>>> > >>> > > > > > >   druid.io to druid.apache.org. Imply can host this if needed.
>>> > >>> > > > > > > - Update the druid.io DNS entry to point to this new Nginx server
>>> > >>> > > > > > > - Shut down GitHub Pages hosting for druid.io
>>> > >>> > > > > > >
>>> > >>> > > > > > > In addition, we can also set canonical tags on our pages:
>>> > >>> > > > > > > https://moz.com/learn/seo/canonicalization
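
To illustrate the canonical tags mentioned above, here is a minimal sketch that builds the `<link rel="canonical">` element for a page, assuming the canonical form is always the https://druid.apache.org version of the same path. The helper name is hypothetical; how the tag would actually be wired into the site templates is not covered in this thread.

```python
from html import escape
from urllib.parse import urlsplit

# Assumed canonical origin, per the discussion of http/https and
# druid.incubator.apache.org duplicates.
CANONICAL_ORIGIN = "https://druid.apache.org"


def canonical_link(page_url):
    """Build the canonical <link> tag for any variant of a page URL."""
    path = urlsplit(page_url).path or "/"
    href = CANONICAL_ORIGIN + path
    return '<link rel="canonical" href="{}">'.format(escape(href, quote=True))


if __name__ == "__main__":
    print(canonical_link("http://druid.incubator.apache.org/community/"))
```

All variants of a page (http or https, incubator or non-incubator host) then advertise a single preferred URL to search engines.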
>>> > >>> > > > > > >
>>> > >>> > > > > > >
>>> > >>> > > > > > > Action items
>>> > >>> > > > > > > ===============
>>> > >>> > > > > > > - Set up a Jenkins bot that builds the Apache website content from
>>> > >>> > > > > > >   source
>>> > >>> > > > > > > - Get the Apache website up
>>> > >>> > > > > > > - Set up Nginx redirect server for druid.io
>>> > >>> > > > > > > - Shut down GitHub Pages and redirect DNS for druid.io to Nginx
>>> > >>> > > > > > >   redirect server
>>> > >>> > > > > > > - Add canonical tags to pages
>>> > >>> > > > > > >
>>> > >>> > > > > >
>>> > >>> > > > >
>>> > >>> > > >
>>> > >>> > >
>>> > >>> >
>>> > >>>
>>> > >>
>>> >
>>>
>>