Posted to dev@beam.apache.org by Ahmet Altay <al...@google.com> on 2020/05/01 03:09:20 UTC

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

Nam,

- Website looks good and looks the same as the current website. (Visually
comparing a few pages, not a deep analysis.)
- contribute.md looks good. (This is new content.)
- website/Dockerfile and website/README.md changes look good.
- I do not know what the new versions of some files are, for example:
website/src/_data/authors.yml, website/src/_data/capability-matrix.yml.
What replaces them?

There are 887 file changes. It is not easy to review this. I wanted to go
commit by commit, but that did not help much. How about we try to organize
this review as reviewable commits?
- Changes to the mechanics (Jekyll to Hugo), themes, build files, website
related readmes etc. This will likely be a smaller change in number of
files. (This will likely have many completely new and completely deleted
files. Only a few files have meaningful diffs.)
- Changes to the content. This might be a large number of files with
minimal changes. I do not think we can manually review each file, but at
least a quick review of minimal changes to each file would be good enough.

What do you think?

Ahmet

On Thu, Apr 30, 2020 at 4:29 PM Hannah Jiang <ha...@google.com> wrote:

>> Since we want to move forward with the PR, I would like to ask the
>> community to hold off changes to the current Beam website for a week, until
>> we are able to review and merge the PR. Is this acceptable to everyone?
>
> Do we have an exact date when we can push changes to the website? I have
> PRs to update documents so would like to plan ahead.
>
> On Thu, Apr 30, 2020 at 1:17 PM Nam Bui <na...@polidea.com> wrote:
>
>> Hey guys,
>>
>> I tried my best to handle renamed files in Git. I have no clue why GitHub
>> doesn't show them, but finally, I made this commit [1] (thanks for your
>> idea @bhulette) so you guys can review changes with ease (there is no longer
>> a bunch of deleted markdown files :D). Also, a new staged version is
>> deployed; you can check it out [2].
>>
>> In case you are interested in translation, here is the proof of concept
>> [3] (the earth icon on the right corner is temporarily used for switching
>> languages). You can take a look at the translation guide for this PoC [4].
>>
>> [1]
>> https://github.com/apache/beam/pull/11554/commits/b267bb360866a723ac2536f408f23de648c7cd4d
>> [2]
>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/index.html
>> [3] https://safe-relation.surge.sh/
>> [4]
>> https://github.com/PolideaInternal/beam/blob/website-develop/website/CONTRIBUTE.md#translation-guide
>>
>>
>> On Thu, Apr 30, 2020 at 7:24 PM Brian Hulette <bh...@google.com>
>> wrote:
>>
>>> Changing the URLs is fine with me as long as the old urls will work too.
>>>
>>> But do we need to change the filenames for the blog posts to accomplish
>>> that? It's nice that the blog post markdown files start with a date so they
>>> naturally sort chronologically. It looks like this hugo PR [1] made it
>>> possible to extract date metadata and slug
>>> (i.e. dataflow-python-sdk-is-now-public) separately from the filename.
>>>
>>> [1] https://github.com/gohugoio/hugo/pull/4494
>>>
>>> On Thu, Apr 30, 2020 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Apr 30, 2020 at 9:55 AM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>>> For changed URLs, will previous URLs be mapped to avoid broken
>>>>> external links?
>>>>>
>>>>
>>>> I believe the answer is yes from Nam's response "For now, we keep the
>>>> old URLs working in terms of redirecting them". I very much agree that this
>>>> is very important and should work for all existing urls.
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 30, 2020 at 9:34 AM Aizhamal Nurmamat kyzy <
>>>>> aizhamal@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> To give a little more context regarding the URLs, the date should
>>>>>> still appear on the blog post, but not on the URL.
>>>>>> For example, we'd have:
>>>>>>
>>>>>> https://beam.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html
>>>>>> become
>>>>>> https://beam.apache.org/blog/dataflow-python-sdk-is-now-public/.
>>>>>>
>>>>>
>>>> I am not a content marketer. IMO, this is a good change. In the past, a
>>>> few times, we edited dates on posts (e.g. a release date was entered
>>>> incorrectly) and we had to either have a mismatch between dates in the url
>>>> and the date in the blog, or change the url. This change simplifies things by
>>>> having the date in only one place (in content metadata).
>>>>
>>>>
>>>>>
>>>>>> The blog posts would have a small header showing the title, author
>>>>>> and publish date. But the URL would not have it.
>>>>>> Thoughts?
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 30, 2020 at 9:23 AM Nam Bui <na...@polidea.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> @altay: Hey hey. Yeah, I didn't expect the baseUrl of the staging
>>>>>>> version to be "
>>>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/",
>>>>>>> which also includes "/11554". Hugo treats it as a path, so it broke
>>>>>>> the paths of "static files" (like images). We made a fix. Now I'm working on
>>>>>>> "getting git to recognize files as renames" as you suggested.
>>>>>>>
>>>>>>> @robert: The dates are nice but it causes verbose/long/ugly URLs. We
>>>>>>> discussed with Aizhamal in the development stage and agreed to get rid of
>>>>>>> this. For now, we keep the old URLs working in terms of redirecting them.
>>>>>>> However, from now on, we should change the name convention on blog posts to
>>>>>>> have a fancy URL like "beam.apache.org/blog/myblogpost.md". :)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 30, 2020 at 2:57 AM Robert Bradshaw <ro...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Wed, Apr 29, 2020 at 5:08 PM Ahmet Altay <al...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Nam, this looks better. At least links are working, and the
>>>>>>>>> website visually looks similar and generally in good shape. I think there
>>>>>>>>> are still issues. For example, I do not see any of the images (e.g. the
>>>>>>>>> beam logo on top left is missing.)
>>>>>>>>>
>>>>>>>>> On Wed, Apr 29, 2020 at 3:11 PM Brian Hulette <bh...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I left a comment on the PR [1]. I think the reason all of the
>>>>>>>>>> website content is not being tracked as file renames is because there was a
>>>>>>>>>> series of commits that created files in the new directory, and then one
>>>>>>>>>> commit that deleted the old directory. If there were a single commit with
>>>>>>>>>> all of the deleted and new files, git would surely recognize they are
>>>>>>>>>> effectively renames and mark them as such. Maybe we just need to get all
>>>>>>>>>> these commits squashed into one?
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/apache/beam/pull/11554#issuecomment-621489844
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Nam, could you try this? If we can get git to recognize these as
>>>>>>>>> renames, review process would be much easier.
>>>>>>>>>
>>>>>>>>
>>>>>>>> +1.
>>>>>>>>
>>>>>>>> Alternatively, create a commit that just moves the files into a new
>>>>>>>> location (which git can always detect), then sit the edits on top of that
>>>>>>>> (which should preserve history better).
>>>>>>>>
>>>>>>>> Also, is there a reason the dates were removed from the blog post
>>>>>>>> filenames? For content like that, the dates are nice.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 29, 2020 at 10:39 AM Nam Bui <na...@polidea.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi guys,
>>>>>>>>>>>
>>>>>>>>>>> I'm Nam - from the responsible team of Apache Beam website
>>>>>>>>>>> migration. I am pleased to answer some of the questions here.
>>>>>>>>>>>
>>>>>>>>>>> @aizhamal: Thanks for informing the community. :)
>>>>>>>>>>> @altay, @robertwb: Yes, there is a problem with the staged
>>>>>>>>>>> version at the moment. We didn't expect some behaviours in the build
>>>>>>>>>>> process. We fixed it today and have been waiting for @pablo to re-run
>>>>>>>>>>> it. The purpose of this PR is to migrate the Beam site completely from
>>>>>>>>>>> Jekyll to Hugo. Therefore, the bunch of deleted markdown files are from
>>>>>>>>>>> Jekyll, which was located at `beam/website/src`; Hugo is located at
>>>>>>>>>>> `beam/website/www` now. In `beam/website/README.md`, I wrote down how to
>>>>>>>>>>> run the Hugo website locally, although it is actually the same as with
>>>>>>>>>>> Jekyll (because it's also set up with Docker & Gradle). In
>>>>>>>>>>> `beam/website/CONTRIBUTE.md`, I guide people on how to get started with
>>>>>>>>>>> Hugo on the Beam website. There is also a link in the "Translation Guide"
>>>>>>>>>>> section which points to a branch with multilingual support, and it will
>>>>>>>>>>> become a follow-up PR soon.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know if you need more details. Feel free to ask
>>>>>>>>>>> any questions and I will get back to you with answers. I'm sorry if I
>>>>>>>>>>> answer a little late due to the timezone. :)
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Nam
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 28, 2020 at 8:49 PM Aizhamal Nurmamat kyzy <
>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Adding +Nam Bui <na...@polidea.com> and +Karolina Rosół
>>>>>>>>>>>> <ka...@polidea.com> to follow up on questions.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 28, 2020 at 11:34 AM Ahmet Altay <al...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am having trouble reviewing the staged version. What is the
>>>>>>>>>>>>> best way to review this change?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do we expect any changes to markdown files, beyond some
>>>>>>>>>>>>> metadata?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:45 AM Robert Bradshaw <
>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks. It'll be great to better support more languages.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I looked at the PR and there seems to be no
>>>>>>>>>>>>>> provenance/history. E.g. all the content seems to be entirely new files
>>>>>>>>>>>>>> rather than diffs from the old. (There also seems to be a huge amount of
>>>>>>>>>>>>>> auto-generated js code as well.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree. This makes it very hard to review. I also see a bunch
>>>>>>>>>>>>> of deleted markdown files. Are they not getting migrated?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:23 AM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello everybody,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We are almost done migrating the Apache Beam website from
>>>>>>>>>>>>>>> Jekyll to Hugo. You can see the PR in [1], and we'd love to hear your
>>>>>>>>>>>>>>> feedback/comments on the PR. It includes  detailed guidelines on
>>>>>>>>>>>>>>> contributing to the new Hugo-based website and adding translations to pages
>>>>>>>>>>>>>>> [2]. For those who are curious about adding new languages, we will provide
>>>>>>>>>>>>>>> a proof of concept in the next couple of days in this thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since we want to move forward with the PR, I would like to
>>>>>>>>>>>>>>> ask the community to hold off changes to the current Beam website for a
>>>>>>>>>>>>>>> week, until we are able to review and merge the PR. Is this acceptable to
>>>>>>>>>>>>>>> everyone?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In case anyone missed my previous email with the background
>>>>>>>>>>>>>>> for the website migration, you can find more context here [3].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Aizhamal
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://github.com/apache/beam/pull/11554
>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/256b7042bf504b94f161ca03b388a2ba247918d9/website/CONTRIBUTE.md
>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/r7fa6d710c0a1959cce5108e460d71c306ce5756cf96af818b41cb7ca%40%3Cdev.beam.apache.org%3E
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

Posted by Nam Bui <na...@polidea.com>.
Hey Kenn. Thanks so much for your useful information and research. It's
great to know.

On Mon, May 4, 2020 at 6:33 PM Kenneth Knowles <ke...@apache.org> wrote:

> Regarding the detection of renames, I now recall that I have encountered
> this before: it is controlled by the config diff.renameLimit. The default
> value these days is high enough to work for this PR. I've confirmed this:
>
>     git diff --shortstat $(git merge-base github/pr/11554 github/master) github/pr/11554
>      631 files changed, 10360 insertions(+), 9938 deletions(-)
>
> (in case the commits change, that is: git diff --shortstat
> 763b7ccd17a420eb634d6799adcd3ecfcf33d6a7
> 0162c9db3e7faf0d0e243c580ffa5ca5f497db98)
>
> But the GitHub UI does not match this. I believe the reason it works for
> the individual commit and fails for the overall PR is that it is calculated
> as part of displaying the end-to-end diff. Since it is n^2 perhaps GitHub
> sets it lower. Git doesn't store anything about any of this, but always
> computes it on the fly (by design, so that improvements apply to old git
> repos automatically).
>
> Other relevant flags are `git diff --find-copies` which finds copied files
> if the original was modified in the commit and `git diff
> --find-copies-harder` which finds copied files from anywhere in the repo.
> These could support a copy --> modify new --> delete old workflow, but I
> doubt any such workflow would preserve `git blame` in the GitHub UI. (you
> can still use these flags with git blame offline to get better blame
> accuracy)
>
> Kenn
>
> On Mon, May 4, 2020 at 1:21 AM Nam Bui <na...@polidea.com> wrote:
>
>> Hey guys,
>>
>> How was your weekend? Thanks for some of the compliments and also
>> recommendations.
>>
>> About the commits, as Brian said, we worked together on the-asf slack. It
>> was a tough one, and we even did a few experiments. We finally came up with
>> a solution that preserved all commits and used `git mv`.
>> I know it's really difficult to review all of them at first, even
>> though we made a commit [1] which helps you to compare changes, since there
>> are tons of files. Therefore, I recommend checking out my work and taking a
>> look at the Hugo structure; you will link it to the Jekyll one quickly. There
>> are no changes to file or directory names, only a reorganized structure. I
>> wrote some short details here, which I hope will be helpful for reviewing.
>>
>> 1. Syntax
>> - I strongly recommend this one [2]. The author describes the Hugo syntax
>> that corresponds to each piece of Jekyll syntax. It should give you an
>> overview, instead of skimming the markdown files one by one.
>>
>> 2. Project structure
>> - The main part of Hugo is in "website/www/site". You may be a little
>> confused here by the many directories, so please read this one
>> [3] first, then you'll get into it very quickly. The most important thing
>> here is the flow. In Jekyll, you write a markdown file and then pick the
>> layout with, for example, "layout: home" in the frontmatter. In Hugo, we have
>> separate "content" and "layouts" directories; "layouts" mimics the
>> structure of "content", and Hugo knows how to connect them behind the
>> scenes.
>> - In Jekyll, the components are in "website/src/_includes"; they are
>> moved to "website/src/layouts/partials" in Hugo.
>>
>> 3. Shortcodes
>> - Think of "shortcodes" as utility functions that we reuse many times in
>> markdown files. They are one of Hugo's distinctive features, and they are
>> located at "website/www/layouts/shortcodes".
>>
>> A quick Q&A:
>> @Altay: there are some deleted files, as you can see in [1]. Some of them
>> behave differently in Hugo. For instance, the data from
>> "_data/capabilitymatrix.md" is now used directly in the
>> frontmatter of "website/www/site/content/en/blog/capability-matrix.md". The
>> reason is that it takes more work in Hugo to retrieve data from files and
>> pass it into "shortcodes" in markdown files (other data files are not
>> deleted because they are used in "layouts" HTML files).
>> @Robert: thanks for your review and comments on GitHub. I will walk
>> through all of them today.
>>
>> Best regards,
>> Nam
>>
>>
>> [1]
>> https://github.com/apache/beam/pull/11554/commits/b267bb360866a723ac2536f408f23de648c7cd4d
>> [2] https://simpleit.rocks/golang/hugo/migrating-a-jekyll-blog-to-hugo/
>> [3] https://gohugo.io/getting-started/directory-structure/
>>
>> On Fri, May 1, 2020 at 6:24 PM Brian Hulette <bh...@google.com> wrote:
>>
>>> Regarding move detection: I worked with Nam on this some on the-asf
>>> slack. We couldn't make squashing into a single large commit work - when I
>>> did it, `git log` still showed many dropped and added files. Breaking out a
>>> single commit with the file moves was the best we could manage. I tested a
>>> PR that used this approach on a single file and the github UI did pick up
>>> on it [1]. Sadly it seems to give up on the larger PR.
>>>
>>> I figured this was good enough though, it's difficult to review all of
>>> the changes at once, but you can at least review the individual commits
>>> without being obfuscated by the moves.
>>>
>>> [1] https://github.com/apache/beam/pull/11579
>>>
>>>
>>> On Fri, May 1, 2020 at 9:11 AM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> I just took a look, and added a couple of comments, but it mostly looks
>>>> good. Thanks for creating a commit that preserves changes; that's a big
>>>> improvement.
>>>>
>>>> +1 to Ahmet's suggestion about breaking the huge commit up a bit more. I
>>>> would suggest one that adds the mechanics (etc.), one that applies a script
>>>> to auto-convert the content (where we can review the script and check that
>>>> its application gives the resulting diff), and a final one that takes care of
>>>> the things that the script wasn't able to handle (or messed up, rather than
>>>> spending a huge amount of time getting the script perfect).
>>>>
>>>> On Fri, May 1, 2020 at 6:44 AM Kenneth Knowles <ke...@apache.org> wrote:
>>>>
>>>>> I believe taking Brian and Robert's advice to help git detect moves
>>>>> (even more than you already have) will make this much more manageable. I
>>>>> just tried it out and squashing commits brings it to "631 files changed,
>>>>> 10363 insertions(+), 9945 deletions(-)" according to git, so that is more
>>>>> manageable than +47k - 47k. I'm not saying that a total squash is best.
>>>>> There may be a better way to factor the changes.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Thu, Apr 30, 2020 at 8:09 PM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> Nam,
>>>>>>
>>>>>>  - Website looks good and looks the same as the current website.
>>>>>> (Visually comparing a few pages, not a deep analysis.)
>>>>>> - contribute.md looks good. (this is new content.)
>>>>>> - website/Dockerfile and website/README.md changes look good.
>>>>>> - I do not know what is the new version of some files, for example:
>>>>>> website/src/_data/authors.yml,  website/src/_data/capability-matrix.yml --
>>>>>> what replaces them?
>>>>>>
>>>>>> There are 887 file changes. It is not easy to review this. I wanted
>>>>>> to go commit by commit, but that did not help much. How about we try to
>>>>>> organize this review as reviewable commits?
>>>>>> - Changes to the mechanics (jekyll to hugo), themes, build files,
>>>>>> website related readmes etc. This will likely be a smaller change in number
>>>>>> of files. (This will likely have many completely new and completely deleted
>>>>>> files. Only a few files have meaningful diffs.)
>>>>>> - Changes to the content. This might be a large number of files with
>>>>>> minimal changes. I do not think we can manually review each file, but at
>>>>>> least a quick review of minimal changes to each file would be good enough.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> Ahmet
>>>>>>
>>>>>> On Thu, Apr 30, 2020 at 4:29 PM Hannah Jiang <ha...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>>> Since we want to move forward with the PR, I would like to ask the
>>>>>>>> community to hold off changes to the current Beam website for a week, until
>>>>>>>> we are able to review and merge the PR. Is this acceptable to everyone?
>>>>>>>
>>>>>>> Do we have an exact date when we can push changes to the website? I
>>>>>>> have PRs to update documents so would like to plan ahead.
>>>>>>>
>>>>>>> On Thu, Apr 30, 2020 at 1:17 PM Nam Bui <na...@polidea.com> wrote:
>>>>>>>
>>>>>>>> Hey guys,
>>>>>>>>
>>>>>>>> I tried my best to handle renamed files in Git. I have no clue why
>>>>>>>> GitHub doesn't show it, but finally, I made this commit [1] (thanks for
>>>>>>>> your idea @bhulette) so you guys can review changes with ease (there is no
>>>>>>>> bunch of deleted markdown files anymore :D). Also, new staged version is
>>>>>>>> deployed, you could check it out [2].
>>>>>>>>
>>>>>>>> In case you are interested in translation, here is the proof of
>>>>>>>> concept [3] (the earth icon on the right corner is temporarily used for
>>>>>>>> switching languages). You can take a look at the translation guide for this
>>>>>>>> PoC [4].
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://github.com/apache/beam/pull/11554/commits/b267bb360866a723ac2536f408f23de648c7cd4d
>>>>>>>> [2]
>>>>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/index.html
>>>>>>>> [3] https://safe-relation.surge.sh/
>>>>>>>> [4]
>>>>>>>> https://github.com/PolideaInternal/beam/blob/website-develop/website/CONTRIBUTE.md#translation-guide
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2020 at 7:24 PM Brian Hulette <bh...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Changing the URLs is fine with me as long as the old urls will
>>>>>>>>> work too.
>>>>>>>>>
>>>>>>>>> But do we need to change the filenames for the blog posts to
>>>>>>>>> accomplish that? It's nice that the blog post markdown files start with a
>>>>>>>>> date so they naturally sort chronologically. It looks like this hugo PR [1]
>>>>>>>>> made it possible to extract date metadata and slug
>>>>>>>>> (i.e. dataflow-python-sdk-is-now-public) separately from the filename.
>>>>>>>>>
>>>>>>>>> [1] https://github.com/gohugoio/hugo/pull/4494
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2020 at 10:06 AM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 30, 2020 at 9:55 AM Thomas Weise <th...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> For changed URLs, will previous URLs be mapped to avoid broken
>>>>>>>>>>> external links?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I believe the answer is yes from Nam's response "For now, we keep
>>>>>>>>>> the old URLs working in terms of redirecting them". I very much agree that
>>>>>>>>>> this is very important and should work for all existing urls.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 30, 2020 at 9:34 AM Aizhamal Nurmamat kyzy <
>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> To give a little more context regarding the URLs, the date
>>>>>>>>>>>> should still appear on the blog post, but not on the URL.
>>>>>>>>>>>> For example, we'd have:
>>>>>>>>>>>>
>>>>>>>>>>>> https://beam.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html
>>>>>>>>>>>> become
>>>>>>>>>>>> https://beam.apache.org/blog/dataflow-python-sdk-is-now-public/
>>>>>>>>>>>> .
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> I am not a content marketer. IMO, this is a good change. In the
>>>>>>>>>> past, a few times, we edited dates on posts (e.g. a release date was
>>>>>>>>>> entered incorrectly) and we had to either have a mismatch between dates in
>>>>>>>>>> the url and the date in the blog, or change the url. This change
>>>>>>>>>> simplifies things by having the date in only one place (in content metadata).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> The blog posts would have a small header showing the title,
>>>>>>>>>>>> author and publish date. But the URL would not have it.
>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 30, 2020 at 9:23 AM Nam Bui <na...@polidea.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> @altay: Hey hey. Yeah, I didn't expect the baseUrl of staging
>>>>>>>>>>>>> version is "
>>>>>>>>>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/"
>>>>>>>>>>>>> which also includes "/11554", and Hugo considers it as a path so it breaks
>>>>>>>>>>>>> the path of "static files" (like images). We made a fix. Now I'm working on
>>>>>>>>>>>>> "getting git to recognize files as renames" as you suggested.
>>>>>>>>>>>>>
>>>>>>>>>>>>> @robert: The dates are nice but it causes verbose/long/ugly
>>>>>>>>>>>>> URLs. We discussed with Aizhamal in the development stage and agreed to get
>>>>>>>>>>>>> rid of this. For now, we keep the old URLs working in terms of redirecting
>>>>>>>>>>>>> them. However, from now on, we should change the name convention on blog
>>>>>>>>>>>>> posts to have a fancy URL like "
>>>>>>>>>>>>> beam.apache.org/blog/myblogpost.md". :)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Apr 30, 2020 at 2:57 AM Robert Bradshaw <
>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 29, 2020 at 5:08 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nam, this looks better. At least links are working, and the
>>>>>>>>>>>>>>> website visually looks similar and generally in good shape. I think there
>>>>>>>>>>>>>>> are still issues. For example, I do not see any of the images (e.g. the
>>>>>>>>>>>>>>> beam logo on top left is missing.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Apr 29, 2020 at 3:11 PM Brian Hulette <
>>>>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I left a comment on the PR [1]. I think the reason all of
>>>>>>>>>>>>>>>> the website content is not being tracked as file renames is because there
>>>>>>>>>>>>>>>> was a series of commits that created files in the new directory, and then
>>>>>>>>>>>>>>>> one commit that deleted the old directory. If there were a single commit
>>>>>>>>>>>>>>>> with all of the deleted and new files, git would surely recognize they are
>>>>>>>>>>>>>>>> effectively renames and mark them as such. Maybe we just need to get all
>>>>>>>>>>>>>>>> these commits squashed into one?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>> https://github.com/apache/beam/pull/11554#issuecomment-621489844
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nam, could you try this? If we can get git to recognize
>>>>>>>>>>>>>>> these as renames, review process would be much easier.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alternatively, create a commit that just moves the files into
>>>>>>>>>>>>>> a new location (which git can always detect), then sit the edits on top of
>>>>>>>>>>>>>> that (which should preserve history better).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, is there a reason the dates were removed from the blog
>>>>>>>>>>>>>> post filenames? For content like that, the dates are nice.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Apr 29, 2020 at 10:39 AM Nam Bui <
>>>>>>>>>>>>>>>> nam.bui@polidea.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm Nam - from the responsible team of Apache Beam website
>>>>>>>>>>>>>>>>> migration. I am pleased to answer some of the questions here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> @aizhamal: Thanks for informing the community. :)
>>>>>>>>>>>>>>>>> @altay, @robertwb: Yes, there is a problem with the staged
>>>>>>>>>>>>>>>>> version at the moment. We didn't expect some behaviours in the build
>>>>>>>>>>>>>>>>> process. We fixed it today and have been waiting for @pablo to re-run
>>>>>>>>>>>>>>>>> it. The purpose of this PR is to migrate the Beam site completely from
>>>>>>>>>>>>>>>>> Jekyll to Hugo. Therefore, the bunch of deleted markdown files are from
>>>>>>>>>>>>>>>>> Jekyll, which was located at `beam/website/src`; Hugo is located at
>>>>>>>>>>>>>>>>> `beam/website/www` now. In `beam/website/README.md`, I wrote down how to
>>>>>>>>>>>>>>>>> run the Hugo website locally, although it is actually the same as with
>>>>>>>>>>>>>>>>> Jekyll (because it's also set up with Docker & Gradle). In
>>>>>>>>>>>>>>>>> `beam/website/CONTRIBUTE.md`, I guide people on how to get started with
>>>>>>>>>>>>>>>>> Hugo on the Beam website. There is also a link in the "Translation Guide"
>>>>>>>>>>>>>>>>> section which points to a branch with multilingual support, and it will
>>>>>>>>>>>>>>>>> become a follow-up PR soon.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please let me know if you need more details. Feel free to
>>>>>>>>>>>>>>>>> ask any questions and I will get back to you with answers. I'm sorry if
>>>>>>>>>>>>>>>>> I answer a little late due to the timezone. :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>> Nam
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 8:49 PM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Adding +Nam Bui <na...@polidea.com> and +Karolina Rosół
>>>>>>>>>>>>>>>>>> <ka...@polidea.com> to follow up on questions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 11:34 AM Ahmet Altay <
>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am having trouble reviewing the staged version. What
>>>>>>>>>>>>>>>>>>> is the best way to review this change?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Do we expect any changes to markdown files, beyond some
>>>>>>>>>>>>>>>>>>> metadata?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:45 AM Robert Bradshaw <
>>>>>>>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks. It'll be great to better support more
>>>>>>>>>>>>>>>>>>>> languages.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I looked at the PR and there seems to be no
>>>>>>>>>>>>>>>>>>>> provenance/history. E.g. all the content seems to be entirely new files
>>>>>>>>>>>>>>>>>>>> rather than diffs from the old. (There also seems to be a huge amount of
>>>>>>>>>>>>>>>>>>>> auto-generated js code as well.)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I agree. This makes it very hard to review. I also see a
>>>>>>>>>>>>>>>>>>> bunch of deleted markdown files. Are they not getting migrated?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:23 AM Aizhamal Nurmamat kyzy
>>>>>>>>>>>>>>>>>>>> <ai...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello everybody,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We are almost done migrating the Apache Beam website
>>>>>>>>>>>>>>>>>>>>> from Jekyll to Hugo. You can see the PR in [1], and we'd love to hear your
>>>>>>>>>>>>>>>>>>>>> feedback/comments on the PR. It includes  detailed guidelines on
>>>>>>>>>>>>>>>>>>>>> contributing to the new Hugo-based website and adding translations to pages
>>>>>>>>>>>>>>>>>>>>> [2]. For those who are curious about adding new languages, we will provide
>>>>>>>>>>>>>>>>>>>>> a proof of concept in the next couple of days in this thread.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Since we want to move forward with the PR, I would
>>>>>>>>>>>>>>>>>>>>> like to ask the community to hold off changes to the current Beam website
>>>>>>>>>>>>>>>>>>>>> for a week, until we are able to review and merge the PR. Is this
>>>>>>>>>>>>>>>>>>>>> acceptable to everyone?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In case anyone missed my previous email with the
>>>>>>>>>>>>>>>>>>>>> background for the website migration, you can find more context here [3].
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Aizhamal
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/beam/pull/11554
>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/256b7042bf504b94f161ca03b388a2ba247918d9/website/CONTRIBUTE.md
>>>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/r7fa6d710c0a1959cce5108e460d71c306ce5756cf96af818b41cb7ca%40%3Cdev.beam.apache.org%3E
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

Posted by Kenneth Knowles <ke...@apache.org>.
Regarding the detection of renames, I now recall that I have encountered
this before: it is controlled by the config diff.renameLimit. The default
value these days is high enough to work for this PR. I've confirmed this:

    git diff --shortstat \
        $(git merge-base github/pr/11554 github/master) github/pr/11554
     631 files changed, 10360 insertions(+), 9938 deletions(-)

(in case the commits change, that is: git diff --shortstat
763b7ccd17a420eb634d6799adcd3ecfcf33d6a7
0162c9db3e7faf0d0e243c580ffa5ca5f497db98)

But the GitHub UI does not match this. I believe the reason it works for
the individual commit and fails for the overall PR is that rename detection
is calculated as part of displaying the end-to-end diff. Since the detection
is roughly n^2 in the number of candidate files, perhaps GitHub sets the
limit lower. Git doesn't store anything about renames, but always computes
them on the fly (by design, so that improvements apply to old git repos
automatically).
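
To illustrate that renames are computed at diff time rather than stored,
here is a minimal sketch in a throwaway repository. The file names, the
placeholder identity and the limit value 2000 are made up for the example;
this is not the Beam PR.

```shell
#!/bin/sh
# Sketch: the same two commits show up as a rename or as delete+add,
# depending only on the options passed to `git diff`.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "reviewer@example.com"   # placeholder identity
git config user.name "Reviewer"

# Commit 1: a 50-line file in the "old" location.
mkdir -p old
i=1
while [ "$i" -le 50 ]; do echo "line $i"; i=$((i + 1)); done > old/page.md
git add old/page.md
git commit -q -m "add page"

# Commit 2: move the file and append one line.
mkdir -p new
git mv old/page.md new/page.md
echo "one extra line" >> new/page.md
git add new/page.md
git commit -q -m "move and edit page"

# With rename detection: a single R (rename) entry.
with_renames=$(git diff -M --name-status HEAD~1 HEAD)
# Without it: one deleted path and one added path.
without_renames=$(git diff --no-renames --name-status HEAD~1 HEAD)

printf 'with -M:\n%s\n' "$with_renames"
printf 'with --no-renames:\n%s\n' "$without_renames"

# The limit can be overridden per invocation (or with `git diff -l<num>`).
git -c diff.renameLimit=2000 diff -M --shortstat HEAD~1 HEAD
```

Nothing in the two commits changes between the two `git diff` calls; only
the detection step does, which is consistent with a UI that applies a lower
limit showing a different result from the command line.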

Other relevant flags are `git diff --find-copies`, which finds copied files
if the original was modified in the same commit, and `git diff
--find-copies-harder`, which also considers unmodified files anywhere in the
repo as copy sources. These could support a copy --> modify new --> delete
old workflow, but I doubt any such workflow would preserve `git blame` in the
GitHub UI. (Offline, you can still pass the corresponding `-M` and `-C`
options to `git blame` to get better blame accuracy.)
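
A small sketch of the difference between the two copy flags, again in a
throwaway repository with made-up file names and a placeholder identity:

```shell
#!/bin/sh
# Sketch: --find-copies only considers files modified in the same commit
# as copy sources; --find-copies-harder also considers unmodified files.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "reviewer@example.com"   # placeholder identity
git config user.name "Reviewer"

i=1
while [ "$i" -le 40 ]; do echo "paragraph $i"; i=$((i + 1)); done > original.md
git add original.md
git commit -q -m "add original"

# Copy the file WITHOUT touching the original in the same commit.
cp original.md copy.md
git add copy.md
git commit -q -m "copy original"

# The original was not modified, so this reports copy.md as added (A).
plain=$(git diff --find-copies --name-status HEAD~1 HEAD)
# Unmodified files are candidates here, so this reports a copy (C).
harder=$(git diff --find-copies-harder --name-status HEAD~1 HEAD)

printf '%s\n' "--find-copies:        $plain"
printf '%s\n' "--find-copies-harder: $harder"

# git blame has its own -C option; given twice it also looks for copies
# from other files in the commit that created the file.
git blame -C -C -s -L 1,3 copy.md
```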

Kenn

On Mon, May 4, 2020 at 1:21 AM Nam Bui <na...@polidea.com> wrote:

> Hey guys,
>
> How was your weekend? Thanks for some of the compliments and also
> recommendations.
>
> About the commits, as Brian said, we worked together on the-asf slack. It
> was the tough one, we even did a few experiments. And finally came up with
> a solution that preserved all commits and used `git mv`.
> IMHO, I know it's really difficult to review all of them at first, even
> though we made a commit [1] which helps you to compare changes since there
> are tons of files. Therefore, I recommend to check out my work, take a look
> at Hugo structure and you will link it to Jekyll one quickly. There are no
> chances about file or directory names, just organize the structure. I write
> a short details here, hope it would be helpful in terms of reviewing.
>
> 1. Syntax
> - I strongly prefer this one [2]. He wrote about Hugo syntax which is
> corresponding to Jekyll syntax. It would make sense to your overview,
> instead of skimming one by one markdown file.
>
> 2. Project structure
> - The main part of Hugo is in "website/www/site". You will briefly
> confused a little bit here with many directories, so please read this one
> [3] first, then you'll get into it very quickly. The most important thing
> here is the flow. In Jekyll, you write a markdown file and then pick the
> layout with "layout: home" in frontmatter as an example. In Hugo, we have
> separated "content" and "layouts" directory, the "layouts" will mimic the
> structure of the "content", and at the end, Hugo will know how to connect
> each of them behind the scene.
> - In Jekyll, the components are in "website/src/_include" and it will be
> moved to "website/src/layouts/partials" in Hugo.
>
> 3. Shortcodes.
> - Just thinking "shortcodes" as utility functions and we will reuse it
> many times in markdown files. One of the unique features from Hugo, and
> it's located at "website/www/layouts/shortcodes".
>
> A quick Q&A:
> @Altay: there are some deleted files if you see them in [1]. Some of them
> have the different behaviour in Hugo. For instance,
> "_data/capabilitymatrix.md" will be used directly in
> frontmatter "website/www/site/content/en/blog/capability-matrix.md", the
> reason is, it will take more works in Hugo to retrieve data from files and
> pass them into "shortcodes" in markdown files (other data files are not
> deleted because they are used in "layouts" HTML files).
> @Robert: thanks for your review and comments on GitHub. I will walk
> through all of them today.
>
> Best regards,
> Nam
>
>
> [1]
> https://github.com/apache/beam/pull/11554/commits/b267bb360866a723ac2536f408f23de648c7cd4d
> [2] https://simpleit.rocks/golang/hugo/migrating-a-jekyll-blog-to-hugo/
> [3] https://gohugo.io/getting-started/directory-structure/
>
> On Fri, May 1, 2020 at 6:24 PM Brian Hulette <bh...@google.com> wrote:
>
>> Regarding move detection: I worked with Nam on this some on the-asf
>> slack. We couldn't make squashing into a single large commit work - when I
>> did it, `git log` still showed many dropped and added files. Breaking out a
>> single commit with the file moves was the best we could manage. I tested a
>> PR that used this approach on a single file and the github UI did pick up
>> on it [1]. Sadly it seems to give up on the larger PR.
>>
>> I figured this was good enough though, it's difficult to review all of
>> the changes at once, but you can at least review the individual commits
>> without being obfuscated by the moves.
>>
>> [1] https://github.com/apache/beam/pull/11579
>>
>>
>> On Fri, May 1, 2020 at 9:11 AM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> I just took a look, and added a couple of comments, but it mostly looks
>>> good. Thanks for creating a commit that preserves changes; that's a big
>>> improvement.
>>>
>>> +1 to Ahmet's suggestion about breaking the huge commit up a bit more. I
>>> would suggest one that adds the mechanics (etc.), one that applies a script
>>> to auto-convert the content (where we can review the script and check that
>>> its application gives the resulting diff), and a final one that takes care of
>>> the things that the script wasn't able to handle (or messed up, rather than
>>> spending a huge amount of time getting the script perfect).
>>>
>>> On Fri, May 1, 2020 at 6:44 AM Kenneth Knowles <ke...@apache.org> wrote:
>>>
>>>> I believe taking Brian and Robert's advice to help git detect moves
>>>> (even more than you already have) will make this much more manageable. I
>>>> just tried it out and squashing commits brings it to "631 files changed,
>>>> 10363 insertions(+), 9945 deletions(-)" according to git, so that is more
>>>> manageable than +47k - 47k. I'm not saying that a total squash is best.
>>>> There may be a better way to factor the changes.
>>>>
>>>> Kenn
>>>>
>>>> On Thu, Apr 30, 2020 at 8:09 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> Nam,
>>>>>
>>>>>  - Website looks good and looks the same as the current website.
>>>>> (Visually comparing a few pages, not a deep analysis.)
>>>>> - contribute.md looks good. (this is new content.)
>>>>> - website/Dockerfile and website/README.md changes look good.
>>>>> - I do not know what is the new version of some files, for example:
>>>>> website/src/_data/authors.yml,  website/src/_data/capability-matrix.yml --
>>>>> what replaces them?
>>>>>
>>>>> There are 887 file changes. It is not easy to review this. I wanted to
>>>>> go commit by commit, but that did not help much. How about we try to
>>>>> organize this review as reviewable commits.
>>>>> - Changes to the mechanics (jekyll to hugo), themes, build files,
>>>>> website related readmes etc. This will likely be a smaller change in number
>>>>> of files. (This will likely have many completely new, and completely deleted
>>>>> files. Only a few files have meaningful diffs.)
>>>>> - Changes to the content. This might be a large number of files with
>>>>> minimal changes. I do not think we can manually review each file, but at
>>>>> least a quick review of minimal changes to each file would be good enough.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Ahmet
>>>>>
>>>>> On Thu, Apr 30, 2020 at 4:29 PM Hannah Jiang <ha...@google.com>
>>>>> wrote:
>>>>>
>>>>>>> Since we want to move forward with the PR, I would like to ask the
>>>>>>> community to hold off changes to the current Beam website for a week, until
>>>>>>> we are able to review and merge the PR. Is this acceptable to everyone?
>>>>>>
>>>>>> Do we have an exact date when we can push changes to the website? I
>>>>>> have PRs to update documents so would like to plan ahead.
>>>>>>
>>>>>> On Thu, Apr 30, 2020 at 1:17 PM Nam Bui <na...@polidea.com> wrote:
>>>>>>
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> I tried my best to handle renamed files in Git. I have no clue why
>>>>>>> GitHub doesn't show it, but finally, I made this commit [1] (thanks for
>>>>>>> your idea @bhulette) so you guys can review changes with ease (there is no
>>>>>>> bunch of deleted markdown files anymore :D). Also, new staged version is
>>>>>>> deployed, you could check it out [2].
>>>>>>>
>>>>>>> In case you are interested in translation, here is the proof of
>>>>>>> concept [3] (the earth icon on the right corner is temporarily used for
>>>>>>> switching languages). You can take a look at the translation guide for this
>>>>>>> PoC [4].
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/apache/beam/pull/11554/commits/b267bb360866a723ac2536f408f23de648c7cd4d
>>>>>>> [2]
>>>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/index.html
>>>>>>> [3] https://safe-relation.surge.sh/
>>>>>>> [4]
>>>>>>> https://github.com/PolideaInternal/beam/blob/website-develop/website/CONTRIBUTE.md#translation-guide
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 30, 2020 at 7:24 PM Brian Hulette <bh...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Changing the URLs is fine with me as long as the old urls will work
>>>>>>>> too.
>>>>>>>>
>>>>>>>> But do we need to change the filenames for the blog posts to
>>>>>>>> accomplish that? It's nice that the blog post markdown files start with a
>>>>>>>> date so they naturally sort chronologically. It looks like this hugo PR [1]
>>>>>>>> made it possible to extract date metadata and slug
>>>>>>>> (i.e. dataflow-python-sdk-is-now-public) separately from the filename.
>>>>>>>>
>>>>>>>> [1] https://github.com/gohugoio/hugo/pull/4494
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2020 at 10:06 AM Ahmet Altay <al...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2020 at 9:55 AM Thomas Weise <th...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> For changed URLs, will previous URLs be mapped to avoid broken
>>>>>>>>>> external links?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I believe the answer is yes from Nam's response "For now, we keep
>>>>>>>>> the old URLs working in terms of redirecting them". I very much agree that
>>>>>>>>> this is very important and should work for all existing urls.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 30, 2020 at 9:34 AM Aizhamal Nurmamat kyzy <
>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> To give a little more context regarding the URLs, the date
>>>>>>>>>>> should still appear on the blog post, but not on the URL.
>>>>>>>>>>> For example, we'd have:
>>>>>>>>>>>
>>>>>>>>>>> https://beam.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html
>>>>>>>>>>> become
>>>>>>>>>>> https://beam.apache.org/blog/dataflow-python-sdk-is-now-public/.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> I am not a content marketer. IMO, this is a good change. In the
>>>>>>>>> past, a few times, we edited dates on posts (e.g. a release date was
>>>>>>>>> entered incorrectly) and we had to either have a mismatch between dates in
>>>>>>>>> the url and the date in the blog, or change the url. This change
>>>>>>>>> simplifies things by having the date in only one place (in content metadata).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> The blog posts would have a small header showing the title,
>>>>>>>>>>> author and publish date. But the URL would not have it.
>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 30, 2020 at 9:23 AM Nam Bui <na...@polidea.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> @altay: Hey hey. Yeah, I didn't expect the baseUrl of staging
>>>>>>>>>>>> version is "
>>>>>>>>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/"
>>>>>>>>>>>> which also includes "/11554", and Hugo considers it as a path so it breaks
>>>>>>>>>>>> the path of "static files" (like images). We made a fix. Now I'm working on
>>>>>>>>>>>> "getting git to recognize files as renames" as you suggested.
>>>>>>>>>>>>
>>>>>>>>>>>> @robert: The dates are nice but it causes verbose/long/ugly
>>>>>>>>>>>> URLs. We discussed with Aizhamal in the development stage and agreed to get
>>>>>>>>>>>> rid of this. For now, we keep the old URLs working in terms of redirecting
>>>>>>>>>>>> them. However, from now on, we should change the name convention on blog
>>>>>>>>>>>> posts to have a fancy URL like "
>>>>>>>>>>>> beam.apache.org/blog/myblogpost.md". :)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 30, 2020 at 2:57 AM Robert Bradshaw <
>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 29, 2020 at 5:08 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nam, this looks better. At least links are working, and the
>>>>>>>>>>>>>> website visually looks similar and generally in good shape. I think there
>>>>>>>>>>>>>> are still issues. For example, I do not see any of the images (e.g. the
>>>>>>>>>>>>>> beam logo on top left is missing.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 29, 2020 at 3:11 PM Brian Hulette <
>>>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I left a comment on the PR [1]. I think the reason all of
>>>>>>>>>>>>>>> the website content is not being tracked as file renames is because there
>>>>>>>>>>>>>>> was a series of commits that created files in the new directory, and then
>>>>>>>>>>>>>>> one commit that deleted the old directory. If there were a single commit
>>>>>>>>>>>>>>> with all of the deleted and new files, git would surely recognize they are
>>>>>>>>>>>>>>> effectively renames and mark them as such. Maybe we just need to get all
>>>>>>>>>>>>>>> these commits squashed into one?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://github.com/apache/beam/pull/11554#issuecomment-621489844
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nam, could you try this? If we can get git to recognize these
>>>>>>>>>>>>>> as renames, review process would be much easier.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> +1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alternatively, create a commit that just moves the files into
>>>>>>>>>>>>> a new location (which git can always detect), then sit the edits on top of
>>>>>>>>>>>>> that (which should preserve history better).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, is there a reason the dates were removed from the blog
>>>>>>>>>>>>> post filenames? For content like that, the dates are nice.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Apr 29, 2020 at 10:39 AM Nam Bui <
>>>>>>>>>>>>>>> nam.bui@polidea.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm Nam - from the responsible team of Apache Beam website
>>>>>>>>>>>>>>>> migration. I am pleased to answer some of the questions here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> @aizhamal: Thanks for informing the community. :)
>>>>>>>>>>>>>>>> @altay, @robertwb: Yes. there is a problem with the staged
>>>>>>>>>>>>>>>> version at the moment. We didn't expect some behaviours on the build
>>>>>>>>>>>>>>>> process. So, we fixed it today and been waiting for @pablo to re-run it
>>>>>>>>>>>>>>>> again. The purpose of this PR is to migrate completely Beam site from
>>>>>>>>>>>>>>>> Jekyll to Hugo. Therefore, a bunch of deleted markdown files are from
>>>>>>>>>>>>>>>> Jekyll which was located at `beam/website/src`, and Hugo is located at
>>>>>>>>>>>>>>>> `beam/website/www` now. In `beam/website/README.md`, I wrote down about
>>>>>>>>>>>>>>>> running the Hugo website locally, although it is actually same as Jekyll
>>>>>>>>>>>>>>>> (because it's also set up with Docker & Gradle). In
>>>>>>>>>>>>>>>> `beam/website/CONTRIBUTE.md`, I guided people on how to get started with
>>>>>>>>>>>>>>>> Hugo on the Beam website. There is also a link in the "Translation Guide"
>>>>>>>>>>>>>>>> section which points to a branch of multilingual provenance, and it will
>>>>>>>>>>>>>>>> become a next PR soon.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please let me know if you need more details. Feel free to
>>>>>>>>>>>>>>>> ask any questions and I will get back to you with answers. I'm so sorry if
>>>>>>>>>>>>>>>> I answer a little late due to the timezone. :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Nam
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 8:49 PM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Adding +Nam Bui <na...@polidea.com> and +Karolina Rosół
>>>>>>>>>>>>>>>>> <ka...@polidea.com> to follow up on questions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 11:34 AM Ahmet Altay <
>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am having trouble reviewing the staged version. What is
>>>>>>>>>>>>>>>>>> the best way to review this change?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Do we expect any changes to markdown files, beyond some
>>>>>>>>>>>>>>>>>> metadata?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:45 AM Robert Bradshaw <
>>>>>>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks. It'll be great to better support more languages.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I looked at the PR and there seems to be no
>>>>>>>>>>>>>>>>>>> provenance/history. E.g. all the content seems to be entirely new files
>>>>>>>>>>>>>>>>>>> rather than diffs from the old. (There also seems to be a huge amount of
>>>>>>>>>>>>>>>>>>> auto-generated js code as well.)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I agree. This makes it very hard to review. I also see a
>>>>>>>>>>>>>>>>>> bunch of deleted markdown files. Are they not getting migrated?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:23 AM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hello everybody,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We are almost done migrating the Apache Beam website
>>>>>>>>>>>>>>>>>>>> from Jekyll to Hugo. You can see the PR in [1], and we'd love to hear your
>>>>>>>>>>>>>>>>>>>> feedback/comments on the PR. It includes  detailed guidelines on
>>>>>>>>>>>>>>>>>>>> contributing to the new Hugo-based website and adding translations to pages
>>>>>>>>>>>>>>>>>>>> [2]. For those who are curious about adding new languages, we will provide
>>>>>>>>>>>>>>>>>>>> a proof of concept in the next couple of days in this thread.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Since we want to move forward with the PR, I would like
>>>>>>>>>>>>>>>>>>>> to ask the community to hold off changes to the current Beam website for a
>>>>>>>>>>>>>>>>>>>> week, until we are able to review and merge the PR. Is this acceptable to
>>>>>>>>>>>>>>>>>>>> everyone?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In case anyone missed my previous email with the
>>>>>>>>>>>>>>>>>>>> background for the website migration, you can find more context here [3].
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Aizhamal
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/beam/pull/11554
>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/256b7042bf504b94f161ca03b388a2ba247918d9/website/CONTRIBUTE.md
>>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/r7fa6d710c0a1959cce5108e460d71c306ce5756cf96af818b41cb7ca%40%3Cdev.beam.apache.org%3E
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

Posted by Nam Bui <na...@polidea.com>.
Hey guys,

How was your weekend? Thanks for the compliments and also the
recommendations.

About the commits: as Brian said, we worked on this together on the-asf
Slack. It was a tough one, and we even ran a few experiments. We finally came
up with a solution that preserves all commits and uses `git mv`.
I know it's really difficult to review everything at once, even though we
made a commit [1] that helps you compare changes, since there are tons of
files. Therefore, I recommend checking out my work and taking a look at the
Hugo structure; you will quickly be able to map it to the Jekyll one. There
are no changes to file or directory names, only to how the structure is
organized. I have written a few short details here and hope they are helpful
for reviewing.

1. Syntax
- I strongly recommend this one [2]. The author describes the Hugo syntax
that corresponds to each piece of Jekyll syntax. It should give you a good
overview, instead of skimming the markdown files one by one.

2. Project structure
- The main part of Hugo is in "website/www/site". You may be a little
confused at first by the many directories, so please read this one [3] first;
then you'll get into it very quickly. The most important thing here is the
flow. In Jekyll, you write a markdown file and then pick the layout with, for
example, "layout: home" in the front matter. In Hugo, we have separate
"content" and "layouts" directories; "layouts" mirrors the structure of
"content", and Hugo works out how to connect them behind the scenes.
- In Jekyll, the components are in "website/src/_includes"; in Hugo they are
moved to "website/www/site/layouts/partials".

3. Shortcodes
- Just think of "shortcodes" as utility functions that we reuse many times
in markdown files. They are one of Hugo's distinctive features, and they are
located at "website/www/site/layouts/shortcodes".
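
As a rough sketch of the points above, the script below scaffolds a tiny
Hugo-style tree in a temporary directory. Every file name and the "note"
shortcode are hypothetical and are not taken from the Beam site; site
configuration (including the multilingual setup that would normally explain
the "en" directory) is left out, so this only shows where things live.

```shell
#!/bin/sh
# Sketch: where content, layouts, partials and shortcodes live in a
# Hugo-style site. All names below are made up for illustration.
set -eu

site=$(mktemp -d)
cd "$site"
mkdir -p content/en/blog layouts/_default layouts/partials layouts/shortcodes

# Content: plain markdown plus front matter. There is no "layout:" key;
# Hugo picks a template from layouts/ based on the page's kind and section.
cat > content/en/blog/example-post.md <<'EOF'
---
title: "Example post"
date: 2020-05-04
---
Regular markdown text.

{{< note >}}This text is wrapped by the shortcode.{{< /note >}}
EOF

# Layout for single pages; it pulls in a partial (Jekyll: an include).
cat > layouts/_default/single.html <<'EOF'
{{ partial "header.html" . }}
<main>{{ .Content }}</main>
EOF

cat > layouts/partials/header.html <<'EOF'
<header><h1>{{ .Title }}</h1></header>
EOF

# Shortcode: a reusable snippet that can be called from markdown files.
cat > layouts/shortcodes/note.html <<'EOF'
<div class="note">{{ .Inner }}</div>
EOF

files=$(find . -type f | sort)
printf '%s\n' "$files"
```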

A quick Q&A:
@Altay: there are some deleted files, as you can see in [1]. Some of them
behave differently in Hugo. For instance, the data from
"_data/capabilitymatrix.md" is now used directly in the front matter of
"website/www/site/content/en/blog/capability-matrix.md". The reason is that
it would take more work in Hugo to retrieve data from files and pass it into
"shortcodes" in markdown files (other data files are not deleted because
they are used in "layouts" HTML files).
@Robert: thanks for your review and comments on GitHub. I will go through
all of them today.

Best regards,
Nam


[1]
https://github.com/apache/beam/pull/11554/commits/b267bb360866a723ac2536f408f23de648c7cd4d
[2] https://simpleit.rocks/golang/hugo/migrating-a-jekyll-blog-to-hugo/
[3] https://gohugo.io/getting-started/directory-structure/

On Fri, May 1, 2020 at 6:24 PM Brian Hulette <bh...@google.com> wrote:

> Regarding move detection: I worked with Nam on this some on the-asf slack.
> We couldn't make squashing into a single large commit work - when I did it,
> `git log` still showed many dropped and added files. Breaking out a single
> commit with the file moves was the best we could manage. I tested a PR that
> used this approach on a single file and the github UI did pick up on it
> [1]. Sadly it seems to give up on the larger PR.
>
> I figured this was good enough though, it's difficult to review all of the
> changes at once, but you can at least review the individual commits without
> being obfuscated by the moves.
>
> [1] https://github.com/apache/beam/pull/11579
>
>
> On Fri, May 1, 2020 at 9:11 AM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> I just took a look, and added a couple of comments, but it mostly looks
>> good. Thanks for creating a commit that preserves changes; that's a big
>> improvement.
>>
>> +1 to Ahmet's suggestion about breaking the huge commit up a bit more. I
>> would suggest one that adds the mechanics (etc.), one that applies a script
>> to auto-convert the content (where we can review the script and check that
>> its application gives the resulting diff), and a final one that takes care of
>> the things that the script wasn't able to handle (or messed up, rather than
>> spending a huge amount of time getting the script perfect).
>>
>> On Fri, May 1, 2020 at 6:44 AM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> I believe taking Brian and Robert's advice to help git detect moves
>>> (even more than you already have) will make this much more manageable. I
>>> just tried it out and squashing commits brings it to "631 files changed,
>>> 10363 insertions(+), 9945 deletions(-)" according to git, so that is more
>>> manageable than +47k - 47k. I'm not saying that a total squash is best.
>>> There may be a better way to factor the changes.
>>>
>>> Kenn
>>>
>>> On Thu, Apr 30, 2020 at 8:09 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> Nam,
>>>>
>>>>  - Website looks good and looks the same as the current website.
>>>> (Visually comparing a few pages, not a deep analysis.)
>>>> - contribute.md looks good. (this is new content.)
>>>> - website/Dockerfile and website/README.md changes look good.
>>>> - I do not know what is the new version of some files, for example:
>>>> website/src/_data/authors.yml,  website/src/_data/capability-matrix.yml --
>>>> what replaces them?
>>>>
>>>> There are 887 file changes. It is not easy to review this. I wanted to
>>>> go commit by commit, but that did not help much. How about we try to
>>>> organize this review as reviewable commits.
>>>> - Changes to the mechanics (jekyll to hugo), themes, build files,
>>>> website related readmes etc. This will likely be a smaller change in number
>>>> of files. (This will likely have many completely new, and completely deleted
>>>> files. Only a few files have meaningful diffs.)
>>>> - Changes to the content. This might be a large number of files with
>>>> minimal changes. I do not think we can manually review each file, but at
>>>> least a quick review of minimal changes to each file would be good enough.
>>>>
>>>> What do you think?
>>>>
>>>> Ahmet
>>>>
>>>> On Thu, Apr 30, 2020 at 4:29 PM Hannah Jiang <ha...@google.com>
>>>> wrote:
>>>>
>>>>>> Since we want to move forward with the PR, I would like to ask the
>>>>>> community to hold off changes to the current Beam website for a week, until
>>>>>> we are able to review and merge the PR. Is this acceptable to everyone?
>>>>>
>>>>> Do we have an exact date when we can push changes to the website? I
>>>>> have PRs to update documents so would like to plan ahead.
>>>>>
>>>>> On Thu, Apr 30, 2020 at 1:17 PM Nam Bui <na...@polidea.com> wrote:
>>>>>
>>>>>> Hey guys,
>>>>>>
>>>>>> I tried my best to handle renamed files in Git. I have no clue why
>>>>>> GitHub doesn't show it, but finally, I made this commit [1] (thanks for
>>>>>> your idea @bhulette) so you guys can review changes with ease (there is no
>>>>>> bunch of deleted markdown files anymore :D). Also, new staged version is
>>>>>> deployed, you could check it out [2].
>>>>>>
>>>>>> In case you are interested in translation, here is the proof of
>>>>>> concept [3] (the earth icon on the right corner is temporarily used for
>>>>>> switching languages). You can take a look at the translation guide for this
>>>>>> PoC [4].
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/beam/pull/11554/commits/b267bb360866a723ac2536f408f23de648c7cd4d
>>>>>> [2]
>>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/index.html
>>>>>> [3] https://safe-relation.surge.sh/
>>>>>> [4]
>>>>>> https://github.com/PolideaInternal/beam/blob/website-develop/website/CONTRIBUTE.md#translation-guide
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 30, 2020 at 7:24 PM Brian Hulette <bh...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Changing the URLs is fine with me as long as the old urls will work
>>>>>>> too.
>>>>>>>
>>>>>>> But do we need to change the filenames for the blog posts to
>>>>>>> accomplish that? It's nice that the blog post markdown files start with a
>>>>>>> date so they naturally sort chronologically. It looks like this hugo PR [1]
>>>>>>> made it possible to extract date metadata and slug
>>>>>>> (i.e. dataflow-python-sdk-is-now-public) separately from the filename.
>>>>>>>
>>>>>>> [1] https://github.com/gohugoio/hugo/pull/4494
>>>>>>>
>>>>>>> On Thu, Apr 30, 2020 at 10:06 AM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2020 at 9:55 AM Thomas Weise <th...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> For changed URLs, will previous URLs be mapped to avoid broken
>>>>>>>>> external links?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I believe the answer is yes from Nam's response "For now, we keep
>>>>>>>> the old URLs working in terms of redirecting them". I very much agree that
>>>>>>>> this is very important and should work for all existing urls.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2020 at 9:34 AM Aizhamal Nurmamat kyzy <
>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> To give a little more context regarding the URLs, the date should
>>>>>>>>>> still appear on the blog post, but not on the URL.
>>>>>>>>>> For example, we'd have:
>>>>>>>>>>
>>>>>>>>>> https://beam.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html
>>>>>>>>>> become
>>>>>>>>>> https://beam.apache.org/blog/dataflow-python-sdk-is-now-public/.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> I am not a content marketer. IMO, this is a good change. In the
>>>>>>>> past, a few times, we edited dates on posts (e.g. a release date was
>>>>>>>> entered incorrectly) and we had to either have a mismatch between dates in
>>>>>>>> the url and the date in the blog, or change the url. This change
>>>>>>>> simplifies things by having the date in only one place (in content metadata).
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> The blog posts would have a small header showing the title,
>>>>>>>>>> author and publish date. But the URL would not have it.
>>>>>>>>>> Thoughts?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 30, 2020 at 9:23 AM Nam Bui <na...@polidea.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> @altay: Hey hey. Yeah, I didn't expect the baseUrl of staging
>>>>>>>>>>> version is "
>>>>>>>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/"
>>>>>>>>>>> which also includes "/11554", and Hugo considers it as a path so it breaks
>>>>>>>>>>> the path of "static files" (like images). We made a fix. Now I'm working on
>>>>>>>>>>> "getting git to recognize files as renames" as you suggested.
>>>>>>>>>>>
>>>>>>>>>>> @robert: The dates are nice but it causes verbose/long/ugly
>>>>>>>>>>> URLs. We discussed with Aizhamal in the development stage and agreed to get
>>>>>>>>>>> rid of this. For now, we keep the old URLs working in terms of redirecting
>>>>>>>>>>> them. However, from now on, we should change the name convention on blog
>>>>>>>>>>> posts to have a fancy URL like "
>>>>>>>>>>> beam.apache.org/blog/myblogpost.md". :)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 30, 2020 at 2:57 AM Robert Bradshaw <
>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 29, 2020 at 5:08 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Nam, this looks better. At least links are working, and the
>>>>>>>>>>>>> website visually looks similar and generally in good shape. I think there
>>>>>>>>>>>>> are still issues. For example, I do not see any of the images (e.g. the
>>>>>>>>>>>>> beam logo on top left is missing.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 29, 2020 at 3:11 PM Brian Hulette <
>>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I left a comment on the PR [1]. I think the reason all of the
>>>>>>>>>>>>>> website content is not being tracked as file renames is because there was a
>>>>>>>>>>>>>> series of commits that created files in the new directory, and then one
>>>>>>>>>>>>>> commit that deleted the old directory. If there were a single commit with
>>>>>>>>>>>>>> all of the deleted and new files, git would surely recognize they are
>>>>>>>>>>>>>> effectively renames and mark them as such. Maybe we just need to get all
>>>>>>>>>>>>>> these commits squashed into one?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> https://github.com/apache/beam/pull/11554#issuecomment-621489844
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nam, could you try this? If we can get git to recognize these
>>>>>>>>>>>>> as renames, review process would be much easier.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +1.
>>>>>>>>>>>>
>>>>>>>>>>>> Alternatively, create a commit that just moves the files into a
>>>>>>>>>>>> new location (which git can always detect), then sit the edits on top of
>>>>>>>>>>>> that (which should preserve history better).
>>>>>>>>>>>>
>>>>>>>>>>>> Also, is there a reason the dates were removed from the blog
>>>>>>>>>>>> post filenames? For content like that, the dates are nice.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 29, 2020 at 10:39 AM Nam Bui <na...@polidea.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm Nam, from the team responsible for the Apache Beam website
>>>>>>>>>>>>>>> migration. I am happy to answer some of the questions here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @aizhamal: Thanks for informing the community. :)
>>>>>>>>>>>>>>> @altay, @robertwb: Yes, there is a problem with the staged
>>>>>>>>>>>>>>> version at the moment. The build process behaved in ways we did not
>>>>>>>>>>>>>>> expect. We fixed that today and have been waiting for @pablo to re-run
>>>>>>>>>>>>>>> it. The purpose of this PR is to migrate the Beam site completely from
>>>>>>>>>>>>>>> Jekyll to Hugo. Therefore, the bunch of deleted markdown files are the
>>>>>>>>>>>>>>> Jekyll ones, which were located at `beam/website/src`; the Hugo site is
>>>>>>>>>>>>>>> located at `beam/website/www` now. In `beam/website/README.md`, I described
>>>>>>>>>>>>>>> how to run the Hugo website locally, although it is actually the same as
>>>>>>>>>>>>>>> with Jekyll (because it's also set up with Docker & Gradle). In
>>>>>>>>>>>>>>> `beam/website/CONTRIBUTE.md`, I explain how to get started with
>>>>>>>>>>>>>>> Hugo on the Beam website. There is also a link in the "Translation Guide"
>>>>>>>>>>>>>>> section which points to a branch with multilingual support, and it will
>>>>>>>>>>>>>>> become a follow-up PR soon.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please let me know if you need more details. Feel free to
>>>>>>>>>>>>>>> ask any questions and I will get back to you with answers. I'm sorry if
>>>>>>>>>>>>>>> I answer a little late due to the timezone. :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> Nam
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 8:49 PM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Adding +Nam Bui <na...@polidea.com> and +Karolina Rosół
>>>>>>>>>>>>>>>> <ka...@polidea.com> to follow up on questions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 11:34 AM Ahmet Altay <
>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am having trouble reviewing the staged version. What is
>>>>>>>>>>>>>>>>> the best way to review this change?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do we expect any changes to markdown files, beyond some
>>>>>>>>>>>>>>>>> metadata?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:45 AM Robert Bradshaw <
>>>>>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks. It'll be great to better support more languages.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I looked at the PR and there seems to be no
>>>>>>>>>>>>>>>>>> provenance/history. E.g. all the content seems to be entirely new files
>>>>>>>>>>>>>>>>>> rather than diffs from the old. (There also seems to be a huge amount of
>>>>>>>>>>>>>>>>>> auto-generated js code as well.)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I agree. This makes it very hard to review. I also see a
>>>>>>>>>>>>>>>>> bunch of deleted markdown files. Are they not getting migrated?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:23 AM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello everybody,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We are almost done migrating the Apache Beam website
>>>>>>>>>>>>>>>>>>> from Jekyll to Hugo. You can see the PR in [1], and we'd love to hear your
>>>>>>>>>>>>>>>>>>> feedback/comments on the PR. It includes  detailed guidelines on
>>>>>>>>>>>>>>>>>>> contributing to the new Hugo-based website and adding translations to pages
>>>>>>>>>>>>>>>>>>> [2]. For those who are curious about adding new languages, we will provide
>>>>>>>>>>>>>>>>>>> a proof of concept in the next couple of days in this thread.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Since we want to move forward with the PR, I would like
>>>>>>>>>>>>>>>>>>> to ask the community to hold off changes to the current Beam website for a
>>>>>>>>>>>>>>>>>>> week, until we are able to review and merge the PR. Is this acceptable to
>>>>>>>>>>>>>>>>>>> everyone?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In case anyone missed my previous email with the
>>>>>>>>>>>>>>>>>>> background for the website migration, you can find more context here [3].
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Aizhamal
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/beam/pull/11554
>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/256b7042bf504b94f161ca03b388a2ba247918d9/website/CONTRIBUTE.md
>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/r7fa6d710c0a1959cce5108e460d71c306ce5756cf96af818b41cb7ca%40%3Cdev.beam.apache.org%3E
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

Posted by Brian Hulette <bh...@google.com>.
Regarding move detection: I worked with Nam on this a bit on the-asf Slack.
We couldn't make squashing into a single large commit work - when I did it,
`git log` still showed many dropped and added files. Breaking out a single
commit with the file moves was the best we could manage. I tested a PR that
used this approach on a single file and the GitHub UI did pick up on it
[1]. Sadly it seems to give up on the larger PR.

I figured this was good enough, though. It's difficult to review all of the
changes at once, but you can at least review the individual commits without
the moves obscuring the changes.

[1] https://github.com/apache/beam/pull/11579
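
For readers wondering why one large squash did not help: git does not record
renames. It infers them at diff time by pairing deleted and added files whose
contents are similar enough (the default threshold for `-M` is 50%), and it
limits how many candidate pairs it will compare (`diff.renameLimit`), which may
be why detection gives up on a very large change. The sketch below only
illustrates that pairing idea in Python, using `difflib` as a stand-in
similarity measure. It is not git's actual algorithm, and the function names
and sample file contents are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(old_lines, new_lines):
    """Similarity of two files (as lists of lines), between 0.0 and 1.0."""
    return SequenceMatcher(None, old_lines, new_lines).ratio()


def detect_renames(deleted, added, threshold=0.5):
    """Pair each deleted file with its most similar added file.

    `deleted` and `added` map a path to that file's list of lines.
    A pair counts as a rename only if its similarity reaches `threshold`.
    """
    renames = []
    unmatched = dict(added)  # added files not yet claimed by a rename
    for old_path, old_lines in deleted.items():
        best_path, best_score = None, 0.0
        for new_path, new_lines in unmatched.items():
            score = similarity(old_lines, new_lines)
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            renames.append((old_path, best_path, best_score))
            del unmatched[best_path]
    return renames


# Hypothetical contents: a.md moved with one line edited, b.md simply removed.
deleted = {
    "website/src/a.md": ["a", "b", "c", "d"],
    "website/src/b.md": ["x", "y"],
}
added = {
    "website/www/a.md": ["a", "b", "c", "e"],
    "website/www/new.md": ["p", "q"],
}
pairs = detect_renames(deleted, added)
```

A pure move has similarity 1.0 and is always paired, which is why a commit that
only moves files, followed by commits that edit them, tends to be the easiest
history to review.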


On Fri, May 1, 2020 at 9:11 AM Robert Bradshaw <ro...@google.com> wrote:

> I just took a look, and added a couple of comments, but it mostly looks
> good. Thanks for creating a commit that preserves changes; that's a big
> improvement.
>
> +1 to Ahmet's suggestion about breaking the huge commit up a bit more. I
> would suggest one that adds the mechanics (etc.), one that applies a script
> to auto-convert the content (where we can review the script and check that its
> application gives the resulting diff), and a final one that takes care of
> the things that the script wasn't able to handle (or messed up, rather than
> spending a huge amount of time getting the script perfect).
>
> On Fri, May 1, 2020 at 6:44 AM Kenneth Knowles <ke...@apache.org> wrote:
>
>> I believe taking Brian and Robert's advice to help git detect moves (even
>> more than you already have) will make this much more manageable. I just
>> tried it out and squashing commits brings it to "631 files changed, 10363
>> insertions(+), 9945 deletions(-)" according to git, so that is more
>> manageable than +47k - 47k. I'm not saying that a total squash is best.
>> There may be a better way to factor the changes.
>>
>> Kenn
>>
>> On Thu, Apr 30, 2020 at 8:09 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> Nam,
>>>
>>>  - Website looks good and looks the same as the current website.
>>> (Visually comparing a few pages, not a deep analysis.)
>>> - contribute.md looks good. (this is new content.)
>>> - website/Dockerfile and website/README.md changes look good.
>>> - I do not know what is the new version of some files, for example:
>>> website/src/_data/authors.yml,  website/src/_data/capability-matrix.yml --
>>> what replaces them?
>>>
>>> There are 887 file changes. It is not easy to review this. I wanted to
>>> go commit by commit, but that did not help much. How about we try to
>>> organize this review as reviewable commits?
>>> - Changes to the mechanics (jekyll to hugo), themes, build files,
>>> website related readmes etc. This will likely be a smaller change in number
>>> of files. (This will likely have many completely new, and completely deleted
>>> files. Only a few files have meaningful diffs.)
>>> - Changes to the content. This might be a large number of files with
>>> minimal changes. I do not think we can manually review each file, but at
>>> least a quick review of minimal changes to each file would be good enough.
>>>
>>> What do you think?
>>>
>>> Ahmet
>>>
>>> On Thu, Apr 30, 2020 at 4:29 PM Hannah Jiang <ha...@google.com>
>>> wrote:
>>>
>>>> Since we want to move forward with the PR, I would like to ask the
>>>>> community to hold off changes to the current Beam website for a week, until
>>>>> we are able to review and merge the PR. Is this acceptable to everyone?
>>>>
>>>> Do we have an exact date when we can push changes to the website? I
>>>> have PRs to update documents so would like to plan ahead.
>>>>
>>>> On Thu, Apr 30, 2020 at 1:17 PM Nam Bui <na...@polidea.com> wrote:
>>>>
>>>>> Hey guys,
>>>>>
>>>>> I tried my best to handle renamed files in Git. I have no clue why
>>>>> GitHub doesn't show them, but finally, I made this commit [1] (thanks for
>>>>> your idea @bhulette) so you guys can review changes with ease (the
>>>>> bunch of deleted markdown files is gone :D). Also, a new staged version is
>>>>> deployed; you can check it out [2].
>>>>>
>>>>> In case you are interested in translation, here is the proof of
>>>>> concept [3] (the earth icon on the right corner is temporarily used for
>>>>> switching languages). You can take a look at the translation guide for this
>>>>> PoC [4].
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/beam/pull/11554/commits/b267bb360866a723ac2536f408f23de648c7cd4d
>>>>> [2]
>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/index.html
>>>>> [3] https://safe-relation.surge.sh/
>>>>> [4]
>>>>> https://github.com/PolideaInternal/beam/blob/website-develop/website/CONTRIBUTE.md#translation-guide
>>>>>
>>>>>
>>>>> On Thu, Apr 30, 2020 at 7:24 PM Brian Hulette <bh...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Changing the URLs is fine with me as long as the old urls will work
>>>>>> too.
>>>>>>
>>>>>> But do we need to change the filenames for the blog posts to
>>>>>> accomplish that? It's nice that the blog post markdown files start with a
>>>>>> date so they naturally sort chronologically. It looks like this hugo PR [1]
>>>>>> made it possible to extract date metadata and slug
>>>>>> (i.e. dataflow-python-sdk-is-now-public) separately from the filename.
>>>>>>
>>>>>> [1] https://github.com/gohugoio/hugo/pull/4494
>>>>>>
>>>>>> On Thu, Apr 30, 2020 at 10:06 AM Ahmet Altay <al...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 30, 2020 at 9:55 AM Thomas Weise <th...@apache.org> wrote:
>>>>>>>
>>>>>>>> For changed URLs, will previous URLs be mapped to avoid broken
>>>>>>>> external links?
>>>>>>>>
>>>>>>>
>>>>>>> I believe the answer is yes from Nam's response "For now, we keep
>>>>>>> the old URLs working in terms of redirecting them". I very much agree that
>>>>>>> this is very important and should work for all existing urls.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2020 at 9:34 AM Aizhamal Nurmamat kyzy <
>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> To give a little more context regarding the URLs, the date should
>>>>>>>>> still appear on the blog post, but not on the URL.
>>>>>>>>> For example, we'd have:
>>>>>>>>>
>>>>>>>>> https://beam.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html
>>>>>>>>> become
>>>>>>>>> https://beam.apache.org/blog/dataflow-python-sdk-is-now-public/.
>>>>>>>>>
>>>>>>>>
>>>>>>> I am not a content marketer. IMO, this is a good change. In the
>>>>>>> past, a few times, we edited dates on posts (e.g. a release date was
>>>>>>> entered incorrectly) and we had to either have a mismatch between dates in
>>>>>>> the url and the date in the blog, or change the url. This change
>>>>>>> simplifies things by keeping the date in only one place (the content metadata).
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> The blog posts would have a small header showing the title, author
>>>>>>>>> and publish date. But the URL would not have it.
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2020 at 9:23 AM Nam Bui <na...@polidea.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> @altay: Hey hey. Yeah, I didn't expect the baseUrl of the staging
>>>>>>>>>> version to be "
>>>>>>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/"
>>>>>>>>>> which also includes "/11554". Hugo treats that as a path, which broke
>>>>>>>>>> the paths of "static files" (like images). We made a fix. Now I'm working on
>>>>>>>>>> "getting git to recognize files as renames" as you suggested.
>>>>>>>>>>
>>>>>>>>>> @robert: The dates are nice but they make the URLs verbose/long/ugly.
>>>>>>>>>> We discussed this with Aizhamal during development and agreed to get rid
>>>>>>>>>> of them. For now, we keep the old URLs working by redirecting
>>>>>>>>>> them. However, from now on, we should change the naming convention for blog
>>>>>>>>>> posts to get a nicer URL like "
>>>>>>>>>> beam.apache.org/blog/myblogpost.md". :)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 30, 2020 at 2:57 AM Robert Bradshaw <
>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 29, 2020 at 5:08 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Nam, this looks better. At least links are working, and the
>>>>>>>>>>>> website visually looks similar and generally in good shape. I think there
>>>>>>>>>>>> are still issues. For example, I do not see any of the images (e.g. the
>>>>>>>>>>>> beam logo on top left is missing.)
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 29, 2020 at 3:11 PM Brian Hulette <
>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I left a comment on the PR [1]. I think the reason all of the
>>>>>>>>>>>>> website content is not being tracked as file renames is because there was a
>>>>>>>>>>>>> series of commits that created files in the new directory, and then one
>>>>>>>>>>>>> commit that deleted the old directory. If there were a single commit with
>>>>>>>>>>>>> all of the deleted and new files, git would surely recognize they are
>>>>>>>>>>>>> effectively renames and mark them as such. Maybe we just need to get all
>>>>>>>>>>>>> these commits squashed into one?
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://github.com/apache/beam/pull/11554#issuecomment-621489844
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Nam, could you try this? If we can get git to recognize these
>>>>>>>>>>>> as renames, review process would be much easier.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> +1.
>>>>>>>>>>>
>>>>>>>>>>> Alternatively, create a commit that just moves the files into a
>>>>>>>>>>> new location (which git can always detect), then sit the edits on top of
>>>>>>>>>>> that (which should preserve history better).
>>>>>>>>>>>
>>>>>>>>>>> Also, is there a reason the dates were removed from the blog
>>>>>>>>>>> post filenames? For content like that, the dates are nice.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 29, 2020 at 10:39 AM Nam Bui <na...@polidea.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm Nam, from the team responsible for the Apache Beam website
>>>>>>>>>>>>>> migration. I am happy to answer some of the questions here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @aizhamal: Thanks for informing the community. :)
>>>>>>>>>>>>>> @altay, @robertwb: Yes, there is a problem with the staged
>>>>>>>>>>>>>> version at the moment. The build process behaved in ways we did not
>>>>>>>>>>>>>> expect. We fixed that today and have been waiting for @pablo to re-run
>>>>>>>>>>>>>> it. The purpose of this PR is to migrate the Beam site completely from
>>>>>>>>>>>>>> Jekyll to Hugo. Therefore, the bunch of deleted markdown files are the
>>>>>>>>>>>>>> Jekyll ones, which were located at `beam/website/src`; the Hugo site is
>>>>>>>>>>>>>> located at `beam/website/www` now. In `beam/website/README.md`, I described
>>>>>>>>>>>>>> how to run the Hugo website locally, although it is actually the same as
>>>>>>>>>>>>>> with Jekyll (because it's also set up with Docker & Gradle). In
>>>>>>>>>>>>>> `beam/website/CONTRIBUTE.md`, I explain how to get started with
>>>>>>>>>>>>>> Hugo on the Beam website. There is also a link in the "Translation Guide"
>>>>>>>>>>>>>> section which points to a branch with multilingual support, and it will
>>>>>>>>>>>>>> become a follow-up PR soon.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please let me know if you need more details. Feel free to ask
>>>>>>>>>>>>>> any questions and I will get back to you with answers. I'm sorry if I
>>>>>>>>>>>>>> answer a little late due to the timezone. :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Nam
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 8:49 PM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Adding +Nam Bui <na...@polidea.com> and +Karolina Rosół
>>>>>>>>>>>>>>> <ka...@polidea.com> to follow up on questions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 11:34 AM Ahmet Altay <
>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am having trouble reviewing the staged version. What is
>>>>>>>>>>>>>>>> the best way to review this change?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do we expect any changes to markdown files, beyond some
>>>>>>>>>>>>>>>> metadata?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:45 AM Robert Bradshaw <
>>>>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks. It'll be great to better support more languages.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I looked at the PR and there seems to be no
>>>>>>>>>>>>>>>>> provenance/history. E.g. all the content seems to be entirely new files
>>>>>>>>>>>>>>>>> rather than diffs from the old. (There also seems to be a huge amount of
>>>>>>>>>>>>>>>>> auto-generated js code as well.)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I agree. This makes it very hard to review. I also see a
>>>>>>>>>>>>>>>> bunch of deleted markdown files. Are they not getting migrated?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:23 AM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello everybody,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We are almost done migrating the Apache Beam website from
>>>>>>>>>>>>>>>>>> Jekyll to Hugo. You can see the PR in [1], and we'd love to hear your
>>>>>>>>>>>>>>>>>> feedback/comments on the PR. It includes  detailed guidelines on
>>>>>>>>>>>>>>>>>> contributing to the new Hugo-based website and adding translations to pages
>>>>>>>>>>>>>>>>>> [2]. For those who are curious about adding new languages, we will provide
>>>>>>>>>>>>>>>>>> a proof of concept in the next couple of days in this thread.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Since we want to move forward with the PR, I would like
>>>>>>>>>>>>>>>>>> to ask the community to hold off changes to the current Beam website for a
>>>>>>>>>>>>>>>>>> week, until we are able to review and merge the PR. Is this acceptable to
>>>>>>>>>>>>>>>>>> everyone?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In case anyone missed my previous email with the
>>>>>>>>>>>>>>>>>> background for the website migration, you can find more context here [3].
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Aizhamal
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/beam/pull/11554
>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/256b7042bf504b94f161ca03b388a2ba247918d9/website/CONTRIBUTE.md
>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/r7fa6d710c0a1959cce5108e460d71c306ce5756cf96af818b41cb7ca%40%3Cdev.beam.apache.org%3E
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

Posted by Robert Bradshaw <ro...@google.com>.
I just took a look, and added a couple of comments, but it mostly looks
good. Thanks for creating a commit that preserves changes; that's a big
improvement.

+1 to Ahmet's suggestion about breaking the huge commit up a bit more. I
would suggest one that adds the mechanics (etc.), one that applies a script
to auto-convert the content (where we can review the script and check that its
application gives the resulting diff), and a final one that takes care of
the things that the script wasn't able to handle (or messed up, rather than
spending a huge amount of time getting the script perfect).
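
One small piece of such a conversion script can be sketched as follows. Jekyll
post filenames conventionally start with the date (`YYYY-MM-DD-title.md`),
while the new URLs discussed in this thread have the form `/blog/<slug>/` with
the date kept in the content metadata. The Python below splits a dated filename
into a date and a slug and builds the new path. It is a minimal sketch, not the
script used for the PR: the function names and the sample filename are
assumptions made for illustration.

```python
import re
from datetime import date

# Jekyll-style post filename: YYYY-MM-DD-title.md (or .markdown)
_DATED_POST = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.(?:md|markdown)$")


def split_dated_filename(filename):
    """Return (date, slug) for a dated post filename, or None if it is not dated."""
    match = _DATED_POST.match(filename)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    return date(int(year), int(month), int(day)), slug


def blog_path(filename):
    """Build the date-free path /blog/<slug>/ for a dated post filename."""
    parts = split_dated_filename(filename)
    if parts is None:
        raise ValueError(f"not a dated post filename: {filename}")
    _published, slug = parts
    return f"/blog/{slug}/"


# Hypothetical filename, used only to exercise the functions.
example = "2016-02-25-dataflow-python-sdk-is-now-public.md"
published, slug = split_dated_filename(example)
```

The date returned here would go into the post's metadata rather than the URL,
so correcting a date later would not change the link.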

On Fri, May 1, 2020 at 6:44 AM Kenneth Knowles <ke...@apache.org> wrote:

> I believe taking Brian and Robert's advice to help git detect moves (even
> more than you already have) will make this much more manageable. I just
> tried it out and squashing commits brings it to "631 files changed, 10363
> insertions(+), 9945 deletions(-)" according to git, so that is more
> manageable than +47k - 47k. I'm not saying that a total squash is best.
> There may be a better way to factor the changes.
>
> Kenn
>
> On Thu, Apr 30, 2020 at 8:09 PM Ahmet Altay <al...@google.com> wrote:
>
>> Nam,
>>
>>  - Website looks good and looks the same as the current website.
>> (Visually comparing a few pages, not a deep analysis.)
>> - contribute.md looks good. (this is new content.)
>> - website/Dockerfile and website/README.md changes look good.
>> - I do not know what is the new version of some files, for example:
>> website/src/_data/authors.yml,  website/src/_data/capability-matrix.yml --
>> what replaces them?
>>
>> There are 887 file changes. It is not easy to review this. I wanted to go
>> commit by commit, but that did not help much. How about we try to organize
>> this review as reviewable commits?
>> - Changes to the mechanics (jekyll to hugo), themes, build files, website
>> related readmes etc. This will likely be a smaller change in number of
>> files. (This will likely have many completely new, and completely deleted
>> files. Only a few files have meaningful diffs.)
>> - Changes to the content. This might be a large number of files with
>> minimal changes. I do not think we can manually review each file, but at
>> least a quick review of minimal changes to each file would be good enough.
>>
>> What do you think?
>>
>> Ahmet
>>
>> On Thu, Apr 30, 2020 at 4:29 PM Hannah Jiang <ha...@google.com>
>> wrote:
>>
>>> Since we want to move forward with the PR, I would like to ask the
>>>> community to hold off changes to the current Beam website for a week, until
>>>> we are able to review and merge the PR. Is this acceptable to everyone?
>>>
>>> Do we have an exact date when we can push changes to the website? I have
>>> PRs to update documents so would like to plan ahead.
>>>
>>> On Thu, Apr 30, 2020 at 1:17 PM Nam Bui <na...@polidea.com> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> I tried my best to handle renamed files in Git. I have no clue why
>>>> GitHub doesn't show them, but finally, I made this commit [1] (thanks for
>>>> your idea @bhulette) so you guys can review changes with ease (the
>>>> bunch of deleted markdown files is gone :D). Also, a new staged version is
>>>> deployed; you can check it out [2].
>>>>
>>>> In case you are interested in translation, here is the proof of concept
>>>> [3] (the earth icon on the right corner is temporarily used for switching
>>>> languages). You can take a look at the translation guide for this PoC [4].
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/pull/11554/commits/b267bb360866a723ac2536f408f23de648c7cd4d
>>>> [2]
>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/index.html
>>>> [3] https://safe-relation.surge.sh/
>>>> [4]
>>>> https://github.com/PolideaInternal/beam/blob/website-develop/website/CONTRIBUTE.md#translation-guide
>>>>
>>>>
>>>> On Thu, Apr 30, 2020 at 7:24 PM Brian Hulette <bh...@google.com>
>>>> wrote:
>>>>
>>>>> Changing the URLs is fine with me as long as the old urls will work
>>>>> too.
>>>>>
>>>>> But do we need to change the filenames for the blog posts to
>>>>> accomplish that? It's nice that the blog post markdown files start with a
>>>>> date so they naturally sort chronologically. It looks like this hugo PR [1]
>>>>> made it possible to extract date metadata and slug
>>>>> (i.e. dataflow-python-sdk-is-now-public) separately from the filename.
>>>>>
>>>>> [1] https://github.com/gohugoio/hugo/pull/4494
>>>>>
>>>>> On Thu, Apr 30, 2020 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 30, 2020 at 9:55 AM Thomas Weise <th...@apache.org> wrote:
>>>>>>
>>>>>>> For changed URLs, will previous URLs be mapped to avoid broken
>>>>>>> external links?
>>>>>>>
>>>>>>
>>>>>> I believe the answer is yes from Nam's response "For now, we keep the
>>>>>> old URLs working in terms of redirecting them". I very much agree that this
>>>>>> is very important and should work for all existing urls.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 30, 2020 at 9:34 AM Aizhamal Nurmamat kyzy <
>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> To give a little more context regarding the URLs, the date should
>>>>>>>> still appear on the blog post, but not on the URL.
>>>>>>>> For example, we'd have:
>>>>>>>>
>>>>>>>> https://beam.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html
>>>>>>>> become
>>>>>>>> https://beam.apache.org/blog/dataflow-python-sdk-is-now-public/.
>>>>>>>>
>>>>>>>
>>>>>> I am not a content marketer. IMO, this is a good change. In the past,
>>>>>> a few times, we edited dates on posts (e.g. a release date was entered
>>>>>> incorrectly) and we had to either have a mismatch between dates in the url
>>>>>> and the date in the blog, or change the url. This change simplifies things by
>>>>>> keeping the date in only one place (the content metadata).
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> The blog posts would have a small header showing the title, author
>>>>>>>> and publish date. But the URL would not have it.
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2020 at 9:23 AM Nam Bui <na...@polidea.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> @altay: Hey hey. Yeah, I didn't expect the baseUrl of the staging
>>>>>>>>> version to be "
>>>>>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/"
>>>>>>>>> which also includes "/11554". Hugo treats that as a path, which broke
>>>>>>>>> the paths of "static files" (like images). We made a fix. Now I'm working on
>>>>>>>>> "getting git to recognize files as renames" as you suggested.
>>>>>>>>>
>>>>>>>>> @robert: The dates are nice but they make the URLs verbose/long/ugly.
>>>>>>>>> We discussed this with Aizhamal during development and agreed to get rid
>>>>>>>>> of them. For now, we keep the old URLs working by redirecting
>>>>>>>>> them. However, from now on, we should change the naming convention for blog
>>>>>>>>> posts to get a nicer URL like "beam.apache.org/blog/myblogpost.md".
>>>>>>>>> :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2020 at 2:57 AM Robert Bradshaw <
>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, Apr 29, 2020 at 5:08 PM Ahmet Altay <al...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Nam, this looks better. At least links are working, and the
>>>>>>>>>>> website visually looks similar and generally in good shape. I think there
>>>>>>>>>>> are still issues. For example, I do not see any of the images (e.g. the
>>>>>>>>>>> beam logo on top left is missing.)
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 29, 2020 at 3:11 PM Brian Hulette <
>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I left a comment on the PR [1]. I think the reason all of the
>>>>>>>>>>>> website content is not being tracked as file renames is because there was a
>>>>>>>>>>>> series of commits that created files in the new directory, and then one
>>>>>>>>>>>> commit that deleted the old directory. If there were a single commit with
>>>>>>>>>>>> all of the deleted and new files, git would surely recognize they are
>>>>>>>>>>>> effectively renames and mark them as such. Maybe we just need to get all
>>>>>>>>>>>> these commits squashed into one?
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://github.com/apache/beam/pull/11554#issuecomment-621489844
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Nam, could you try this? If we can get git to recognize these as
>>>>>>>>>>> renames, review process would be much easier.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +1.
>>>>>>>>>>
>>>>>>>>>> Alternatively, create a commit that just moves the files into a
>>>>>>>>>> new location (which git can always detect), then sit the edits on top of
>>>>>>>>>> that (which should preserve history better).
>>>>>>>>>>
>>>>>>>>>> Also, is there a reason the dates were removed from the blog post
>>>>>>>>>> filenames? For content like that, the dates are nice.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 29, 2020 at 10:39 AM Nam Bui <na...@polidea.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm Nam - from the responsible team of Apache Beam website
>>>>>>>>>>>>> migration. I am pleased to answer some of the questions here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> @aizhamal: Thanks for informing to the community. :)
>>>>>>>>>>>>> @altay, @robertwb: Yes. there is a problem with the staged
>>>>>>>>>>>>> version at the moment. We didn't expect some behaviours on the build
>>>>>>>>>>>>> process. So, we fixed it today and been waiting for @pablo to re-run it
>>>>>>>>>>>>> again. The purpose of this PR is to migrate completely Beam site from
>>>>>>>>>>>>> Jekyll to Hugo. Therefore, a bunch of deleted markdown files are from
>>>>>>>>>>>>> Jekyll which was located at `beam/website/src`, and Hugo is located at
>>>>>>>>>>>>> `beam/website/www` now. In `beam/website/README.md`, I wrote down about
>>>>>>>>>>>>> running the Hugo website locally, although it is actually same as Jekyll
>>>>>>>>>>>>> (because it's also set up with Docker & Gradle). In
>>>>>>>>>>>>> `beam/website/CONTRIBUTE.md`, I guided people on how to get started with
>>>>>>>>>>>>> Hugo on the Beam website. There is also a link in the "Translation Guide"
>>>>>>>>>>>>> section which points to a branch of multilingual provenance, and it will
>>>>>>>>>>>>> become a next PR soon.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please let me know if you need more details. Feel free to ask
>>>>>>>>>>>>> any questions and I will get back to you with answers. I'm sorry if I
>>>>>>>>>>>>> answer a little late due to the timezone. :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Nam
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 8:49 PM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Adding +Nam Bui <na...@polidea.com> and +Karolina Rosół
>>>>>>>>>>>>>> <ka...@polidea.com> to follow up on questions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 11:34 AM Ahmet Altay <
>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am having trouble reviewing the staged version. What is
>>>>>>>>>>>>>>> the best way to review this change?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do we expect any changes to markdown files, beyond some
>>>>>>>>>>>>>>> metadata?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:45 AM Robert Bradshaw <
>>>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks. It'll be great to better support more languages.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I looked at the PR and there seems to be no
>>>>>>>>>>>>>>>> provenance/history. E.g. all the content seems to be entirely new files
>>>>>>>>>>>>>>>> rather than diffs from the old. (There also seems to be a huge amount of
>>>>>>>>>>>>>>>> auto-generated js code as well.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I agree. This makes it very hard to review. I also see a
>>>>>>>>>>>>>>> bunch of deleted markdown files. Are they not getting migrated?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:23 AM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello everybody,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We are almost done migrating the Apache Beam website from
>>>>>>>>>>>>>>>>> Jekyll to Hugo. You can see the PR in [1], and we'd love to hear your
>>>>>>>>>>>>>>>>> feedback/comments on the PR. It includes detailed guidelines on
>>>>>>>>>>>>>>>>> contributing to the new Hugo-based website and adding translations to pages
>>>>>>>>>>>>>>>>> [2]. For those who are curious about adding new languages, we will provide
>>>>>>>>>>>>>>>>> a proof of concept in the next couple of days in this thread.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Since we want to move forward with the PR, I would like to
>>>>>>>>>>>>>>>>> ask the community to hold off changes to the current Beam website for a
>>>>>>>>>>>>>>>>> week, until we are able to review and merge the PR. Is this acceptable to
>>>>>>>>>>>>>>>>> everyone?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In case anyone missed my previous email with the
>>>>>>>>>>>>>>>>> background for the website migration, you can find more context here [3].
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Aizhamal
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] https://github.com/apache/beam/pull/11554
>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/256b7042bf504b94f161ca03b388a2ba247918d9/website/CONTRIBUTE.md
>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/r7fa6d710c0a1959cce5108e460d71c306ce5756cf96af818b41cb7ca%40%3Cdev.beam.apache.org%3E
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

Posted by Kenneth Knowles <ke...@apache.org>.
I believe taking Brian and Robert's advice to help git detect moves (even
more than you already have) will make this much more manageable. I just
tried it out and squashing commits brings it to "631 files changed, 10363
insertions(+), 9945 deletions(-)" according to git, so that is more
manageable than +47k - 47k. I'm not saying that a total squash is best.
There may be a better way to factor the changes.
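
As a rough illustration of why putting the deletions and additions in one
commit helps: rename detection pairs a deleted file with an added file whose
content is similar enough. The sketch below is not git's implementation; it
uses Python's difflib as a stand-in similarity measure, the file paths and
contents are hypothetical, and the 0.5 threshold only mirrors git's documented
default of 50% for `-M`.

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    `deleted` and `added` map file paths to file contents. A pair is
    reported as a rename only if its similarity ratio reaches `threshold`.
    """
    renames = {}
    remaining = dict(added)  # added files not yet claimed by a rename
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in remaining.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
            del remaining[best_path]
    return renames


# Hypothetical example: one page moved with a small front matter change,
# plus one unrelated new file.
deleted = {
    "website/src/get-started.md":
        "---\nlayout: section\ntitle: Get Started\n---\n"
        "# Get Started\nLearn to use Beam.\n",
}
added = {
    "website/www/get-started.md":
        "---\ntitle: Get Started\n---\n"
        "# Get Started\nLearn to use Beam.\n",
    "website/www/app.js": "console.log('hello');\n",
}

print(detect_renames(deleted, added))
```

If the deletion and the addition land in different commits, neither commit
contains both sides of the pair, so a per-commit comparison like this has
nothing to match against.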

Kenn

On Thu, Apr 30, 2020 at 8:09 PM Ahmet Altay <al...@google.com> wrote:

> Nam,
>
> - Website looks good and looks the same as the current website. (Visually
> comparing a few pages, not a deep analysis.)
> - contribute.md looks good. (This is new content.)
> - website/Dockerfile and website/README.md changes look good.
> - I do not know what the new versions of some files are, for example:
> website/src/_data/authors.yml, website/src/_data/capability-matrix.yml --
> what replaces them?
>
> There are 887 file changes. It is not easy to review this. I wanted to go
> commit by commit, but that did not help much. How about we try to organize
> this review as reviewable commits?
> - Changes to the mechanics (Jekyll to Hugo), themes, build files, website
> related readmes etc. This will likely be a smaller change in number of
> files. (This will likely have many completely new and completely deleted
> files. Only a few files will have meaningful diffs.)
> - Changes to the content. This might be a large number of files with
> minimal changes. I do not think we can manually review each file, but at
> least a quick review of minimal changes to each file would be good enough.
>
> What do you think?
>
> Ahmet
>
> On Thu, Apr 30, 2020 at 4:29 PM Hannah Jiang <ha...@google.com>
> wrote:
>
>> Since we want to move forward with the PR, I would like to ask the
>>> community to hold off changes to the current Beam website for a week, until
>>> we are able to review and merge the PR. Is this acceptable to everyone?
>>
>> Do we have an exact date when we can push changes to the website? I have
>> PRs to update documents so would like to plan ahead.
>>
>> On Thu, Apr 30, 2020 at 1:17 PM Nam Bui <na...@polidea.com> wrote:
>>
>>> Hey guys,
>>>
>>> I tried my best to handle renamed files in Git. I have no clue why
>>> GitHub doesn't show it, but finally, I made this commit [1] (thanks for
>>> your idea @bhulette) so you guys can review changes with ease (there is no
>>> longer a bunch of deleted markdown files :D). Also, a new staged version is
>>> deployed; you can check it out [2].
>>>
>>> In case you are interested in translation, here is the proof of concept
>>> [3] (the earth icon on the right corner is temporarily used for switching
>>> languages). You can take a look at the translation guide for this PoC [4].
>>>
>>> [1]
>>> https://github.com/apache/beam/pull/11554/commits/b267bb360866a723ac2536f408f23de648c7cd4d
>>> [2]
>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/index.html
>>> [3] https://safe-relation.surge.sh/
>>> [4]
>>> https://github.com/PolideaInternal/beam/blob/website-develop/website/CONTRIBUTE.md#translation-guide
>>>
>>>
>>> On Thu, Apr 30, 2020 at 7:24 PM Brian Hulette <bh...@google.com>
>>> wrote:
>>>
>>>> Changing the URLs is fine with me as long as the old urls will work too.
>>>>
>>>> But do we need to change the filenames for the blog posts to accomplish
>>>> that? It's nice that the blog post markdown files start with a date so they
>>>> naturally sort chronologically. It looks like this hugo PR [1] made it
>>>> possible to extract date metadata and slug
>>>> (i.e. dataflow-python-sdk-is-now-public) separately from the filename.
>>>>
>>>> [1] https://github.com/gohugoio/hugo/pull/4494
>>>>
>>>> On Thu, Apr 30, 2020 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 30, 2020 at 9:55 AM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>>> For changed URLs, will previous URLs be mapped to avoid broken
>>>>>> external links?
>>>>>>
>>>>>
>>>>> I believe the answer is yes from Nam's response "For now, we keep the
>>>>> old URLs working in terms of redirecting them". I very much agree that this
>>>>> is very important and should work for all existing urls.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 30, 2020 at 9:34 AM Aizhamal Nurmamat kyzy <
>>>>>> aizhamal@apache.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> To give a little more context regarding the URLs, the date should
>>>>>>> still appear on the blog post, but not on the URL.
>>>>>>> For example, we'd have:
>>>>>>>
>>>>>>> https://beam.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html
>>>>>>> become
>>>>>>> https://beam.apache.org/blog/dataflow-python-sdk-is-now-public/.
>>>>>>>
>>>>>>
>>>>> I am not a content marketer. IMO, this is a good change. In the past,
>>>>> a few times, we edited dates on posts (e.g. a release date was entered
>>>>> incorrectly) and we had to either have a mismatch between the date in the
>>>>> url and the date in the blog, or change the url. This change simplifies
>>>>> things by having the date in only one place (in content metadata).
>>>>>
>>>>>
>>>>>>
>>>>>>> The blog posts would have a small header showing the title, author
>>>>>>> and publish date. But the URL would not have it.
>>>>>>> Thoughts?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 30, 2020 at 9:23 AM Nam Bui <na...@polidea.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> @altay: Hey hey. Yeah, I didn't expect that the baseUrl of the staging
>>>>>>>> version is "
>>>>>>>> http://apache-beam-website-pull-requests.storage.googleapis.com/11554/"
>>>>>>>> which also includes "/11554", and Hugo treats it as a path, so it breaks
>>>>>>>> the paths of "static files" (like images). We made a fix. Now I'm working on
>>>>>>>> "getting git to recognize files as renames" as you suggested.
>>>>>>>>
>>>>>>>> @robert: The dates are nice but they cause verbose/long/ugly URLs.
>>>>>>>> We discussed this with Aizhamal in the development stage and agreed to
>>>>>>>> get rid of them. For now, we keep the old URLs working by redirecting
>>>>>>>> them. However, from now on, we should change the naming convention on blog
>>>>>>>> posts to have a fancy URL like "beam.apache.org/blog/myblogpost.md".
>>>>>>>> :)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2020 at 2:57 AM Robert Bradshaw <
>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Apr 29, 2020 at 5:08 PM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Nam, this looks better. At least links are working, and the
>>>>>>>>>> website visually looks similar and generally in good shape. I think there
>>>>>>>>>> are still issues. For example, I do not see any of the images (e.g. the
>>>>>>>>>> beam logo on top left is missing.)
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 29, 2020 at 3:11 PM Brian Hulette <
>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I left a comment on the PR [1]. I think the reason all of the
>>>>>>>>>>> website content is not being tracked as file renames is because there was a
>>>>>>>>>>> series of commits that created files in the new directory, and then one
>>>>>>>>>>> commit that deleted the old directory. If there were a single commit with
>>>>>>>>>>> all of the deleted and new files, git would surely recognize they are
>>>>>>>>>>> effectively renames and mark them as such. Maybe we just need to get all
>>>>>>>>>>> these commits squashed into one?
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://github.com/apache/beam/pull/11554#issuecomment-621489844
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Nam, could you try this? If we can get git to recognize these as
>>>>>>>>>> renames, the review process would be much easier.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> +1.
>>>>>>>>>
>>>>>>>>> Alternatively, create a commit that just moves the files into a
>>>>>>>>> new location (which git can always detect), then sit the edits on top of
>>>>>>>>> that (which should preserve history better).
>>>>>>>>>
>>>>>>>>> Also, is there a reason the dates were removed from the blog post
>>>>>>>>> filenames? For content like that, the dates are nice.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 29, 2020 at 10:39 AM Nam Bui <na...@polidea.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm Nam, from the team responsible for the Apache Beam website
>>>>>>>>>>>> migration. I am pleased to answer some of the questions here.
>>>>>>>>>>>>
>>>>>>>>>>>> @aizhamal: Thanks for informing the community. :)
>>>>>>>>>>>> @altay, @robertwb: Yes, there is a problem with the staged
>>>>>>>>>>>> version at the moment. We didn't expect some behaviours in the build
>>>>>>>>>>>> process. We fixed it today and have been waiting for @pablo to re-run
>>>>>>>>>>>> it. The purpose of this PR is to migrate the Beam site completely from
>>>>>>>>>>>> Jekyll to Hugo. Therefore, the deleted markdown files are the Jekyll
>>>>>>>>>>>> ones, which were located at `beam/website/src`; the Hugo site is now
>>>>>>>>>>>> located at `beam/website/www`. In `beam/website/README.md`, I wrote down
>>>>>>>>>>>> how to run the Hugo website locally, although it is actually the same as
>>>>>>>>>>>> for Jekyll (because it's also set up with Docker & Gradle). In
>>>>>>>>>>>> `beam/website/CONTRIBUTE.md`, I guide people on how to get started with
>>>>>>>>>>>> Hugo on the Beam website. There is also a link in the "Translation Guide"
>>>>>>>>>>>> section which points to a branch of multilingual provenance, and it will
>>>>>>>>>>>> become the next PR soon.
>>>>>>>>>>>>
>>>>>>>>>>>> Please let me know if you need more details. Feel free to ask
>>>>>>>>>>>> any questions and I will get back to you with answers. I'm sorry if I
>>>>>>>>>>>> answer a little late due to the timezone. :)
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Nam
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 28, 2020 at 8:49 PM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Adding +Nam Bui <na...@polidea.com> and +Karolina Rosół
>>>>>>>>>>>>> <ka...@polidea.com> to follow up on questions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 11:34 AM Ahmet Altay <al...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am having trouble reviewing the staged version. What is the
>>>>>>>>>>>>>> best way to review this change?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do we expect any changes to markdown files, beyond some
>>>>>>>>>>>>>> metadata?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:45 AM Robert Bradshaw <
>>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks. It'll be great to better support more languages.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I looked at the PR and there seems to be no
>>>>>>>>>>>>>>> provenance/history. E.g. all the content seems to be entirely new files
>>>>>>>>>>>>>>> rather than diffs from the old. (There also seems to be a huge amount of
>>>>>>>>>>>>>>> auto-generated js code as well.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree. This makes it very hard to review. I also see a
>>>>>>>>>>>>>> bunch of deleted markdown files. Are they not getting migrated?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Apr 28, 2020 at 10:23 AM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>>> aizhamal@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello everybody,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We are almost done migrating the Apache Beam website from
>>>>>>>>>>>>>>>> Jekyll to Hugo. You can see the PR in [1], and we'd love to hear your
>>>>>>>>>>>>>>>> feedback/comments on the PR. It includes detailed guidelines on
>>>>>>>>>>>>>>>> contributing to the new Hugo-based website and adding translations to pages
>>>>>>>>>>>>>>>> [2]. For those who are curious about adding new languages, we will provide
>>>>>>>>>>>>>>>> a proof of concept in the next couple of days in this thread.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Since we want to move forward with the PR, I would like to
>>>>>>>>>>>>>>>> ask the community to hold off changes to the current Beam website for a
>>>>>>>>>>>>>>>> week, until we are able to review and merge the PR. Is this acceptable to
>>>>>>>>>>>>>>>> everyone?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In case anyone missed my previous email with the background
>>>>>>>>>>>>>>>> for the website migration, you can find more context here [3].
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Aizhamal
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] https://github.com/apache/beam/pull/11554
>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/256b7042bf504b94f161ca03b388a2ba247918d9/website/CONTRIBUTE.md
>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/r7fa6d710c0a1959cce5108e460d71c306ce5756cf96af818b41cb7ca%40%3Cdev.beam.apache.org%3E
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>