You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Daniel Oliveira <da...@google.com> on 2018/08/02 18:51:33 UTC

Re: Removing documentation for old Beam versions

The older docs should be recorded in the commit history of the website
repository, right? If they're not currently used in the website and they're
in the commit history then I don't see a reason to save them.

On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:

> Hi all,
> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage is
> timing out after 100 minutes, because it's trying to deletes 22k files and
> then copy 22k files (warning large file
> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
> ).
>
> It seems that we could save a lot of time by deleting the older javadoc
> and pydoc files for older versions. Is there a good reason to keep around
> this kind of documentation for older versions (say 1 year back)?
>

Re: Removing documentation for old Beam versions

Posted by Robert Bradshaw <ro...@google.com>.
On Thu, Sep 27, 2018 at 7:25 PM Melissa Pashniak <me...@google.com>
wrote:

>
> Ideally (IMO anyway), we would have versioned entire doc sets like most
> Apache projects that I looked at do (Spark [1], Flink, Hadoop, etc.) with a
> Latest + past releases, so users can read Beam docs appropriate to the
> version they are using. Would this run into the same situation as the
> javadoc/pydoc if we leave the generated html in apache/beam? It would be
> great if what we do can handle this future scenario without needing another
> overhaul.
>

+1

First priority is completing the migration of all the source files into the
beam repo, to simplify the developer process. I'm OK with intermediate
solutions here as long as we're not checking in generated output to
(master/release branches of) of the beam repo.

After that we can refine the process, offering automatically versioned docs
(including latest) and easy previewing of changes (e.g. during reviews).
Presumably we could borrow with Flink, Spark, ... are doing here with
little modification. (I don't know if looking at and borrowing from other
projects would help or slow finishing the initial migration, but my
impression is that where we are in the process now it wouldn't help.)


> [1] https://spark.apache.org/documentation.html
>
>
> On Wed, Sep 26, 2018 at 9:58 AM Udi Meiri <eh...@google.com> wrote:
>
>> Just to be clear, generated html for javadoc and pydoc will be put in
>> apache/beam-site, but generated html for .md files will be put in
>> apache/beam under the asf-site branch.
>>
>> On Wed, Sep 26, 2018 at 9:34 AM Thomas Weise <th...@apache.org> wrote:
>>
>>> Looks like the is agreement that all sources should be in the main beam
>>> repository, the remaining discussion was where the generated content should
>>> be served from, specifically the generated docs.
>>>
>>> If the setup that Alan found allows us to keep using the beam-site
>>> repository for the generated stuff and that does not unreasonably
>>> complicate the CI process, then I'm in favor of that. It looks cleaner to
>>> not mingle source and generated files in the same repo. Otherwise we can do
>>> the asf-site branch in the main repo and get rid of docs from it once we
>>> found a better solution.
>>>
>>>
>>> On Wed, Sep 26, 2018 at 7:09 AM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> OK, thanks. That link was very helpful. Of the three options we must
>>>> use, checking into git seems preferable than checking into svn let alone
>>>> the CMS. Keeping the same repo means that it's harder to generate the docs
>>>> for version X while head is checked out.
>>>>
>>>> I'm in favor of moving forward with this in the short term, but we
>>>> should expore other options (like Flink has) for the longer term.
>>>>
>>>>
>>>>
>>>> On Wed, Sep 26, 2018 at 3:53 PM Scott Wegner <sc...@apache.org> wrote:
>>>>
>>>>> Yes. There are few options for publishing your ASF website, described
>>>>> here: https://www.apache.org/dev/project-site.html. We can publish
>>>>> from a Git repo, SVN, or a UI-based CMS interface.
>>>>>
>>>>> On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I am also definitely in favor of a single repository. Perhaps I'm
>>>>>> just misunderstanding why the generated must be put in a git repository at
>>>>>> all--is it because that's the easiest way to serve them?
>>>>>>
>>>>>> On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner <sc...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Alan found the place where website publishing is configured [1],
>>>>>>> which has examples of project sites being configured with more than one git
>>>>>>> root. This is great for us because it allows us to leave generated
>>>>>>> javadocs/pydocs in the beam-site repository and publish website markdown
>>>>>>> content from the main repo.
>>>>>>>
>>>>>>> Alan has a PR ready to publish generated HTML in a post-commit job
>>>>>>> [2]. Once that goes through the last step is to upgrade the publishing
>>>>>>> config.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
>>>>>>> [2] https://github.com/apache/beam/pull/6431
>>>>>>>
>>>>>>> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sw...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> > We could add a new default branch (master?) and keep all the
>>>>>>>> non-generated files (src/) there, and put generated files (content/) in the
>>>>>>>> asf-site branch (like we already do).
>>>>>>>>
>>>>>>>> I'm strongly in favor of having sources in a single repository. We
>>>>>>>> have significant process and infrastructure built up for the apache/beam
>>>>>>>> repo (for build, PR, CI, release, etc.) that we can take advantage of by
>>>>>>>> putting website sources in the same repo. The current beam-site repo PR
>>>>>>>> automation is flaky because it was custom-built and not given the same
>>>>>>>> level of attention as the main repo.
>>>>>>>>
>>>>>>>> The caveat to consolidating website sources in the main repo is
>>>>>>>> that it incentivizes putting the generated sources branch on the same repo.
>>>>>>>> I've documented a few of the reasons in the Appendix of the design doc [1]:
>>>>>>>>  - It's easier to maintain a single repository; easily apply
>>>>>>>> existing tooling/infrastructure
>>>>>>>> - Jenkins tooling for publishing generated HTML may not work
>>>>>>>> cross-repo [2]
>>>>>>>>
>>>>>>>> My preference is to move forward with the migration of sources to
>>>>>>>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>>>>>>>> I like the idea of separating the publishing/hosting of generated
>>>>>>>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>>>>>>>> migration.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>>>>>>>
>>>>>>>> [2]
>>>>>>>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>>>>>>>
>>>>>>>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Staying on beam-site SGTM. We could add a new default branch
>>>>>>>>> (master?) and keep all the non-generated files (src/) there, and put
>>>>>>>>> generated files (content/) in the asf-site branch (like we already do).
>>>>>>>>> That way there's no confusion as to which files you should update.
>>>>>>>>> (This is of course assuming we still place generated docs in git
>>>>>>>>> repos.)
>>>>>>>>>
>>>>>>>>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <th...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> My thought was to leave the asf-site branch in the beam-site
>>>>>>>>>> repository, add generated docs to that branch (until we have a better
>>>>>>>>>> solution), and have only sources in the beam repo.
>>>>>>>>>>
>>>>>>>>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>>>>>>>>> it would eliminate the need to place generated docs into git repos.
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I believe that beam.apache.org is populated from the asf-site
>>>>>>>>>>> branch of the apache/beam-site repo. (gitpubsub:
>>>>>>>>>>> https://www.apache.org/dev/project-site.html#intro)
>>>>>>>>>>> If we move the markdown-based docs to apache/beam, leave
>>>>>>>>>>> generated javadoc and pydoc in apache/beam-site, and point gitpubsub to
>>>>>>>>>>> apache/beam, then javadoc and pydoc will not get pushed to the website.
>>>>>>>>>>>
>>>>>>>>>>> Is there some place where we can push javadoc and pydoc files?
>>>>>>>>>>> Or perhaps there an alternative way to push updates to
>>>>>>>>>>> beam.apache.org? (not requiring the asf-site branch)
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Scott,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for bringing the discussion back here.
>>>>>>>>>>>>
>>>>>>>>>>>> I agree that we should separate the changes for hosting of
>>>>>>>>>>>> generated java/pydocs from the rest of website automation so that we can
>>>>>>>>>>>> make the switch and fix the contributor headache soon.
>>>>>>>>>>>>
>>>>>>>>>>>> But perhaps we can avoid adding 4m lines of generated code to
>>>>>>>>>>>> the main beam repository (and keep on adding with every release) if we
>>>>>>>>>>>> continue to serve the site from the old beam-site repo? (I left a comment
>>>>>>>>>>>> the doc.)
>>>>>>>>>>>>
>>>>>>>>>>>> About trying buildbot, as mentioned earlier I would be happy to
>>>>>>>>>>>> help with it. I prefer a setup that keeps the docs separate from the web
>>>>>>>>>>>> site.
>>>>>>>>>>>>
>>>>>>>>>>>> Thomas
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Re-opening this thread as it came up today in the discussion
>>>>>>>>>>>>> for PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>>>>>>>>> Reliability improvements; design doc here:
>>>>>>>>>>>>> https://s.apache.org/beam-site-automation
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current plan is to keep generated javadoc/pydoc sources
>>>>>>>>>>>>> only on the asf-site branch, which is necessary for the current
>>>>>>>>>>>>> githubpubsub publishing mechanism. This maintains our current approach, the
>>>>>>>>>>>>> only change being that we're moving the asf-site branch from the retiring
>>>>>>>>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The concern for committing generated content is the extra
>>>>>>>>>>>>> overhead during git fetch. I did some analysis to measure the impact [2],
>>>>>>>>>>>>> and found that fetching a week of source + generated content history from
>>>>>>>>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an
>>>>>>>>>>>>> external location like Flink does with buildbot, but that work is separable
>>>>>>>>>>>>> and shouldn't be a prerequisite for this effort. The goal of this work is
>>>>>>>>>>>>> to improve the reliability of automation for contributing website changes.
>>>>>>>>>>>>> At last measure, only about half of beam-site PR merges use Mergebot
>>>>>>>>>>>>> without experiencing some reliability issue [3].
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs
>>>>>>>>>>>>> out of git. Thomas, would you have bandwidth to look into this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2]
>>>>>>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>>>>>>>>> [3]
>>>>>>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>>>>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Udi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Good to know you will continue this work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let me know if you want to try the buildbot route (which does
>>>>>>>>>>>>>> not require generated documentation to be checked into the repo). Happy to
>>>>>>>>>>>>>> help with that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm picking up the website migration. The plan is to not
>>>>>>>>>>>>>>> include generated files in the master branch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, I've been told that even putting generated files a
>>>>>>>>>>>>>>> separate branch could blow up the git repository for all (e.g. make git
>>>>>>>>>>>>>>> pulls a lot longer?).
>>>>>>>>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Yes, I think the separation of generated code will need
>>>>>>>>>>>>>>>> to occur prior to completing the merge and switching the web site to the
>>>>>>>>>>>>>>>> main repo.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > There should be no reason to check generated
>>>>>>>>>>>>>>>> documentation into either of the repos/branches.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Huge +1 to this. Thomas, would have time to set something
>>>>>>>>>>>>>>>> like this up
>>>>>>>>>>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > Please see as an example how this was solved in Flink,
>>>>>>>>>>>>>>>> using the ASF buildbot infrastructure.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > The buildbot configuration is here (requires committer
>>>>>>>>>>>>>>>> access):
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>>>> > Thomas
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Last time I talked with Scott I brought this idea in. I
>>>>>>>>>>>>>>>> believe the plan was either to publish compiled site to website directly,
>>>>>>>>>>>>>>>> or keep it in separate storage from apache/beam repo.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> One of the main reasons not to check in compiled version
>>>>>>>>>>>>>>>> of website is that every developer will have to pull all the versions of
>>>>>>>>>>>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Regards,
>>>>>>>>>>>>>>>> >> --Mikhail
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Have feedback?
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <
>>>>>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Pablo, the docs are generated into versioned paths,
>>>>>>>>>>>>>>>> e.g.,
>>>>>>>>>>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/
>>>>>>>>>>>>>>>> so tags are not necessary?
>>>>>>>>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam
>>>>>>>>>>>>>>>> the release branch should have the relevant docs (although perhaps it's
>>>>>>>>>>>>>>>> better to put them in a different repo or storage system).
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Thomas, I would very much like to not have
>>>>>>>>>>>>>>>> javadoc/pydoc generation be part of the website review process, as it takes
>>>>>>>>>>>>>>>> up a lot of time when changes are staged (10s of thousands of files),
>>>>>>>>>>>>>>>> especially when a PR is updated and existing staged files need to be
>>>>>>>>>>>>>>>> deleted.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> @Thomas: Migration work is in backlog and will be
>>>>>>>>>>>>>>>> picked up in near time.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> --Mikhail
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Have feedback?
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <
>>>>>>>>>>>>>>>> thw@apache.org> wrote:
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>>>>>>>>> entries from https://beam.apache.org/get-started/downloads/
>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will
>>>>>>>>>>>>>>>> no longer check in generated documentation into the repository? Those can
>>>>>>>>>>>>>>>> be generated and deployed independently (when a commit to a branch occurs),
>>>>>>>>>>>>>>>> such as done in the Apex and Flink projects.
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> I was told that Scott who was working in the
>>>>>>>>>>>>>>>> beam-site changes is on leave now and the migration is still pending (see
>>>>>>>>>>>>>>>> note at https://github.com/apache/beam/tree/master/website).
>>>>>>>>>>>>>>>> Is anyone else going to pick it up?
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>> Thomas
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the
>>>>>>>>>>>>>>>> repositories every time we make a release, so that people are able to dive
>>>>>>>>>>>>>>>> in and find the docs?
>>>>>>>>>>>>>>>> >>>>>> Best
>>>>>>>>>>>>>>>> >>>>>> -P.
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> I would guess that users are still using some of
>>>>>>>>>>>>>>>> these old releases. It is unclear from Beam website which releases are
>>>>>>>>>>>>>>>> still supported or not. It probably makes sense to drop documentation for
>>>>>>>>>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>>>>>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are
>>>>>>>>>>>>>>>> in Github commit history.
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete
>>>>>>>>>>>>>>>> javadocs and pydocs for releases older than 1 year,
>>>>>>>>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit
>>>>>>>>>>>>>>>> history of the website repository, right? If they're not currently used in
>>>>>>>>>>>>>>>> the website and they're in the commit history then I don't see a reason to
>>>>>>>>>>>>>>>> save them.
>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by
>>>>>>>>>>>>>>>> deleting the older javadoc and pydoc files for older versions. Is there a
>>>>>>>>>>>>>>>> good reason to keep around this kind of documentation for older versions
>>>>>>>>>>>>>>>> (say 1 year back)?
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>> --
>>>>>>>>>>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>>>>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>
>>>>

Re: Removing documentation for old Beam versions

Posted by Melissa Pashniak <me...@google.com>.
Ideally (IMO anyway), we would have versioned entire doc sets like most
Apache projects that I looked at do (Spark [1], Flink, Hadoop, etc.) with a
Latest + past releases, so users can read Beam docs appropriate to the
version they are using. Would this run into the same situation as the
javadoc/pydoc if we leave the generated html in apache/beam? It would be
great if what we do can handle this future scenario without needing another
overhaul.

[1] https://spark.apache.org/documentation.html


On Wed, Sep 26, 2018 at 9:58 AM Udi Meiri <eh...@google.com> wrote:

> Just to be clear, generated html for javadoc and pydoc will be put in
> apache/beam-site, but generated html for .md files will be put in
> apache/beam under the asf-site branch.
>
> On Wed, Sep 26, 2018 at 9:34 AM Thomas Weise <th...@apache.org> wrote:
>
>> Looks like the is agreement that all sources should be in the main beam
>> repository, the remaining discussion was where the generated content should
>> be served from, specifically the generated docs.
>>
>> If the setup that Alan found allows us to keep using the beam-site
>> repository for the generated stuff and that does not unreasonably
>> complicate the CI process, then I'm in favor of that. It looks cleaner to
>> not mingle source and generated files in the same repo. Otherwise we can do
>> the asf-site branch in the main repo and get rid of docs from it once we
>> found a better solution.
>>
>>
>> On Wed, Sep 26, 2018 at 7:09 AM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> OK, thanks. That link was very helpful. Of the three options we must
>>> use, checking into git seems preferable than checking into svn let alone
>>> the CMS. Keeping the same repo means that it's harder to generate the docs
>>> for version X while head is checked out.
>>>
>>> I'm in favor of moving forward with this in the short term, but we
>>> should expore other options (like Flink has) for the longer term.
>>>
>>>
>>>
>>> On Wed, Sep 26, 2018 at 3:53 PM Scott Wegner <sc...@apache.org> wrote:
>>>
>>>> Yes. There are few options for publishing your ASF website, described
>>>> here: https://www.apache.org/dev/project-site.html. We can publish
>>>> from a Git repo, SVN, or a UI-based CMS interface.
>>>>
>>>> On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>>
>>>>> I am also definitely in favor of a single repository. Perhaps I'm just
>>>>> misunderstanding why the generated must be put in a git repository at
>>>>> all--is it because that's the easiest way to serve them?
>>>>>
>>>>> On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner <sc...@apache.org> wrote:
>>>>>
>>>>>> Alan found the place where website publishing is configured [1],
>>>>>> which has examples of project sites being configured with more than one git
>>>>>> root. This is great for us because it allows us to leave generated
>>>>>> javadocs/pydocs in the beam-site repository and publish website markdown
>>>>>> content from the main repo.
>>>>>>
>>>>>> Alan has a PR ready to publish generated HTML in a post-commit job
>>>>>> [2]. Once that goes through the last step is to upgrade the publishing
>>>>>> config.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
>>>>>> [2] https://github.com/apache/beam/pull/6431
>>>>>>
>>>>>> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sw...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> > We could add a new default branch (master?) and keep all the
>>>>>>> non-generated files (src/) there, and put generated files (content/) in the
>>>>>>> asf-site branch (like we already do).
>>>>>>>
>>>>>>> I'm strongly in favor of having sources in a single repository. We
>>>>>>> have significant process and infrastructure built up for the apache/beam
>>>>>>> repo (for build, PR, CI, release, etc.) that we can take advantage of by
>>>>>>> putting website sources in the same repo. The current beam-site repo PR
>>>>>>> automation is flaky because it was custom-built and not given the same
>>>>>>> level of attention as the main repo.
>>>>>>>
>>>>>>> The caveat to consolidating website sources in the main repo is that
>>>>>>> it incentivizes putting the generated sources branch on the same repo. I've
>>>>>>> documented a few of the reasons in the Appendix of the design doc [1]:
>>>>>>>  - It's easier to maintain a single repository; easily apply
>>>>>>> existing tooling/infrastructure
>>>>>>> - Jenkins tooling for publishing generated HTML may not work
>>>>>>> cross-repo [2]
>>>>>>>
>>>>>>> My preference is to move forward with the migration of sources to
>>>>>>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>>>>>>> I like the idea of separating the publishing/hosting of generated
>>>>>>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>>>>>>> migration.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>>>>>>
>>>>>>> [2]
>>>>>>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>>>>>>
>>>>>>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> Staying on beam-site SGTM. We could add a new default branch
>>>>>>>> (master?) and keep all the non-generated files (src/) there, and put
>>>>>>>> generated files (content/) in the asf-site branch (like we already do).
>>>>>>>> That way there's no confusion as to which files you should update.
>>>>>>>> (This is of course assuming we still place generated docs in git
>>>>>>>> repos.)
>>>>>>>>
>>>>>>>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <th...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> My thought was to leave the asf-site branch in the beam-site
>>>>>>>>> repository, add generated docs to that branch (until we have a better
>>>>>>>>> solution), and have only sources in the beam repo.
>>>>>>>>>
>>>>>>>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>>>>>>>> it would eliminate the need to place generated docs into git repos.
>>>>>>>>>
>>>>>>>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I believe that beam.apache.org is populated from the asf-site
>>>>>>>>>> branch of the apache/beam-site repo. (gitpubsub:
>>>>>>>>>> https://www.apache.org/dev/project-site.html#intro)
>>>>>>>>>> If we move the markdown-based docs to apache/beam, leave
>>>>>>>>>> generated javadoc and pydoc in apache/beam-site, and point gitpubsub to
>>>>>>>>>> apache/beam, then javadoc and pydoc will not get pushed to the website.
>>>>>>>>>>
>>>>>>>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>>>>>>>> perhaps there an alternative way to push updates to
>>>>>>>>>> beam.apache.org? (not requiring the asf-site branch)
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Scott,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for bringing the discussion back here.
>>>>>>>>>>>
>>>>>>>>>>> I agree that we should separate the changes for hosting of
>>>>>>>>>>> generated java/pydocs from the rest of website automation so that we can
>>>>>>>>>>> make the switch and fix the contributor headache soon.
>>>>>>>>>>>
>>>>>>>>>>> But perhaps we can avoid adding 4m lines of generated code to
>>>>>>>>>>> the main beam repository (and keep on adding with every release) if we
>>>>>>>>>>> continue to serve the site from the old beam-site repo? (I left a comment
>>>>>>>>>>> the doc.)
>>>>>>>>>>>
>>>>>>>>>>> About trying buildbot, as mentioned earlier I would be happy to
>>>>>>>>>>> help with it. I prefer a setup that keeps the docs separate from the web
>>>>>>>>>>> site.
>>>>>>>>>>>
>>>>>>>>>>> Thomas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Re-opening this thread as it came up today in the discussion
>>>>>>>>>>>> for PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>>>>>>>> Reliability improvements; design doc here:
>>>>>>>>>>>> https://s.apache.org/beam-site-automation
>>>>>>>>>>>>
>>>>>>>>>>>> The current plan is to keep generated javadoc/pydoc sources
>>>>>>>>>>>> only on the asf-site branch, which is necessary for the current
>>>>>>>>>>>> githubpubsub publishing mechanism. This maintains our current approach, the
>>>>>>>>>>>> only change being that we're moving the asf-site branch from the retiring
>>>>>>>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>>>>>>>
>>>>>>>>>>>> The concern for committing generated content is the extra
>>>>>>>>>>>> overhead during git fetch. I did some analysis to measure the impact [2],
>>>>>>>>>>>> and found that fetching a week of source + generated content history from
>>>>>>>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>>>>>>>
>>>>>>>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an
>>>>>>>>>>>> external location like Flink does with buildbot, but that work is separable
>>>>>>>>>>>> and shouldn't be a prerequisite for this effort. The goal of this work is
>>>>>>>>>>>> to improve the reliability of automation for contributing website changes.
>>>>>>>>>>>> At last measure, only about half of beam-site PR merges use Mergebot
>>>>>>>>>>>> without experiencing some reliability issue [3].
>>>>>>>>>>>>
>>>>>>>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs
>>>>>>>>>>>> out of git. Thomas, would you have bandwidth to look into this?
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>>>>>>>>
>>>>>>>>>>>> [2]
>>>>>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>>>>>>>> [3]
>>>>>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>>>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Udi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Good to know you will continue this work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me know if you want to try the buildbot route (which does
>>>>>>>>>>>>> not require generated documentation to be checked into the repo). Happy to
>>>>>>>>>>>>> help with that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm picking up the website migration. The plan is to not
>>>>>>>>>>>>>> include generated files in the master branch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, I've been told that even putting generated files a
>>>>>>>>>>>>>> separate branch could blow up the git repository for all (e.g. make git
>>>>>>>>>>>>>> pulls a lot longer?).
>>>>>>>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Yes, I think the separation of generated code will need to
>>>>>>>>>>>>>>> occur prior to completing the merge and switching the web site to the main
>>>>>>>>>>>>>>> repo.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > There should be no reason to check generated documentation
>>>>>>>>>>>>>>> into either of the repos/branches.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Huge +1 to this. Thomas, would have time to set something
>>>>>>>>>>>>>>> like this up
>>>>>>>>>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Please see as an example how this was solved in Flink,
>>>>>>>>>>>>>>> using the ASF buildbot infrastructure.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > The buildbot configuration is here (requires committer
>>>>>>>>>>>>>>> access):
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>>> > Thomas
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Last time I talked with Scott I brought this idea in. I
>>>>>>>>>>>>>>> believe the plan was either to publish compiled site to website directly,
>>>>>>>>>>>>>>> or keep it in separate storage from apache/beam repo.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> One of the main reasons not to check in compiled version
>>>>>>>>>>>>>>> of website is that every developer will have to pull all the versions of
>>>>>>>>>>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Regards,
>>>>>>>>>>>>>>> >> --Mikhail
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Have feedback?
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <
>>>>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Pablo, the docs are generated into versioned paths,
>>>>>>>>>>>>>>> e.g.,
>>>>>>>>>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/
>>>>>>>>>>>>>>> so tags are not necessary?
>>>>>>>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam
>>>>>>>>>>>>>>> the release branch should have the relevant docs (although perhaps it's
>>>>>>>>>>>>>>> better to put them in a different repo or storage system).
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>>>>>>>>> generation be part of the website review process, as it takes up a lot of
>>>>>>>>>>>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>>>>>>>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> @Thomas: Migration work is in backlog and will be
>>>>>>>>>>>>>>> picked up in near time.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> --Mikhail
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Have feedback?
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <
>>>>>>>>>>>>>>> thw@apache.org> wrote:
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>>>>>>>> entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no
>>>>>>>>>>>>>>> longer check in generated documentation into the repository? Those can be
>>>>>>>>>>>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>>>>>>>>>>>> such as done in the Apex and Flink projects.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> I was told that Scott who was working in the beam-site
>>>>>>>>>>>>>>> changes is on leave now and the migration is still pending (see note at
>>>>>>>>>>>>>>> https://github.com/apache/beam/tree/master/website). Is
>>>>>>>>>>>>>>> anyone else going to pick it up?
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>>>> >>>>> Thomas
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories
>>>>>>>>>>>>>>> every time we make a release, so that people are able to dive in and find
>>>>>>>>>>>>>>> the docs?
>>>>>>>>>>>>>>> >>>>>> Best
>>>>>>>>>>>>>>> >>>>>> -P.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> I would guess that users are still using some of
>>>>>>>>>>>>>>> these old releases. It is unclear from Beam website which releases are
>>>>>>>>>>>>>>> still supported or not. It probably makes sense to drop documentation for
>>>>>>>>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>>>>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are
>>>>>>>>>>>>>>> in Github commit history.
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete
>>>>>>>>>>>>>>> javadocs and pydocs for releases older than 1 year,
>>>>>>>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit
>>>>>>>>>>>>>>> history of the website repository, right? If they're not currently used in
>>>>>>>>>>>>>>> the website and they're in the commit history then I don't see a reason to
>>>>>>>>>>>>>>> save them.
>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by
>>>>>>>>>>>>>>> deleting the older javadoc and pydoc files for older versions. Is there a
>>>>>>>>>>>>>>> good reason to keep around this kind of documentation for older versions
>>>>>>>>>>>>>>> (say 1 year back)?
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>> --
>>>>>>>>>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>>>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>
>>>

Re: Removing documentation for old Beam versions

Posted by Udi Meiri <eh...@google.com>.
Just to be clear, generated html for javadoc and pydoc will be put in
apache/beam-site, but generated html for .md files will be put in
apache/beam under the asf-site branch.

On Wed, Sep 26, 2018 at 9:34 AM Thomas Weise <th...@apache.org> wrote:

> Looks like the is agreement that all sources should be in the main beam
> repository, the remaining discussion was where the generated content should
> be served from, specifically the generated docs.
>
> If the setup that Alan found allows us to keep using the beam-site
> repository for the generated stuff and that does not unreasonably
> complicate the CI process, then I'm in favor of that. It looks cleaner to
> not mingle source and generated files in the same repo. Otherwise we can do
> the asf-site branch in the main repo and get rid of docs from it once we
> found a better solution.
>
>
> On Wed, Sep 26, 2018 at 7:09 AM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> OK, thanks. That link was very helpful. Of the three options we must use,
>> checking into git seems preferable than checking into svn let alone the
>> CMS. Keeping the same repo means that it's harder to generate the docs for
>> version X while head is checked out.
>>
>> I'm in favor of moving forward with this in the short term, but we should
>> expore other options (like Flink has) for the longer term.
>>
>>
>>
>> On Wed, Sep 26, 2018 at 3:53 PM Scott Wegner <sc...@apache.org> wrote:
>>
>>> Yes. There are few options for publishing your ASF website, described
>>> here: https://www.apache.org/dev/project-site.html. We can publish from
>>> a Git repo, SVN, or a UI-based CMS interface.
>>>
>>> On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> I am also definitely in favor of a single repository. Perhaps I'm just
>>>> misunderstanding why the generated must be put in a git repository at
>>>> all--is it because that's the easiest way to serve them?
>>>>
>>>> On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner <sc...@apache.org> wrote:
>>>>
>>>>> Alan found the place where website publishing is configured [1], which
>>>>> has examples of project sites being configured with more than one git root.
>>>>> This is great for us because it allows us to leave generated
>>>>> javadocs/pydocs in the beam-site repository and publish website markdown
>>>>> content from the main repo.
>>>>>
>>>>> Alan has a PR ready to publish generated HTML in a post-commit job
>>>>> [2]. Once that goes through the last step is to upgrade the publishing
>>>>> config.
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
>>>>> [2] https://github.com/apache/beam/pull/6431
>>>>>
>>>>> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sw...@google.com>
>>>>> wrote:
>>>>>
>>>>>> > We could add a new default branch (master?) and keep all the
>>>>>> non-generated files (src/) there, and put generated files (content/) in the
>>>>>> asf-site branch (like we already do).
>>>>>>
>>>>>> I'm strongly in favor of having sources in a single repository. We
>>>>>> have significant process and infrastructure built up for the apache/beam
>>>>>> repo (for build, PR, CI, release, etc.) that we can take advantage of by
>>>>>> putting website sources in the same repo. The current beam-site repo PR
>>>>>> automation is flaky because it was custom-built and not given the same
>>>>>> level of attention as the main repo.
>>>>>>
>>>>>> The caveat to consolidating website sources in the main repo is that
>>>>>> it incentivizes putting the generated sources branch on the same repo. I've
>>>>>> documented a few of the reasons in the Appendix of the design doc [1]:
>>>>>>  - It's easier to maintain a single repository; easily apply existing
>>>>>> tooling/infrastructure
>>>>>> - Jenkins tooling for publishing generated HTML may not work
>>>>>> cross-repo [2]
>>>>>>
>>>>>> My preference is to move forward with the migration of sources to
>>>>>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>>>>>> I like the idea of separating the publishing/hosting of generated
>>>>>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>>>>>> migration.
>>>>>>
>>>>>> [1]
>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>>>>>
>>>>>> [2]
>>>>>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>>>>>
>>>>>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> Staying on beam-site SGTM. We could add a new default branch
>>>>>>> (master?) and keep all the non-generated files (src/) there, and put
>>>>>>> generated files (content/) in the asf-site branch (like we already do).
>>>>>>> That way there's no confusion as to which files you should update.
>>>>>>> (This is of course assuming we still place generated docs in git
>>>>>>> repos.)
>>>>>>>
>>>>>>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <th...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> My thought was to leave the asf-site branch in the beam-site
>>>>>>>> repository, add generated docs to that branch (until we have a better
>>>>>>>> solution), and have only sources in the beam repo.
>>>>>>>>
>>>>>>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>>>>>>> it would eliminate the need to place generated docs into git repos.
>>>>>>>>
>>>>>>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I believe that beam.apache.org is populated from the asf-site
>>>>>>>>> branch of the apache/beam-site repo. (gitpubsub:
>>>>>>>>> https://www.apache.org/dev/project-site.html#intro)
>>>>>>>>> If we move the markdown-based docs to apache/beam, leave generated
>>>>>>>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>>>>>>>> then javadoc and pydoc will not get pushed to the website.
>>>>>>>>>
>>>>>>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>>>>>>> perhaps there an alternative way to push updates to
>>>>>>>>> beam.apache.org? (not requiring the asf-site branch)
>>>>>>>>>
>>>>>>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Scott,
>>>>>>>>>>
>>>>>>>>>> Thanks for bringing the discussion back here.
>>>>>>>>>>
>>>>>>>>>> I agree that we should separate the changes for hosting of
>>>>>>>>>> generated java/pydocs from the rest of website automation so that we can
>>>>>>>>>> make the switch and fix the contributor headache soon.
>>>>>>>>>>
>>>>>>>>>> But perhaps we can avoid adding 4m lines of generated code to the
>>>>>>>>>> main beam repository (and keep on adding with every release) if we continue
>>>>>>>>>> to serve the site from the old beam-site repo? (I left a comment the doc.)
>>>>>>>>>>
>>>>>>>>>> About trying buildbot, as mentioned earlier I would be happy to
>>>>>>>>>> help with it. I prefer a setup that keeps the docs separate from the web
>>>>>>>>>> site.
>>>>>>>>>>
>>>>>>>>>> Thomas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Re-opening this thread as it came up today in the discussion for
>>>>>>>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>>>>>>> Reliability improvements; design doc here:
>>>>>>>>>>> https://s.apache.org/beam-site-automation
>>>>>>>>>>>
>>>>>>>>>>> The current plan is to keep generated javadoc/pydoc sources only
>>>>>>>>>>> on the asf-site branch, which is necessary for the current githubpubsub
>>>>>>>>>>> publishing mechanism. This maintains our current approach, the only change
>>>>>>>>>>> being that we're moving the asf-site branch from the retiring
>>>>>>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>>>>>>
>>>>>>>>>>> The concern for committing generated content is the extra
>>>>>>>>>>> overhead during git fetch. I did some analysis to measure the impact [2],
>>>>>>>>>>> and found that fetching a week of source + generated content history from
>>>>>>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>>>>>>
>>>>>>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an
>>>>>>>>>>> external location like Flink does with buildbot, but that work is separable
>>>>>>>>>>> and shouldn't be a prerequisite for this effort. The goal of this work is
>>>>>>>>>>> to improve the reliability of automation for contributing website changes.
>>>>>>>>>>> At last measure, only about half of beam-site PR merges use Mergebot
>>>>>>>>>>> without experiencing some reliability issue [3].
>>>>>>>>>>>
>>>>>>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out
>>>>>>>>>>> of git. Thomas, would you have bandwidth to look into this?
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>>>>>>> [2]
>>>>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>>>>>>> [3]
>>>>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Udi,
>>>>>>>>>>>>
>>>>>>>>>>>> Good to know you will continue this work.
>>>>>>>>>>>>
>>>>>>>>>>>> Let me know if you want to try the buildbot route (which does
>>>>>>>>>>>> not require generated documentation to be checked into the repo). Happy to
>>>>>>>>>>>> help with that.
>>>>>>>>>>>>
>>>>>>>>>>>> Thomas
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm picking up the website migration. The plan is to not
>>>>>>>>>>>>> include generated files in the master branch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, I've been told that even putting generated files a
>>>>>>>>>>>>> separate branch could blow up the git repository for all (e.g. make git
>>>>>>>>>>>>> pulls a lot longer?).
>>>>>>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Yes, I think the separation of generated code will need to
>>>>>>>>>>>>>> occur prior to completing the merge and switching the web site to the main
>>>>>>>>>>>>>> repo.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > There should be no reason to check generated documentation
>>>>>>>>>>>>>> into either of the repos/branches.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Huge +1 to this. Thomas, would have time to set something
>>>>>>>>>>>>>> like this up
>>>>>>>>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Please see as an example how this was solved in Flink,
>>>>>>>>>>>>>> using the ASF buildbot infrastructure.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > The buildbot configuration is here (requires committer
>>>>>>>>>>>>>> access):
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>> > Thomas
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Last time I talked with Scott I brought this idea in. I
>>>>>>>>>>>>>> believe the plan was either to publish compiled site to website directly,
>>>>>>>>>>>>>> or keep it in separate storage from apache/beam repo.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> One of the main reasons not to check in compiled version
>>>>>>>>>>>>>> of website is that every developer will have to pull all the versions of
>>>>>>>>>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Regards,
>>>>>>>>>>>>>> >> --Mikhail
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Have feedback?
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>>>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so
>>>>>>>>>>>>>> tags are not necessary?
>>>>>>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam
>>>>>>>>>>>>>> the release branch should have the relevant docs (although perhaps it's
>>>>>>>>>>>>>> better to put them in a different repo or storage system).
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>>>>>>>> generation be part of the website review process, as it takes up a lot of
>>>>>>>>>>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>>>>>>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> @Thomas: Migration work is in backlog and will be picked
>>>>>>>>>>>>>> up in near time.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> --Mikhail
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Have feedback?
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <
>>>>>>>>>>>>>> thw@apache.org> wrote:
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>>>>>>> entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no
>>>>>>>>>>>>>> longer check in generated documentation into the repository? Those can be
>>>>>>>>>>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>>>>>>>>>>> such as done in the Apex and Flink projects.
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> I was told that Scott who was working in the beam-site
>>>>>>>>>>>>>> changes is on leave now and the migration is still pending (see note at
>>>>>>>>>>>>>> https://github.com/apache/beam/tree/master/website). Is
>>>>>>>>>>>>>> anyone else going to pick it up?
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>>> >>>>> Thomas
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories
>>>>>>>>>>>>>> every time we make a release, so that people are able to dive in and find
>>>>>>>>>>>>>> the docs?
>>>>>>>>>>>>>> >>>>>> Best
>>>>>>>>>>>>>> >>>>>> -P.
>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>> >>>>>>> I would guess that users are still using some of
>>>>>>>>>>>>>> these old releases. It is unclear from Beam website which releases are
>>>>>>>>>>>>>> still supported or not. It probably makes sense to drop documentation for
>>>>>>>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>>>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are in
>>>>>>>>>>>>>> Github commit history.
>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete
>>>>>>>>>>>>>> javadocs and pydocs for releases older than 1 year,
>>>>>>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit
>>>>>>>>>>>>>> history of the website repository, right? If they're not currently used in
>>>>>>>>>>>>>> the website and they're in the commit history then I don't see a reason to
>>>>>>>>>>>>>> save them.
>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by
>>>>>>>>>>>>>> deleting the older javadoc and pydoc files for older versions. Is there a
>>>>>>>>>>>>>> good reason to keep around this kind of documentation for older versions
>>>>>>>>>>>>>> (say 1 year back)?
>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>> >>>>>> --
>>>>>>>>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>
>>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>> Got feedback? tinyurl.com/swegner-feedback
>>>
>>

Re: Removing documentation for old Beam versions

Posted by Thomas Weise <th...@apache.org>.
Looks like the is agreement that all sources should be in the main beam
repository, the remaining discussion was where the generated content should
be served from, specifically the generated docs.

If the setup that Alan found allows us to keep using the beam-site
repository for the generated stuff and that does not unreasonably
complicate the CI process, then I'm in favor of that. It looks cleaner to
not mingle source and generated files in the same repo. Otherwise we can do
the asf-site branch in the main repo and get rid of docs from it once we
found a better solution.


On Wed, Sep 26, 2018 at 7:09 AM Robert Bradshaw <ro...@google.com> wrote:

> OK, thanks. That link was very helpful. Of the three options we must use,
> checking into git seems preferable than checking into svn let alone the
> CMS. Keeping the same repo means that it's harder to generate the docs for
> version X while head is checked out.
>
> I'm in favor of moving forward with this in the short term, but we should
> expore other options (like Flink has) for the longer term.
>
>
>
> On Wed, Sep 26, 2018 at 3:53 PM Scott Wegner <sc...@apache.org> wrote:
>
>> Yes. There are few options for publishing your ASF website, described
>> here: https://www.apache.org/dev/project-site.html. We can publish from
>> a Git repo, SVN, or a UI-based CMS interface.
>>
>> On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> I am also definitely in favor of a single repository. Perhaps I'm just
>>> misunderstanding why the generated must be put in a git repository at
>>> all--is it because that's the easiest way to serve them?
>>>
>>> On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner <sc...@apache.org> wrote:
>>>
>>>> Alan found the place where website publishing is configured [1], which
>>>> has examples of project sites being configured with more than one git root.
>>>> This is great for us because it allows us to leave generated
>>>> javadocs/pydocs in the beam-site repository and publish website markdown
>>>> content from the main repo.
>>>>
>>>> Alan has a PR ready to publish generated HTML in a post-commit job [2].
>>>> Once that goes through the last step is to upgrade the publishing config.
>>>>
>>>> [1]
>>>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
>>>> [2] https://github.com/apache/beam/pull/6431
>>>>
>>>> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sw...@google.com>
>>>> wrote:
>>>>
>>>>> > We could add a new default branch (master?) and keep all the
>>>>> non-generated files (src/) there, and put generated files (content/) in the
>>>>> asf-site branch (like we already do).
>>>>>
>>>>> I'm strongly in favor of having sources in a single repository. We
>>>>> have significant process and infrastructure built up for the apache/beam
>>>>> repo (for build, PR, CI, release, etc.) that we can take advantage of by
>>>>> putting website sources in the same repo. The current beam-site repo PR
>>>>> automation is flaky because it was custom-built and not given the same
>>>>> level of attention as the main repo.
>>>>>
>>>>> The caveat to consolidating website sources in the main repo is that
>>>>> it incentivizes putting the generated sources branch on the same repo. I've
>>>>> documented a few of the reasons in the Appendix of the design doc [1]:
>>>>>  - It's easier to maintain a single repository; easily apply existing
>>>>> tooling/infrastructure
>>>>> - Jenkins tooling for publishing generated HTML may not work
>>>>> cross-repo [2]
>>>>>
>>>>> My preference is to move forward with the migration of sources to
>>>>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>>>>> I like the idea of separating the publishing/hosting of generated
>>>>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>>>>> migration.
>>>>>
>>>>> [1]
>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>>>>
>>>>> [2]
>>>>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>>>>
>>>>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> Staying on beam-site SGTM. We could add a new default branch
>>>>>> (master?) and keep all the non-generated files (src/) there, and put
>>>>>> generated files (content/) in the asf-site branch (like we already do).
>>>>>> That way there's no confusion as to which files you should update.
>>>>>> (This is of course assuming we still place generated docs in git
>>>>>> repos.)
>>>>>>
>>>>>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <th...@apache.org> wrote:
>>>>>>
>>>>>>> My thought was to leave the asf-site branch in the beam-site
>>>>>>> repository, add generated docs to that branch (until we have a better
>>>>>>> solution), and have only sources in the beam repo.
>>>>>>>
>>>>>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>>>>>> it would eliminate the need to place generated docs into git repos.
>>>>>>>
>>>>>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> I believe that beam.apache.org is populated from the asf-site
>>>>>>>> branch of the apache/beam-site repo. (gitpubsub:
>>>>>>>> https://www.apache.org/dev/project-site.html#intro)
>>>>>>>> If we move the markdown-based docs to apache/beam, leave generated
>>>>>>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>>>>>>> then javadoc and pydoc will not get pushed to the website.
>>>>>>>>
>>>>>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>>>>>> perhaps there an alternative way to push updates to beam.apache.org?
>>>>>>>> (not requiring the asf-site branch)
>>>>>>>>
>>>>>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Scott,
>>>>>>>>>
>>>>>>>>> Thanks for bringing the discussion back here.
>>>>>>>>>
>>>>>>>>> I agree that we should separate the changes for hosting of
>>>>>>>>> generated java/pydocs from the rest of website automation so that we can
>>>>>>>>> make the switch and fix the contributor headache soon.
>>>>>>>>>
>>>>>>>>> But perhaps we can avoid adding 4m lines of generated code to the
>>>>>>>>> main beam repository (and keep on adding with every release) if we continue
>>>>>>>>> to serve the site from the old beam-site repo? (I left a comment the doc.)
>>>>>>>>>
>>>>>>>>> About trying buildbot, as mentioned earlier I would be happy to
>>>>>>>>> help with it. I prefer a setup that keeps the docs separate from the web
>>>>>>>>> site.
>>>>>>>>>
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Re-opening this thread as it came up today in the discussion for
>>>>>>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>>>>>> Reliability improvements; design doc here:
>>>>>>>>>> https://s.apache.org/beam-site-automation
>>>>>>>>>>
>>>>>>>>>> The current plan is to keep generated javadoc/pydoc sources only
>>>>>>>>>> on the asf-site branch, which is necessary for the current githubpubsub
>>>>>>>>>> publishing mechanism. This maintains our current approach, the only change
>>>>>>>>>> being that we're moving the asf-site branch from the retiring
>>>>>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>>>>>
>>>>>>>>>> The concern for committing generated content is the extra
>>>>>>>>>> overhead during git fetch. I did some analysis to measure the impact [2],
>>>>>>>>>> and found that fetching a week of source + generated content history from
>>>>>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>>>>>
>>>>>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an
>>>>>>>>>> external location like Flink does with buildbot, but that work is separable
>>>>>>>>>> and shouldn't be a prerequisite for this effort. The goal of this work is
>>>>>>>>>> to improve the reliability of automation for contributing website changes.
>>>>>>>>>> At last measure, only about half of beam-site PR merges use Mergebot
>>>>>>>>>> without experiencing some reliability issue [3].
>>>>>>>>>>
>>>>>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out
>>>>>>>>>> of git. Thomas, would you have bandwidth to look into this?
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>>>>>> [2]
>>>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>>>>>> [3]
>>>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Udi,
>>>>>>>>>>>
>>>>>>>>>>> Good to know you will continue this work.
>>>>>>>>>>>
>>>>>>>>>>> Let me know if you want to try the buildbot route (which does
>>>>>>>>>>> not require generated documentation to be checked into the repo). Happy to
>>>>>>>>>>> help with that.
>>>>>>>>>>>
>>>>>>>>>>> Thomas
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm picking up the website migration. The plan is to not
>>>>>>>>>>>> include generated files in the master branch.
>>>>>>>>>>>>
>>>>>>>>>>>> However, I've been told that even putting generated files a
>>>>>>>>>>>> separate branch could blow up the git repository for all (e.g. make git
>>>>>>>>>>>> pulls a lot longer?).
>>>>>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Yes, I think the separation of generated code will need to
>>>>>>>>>>>>> occur prior to completing the merge and switching the web site to the main
>>>>>>>>>>>>> repo.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > There should be no reason to check generated documentation
>>>>>>>>>>>>> into either of the repos/branches.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Huge +1 to this. Thomas, would have time to set something like
>>>>>>>>>>>>> this up
>>>>>>>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Please see as an example how this was solved in Flink, using
>>>>>>>>>>>>> the ASF buildbot infrastructure.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > The buildbot configuration is here (requires committer
>>>>>>>>>>>>> access):
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>> > Thomas
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Last time I talked with Scott I brought this idea in. I
>>>>>>>>>>>>> believe the plan was either to publish compiled site to website directly,
>>>>>>>>>>>>> or keep it in separate storage from apache/beam repo.
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> One of the main reasons not to check in compiled version of
>>>>>>>>>>>>> website is that every developer will have to pull all the versions of
>>>>>>>>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Regards,
>>>>>>>>>>>>> >> --Mikhail
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Have feedback?
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so
>>>>>>>>>>>>> tags are not necessary?
>>>>>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>>>>>>>>> release branch should have the relevant docs (although perhaps it's better
>>>>>>>>>>>>> to put them in a different repo or storage system).
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>>>>>>> generation be part of the website review process, as it takes up a lot of
>>>>>>>>>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>>>>>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> @Thomas: Migration work is in backlog and will be picked
>>>>>>>>>>>>> up in near time.
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> --Mikhail
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> Have feedback?
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <
>>>>>>>>>>>>> thw@apache.org> wrote:
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>>>>>> entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no
>>>>>>>>>>>>> longer check in generated documentation into the repository? Those can be
>>>>>>>>>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>>>>>>>>>> such as done in the Apex and Flink projects.
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> I was told that Scott who was working in the beam-site
>>>>>>>>>>>>> changes is on leave now and the migration is still pending (see note at
>>>>>>>>>>>>> https://github.com/apache/beam/tree/master/website). Is
>>>>>>>>>>>>> anyone else going to pick it up?
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>> >>>>> Thomas
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories
>>>>>>>>>>>>> every time we make a release, so that people are able to dive in and find
>>>>>>>>>>>>> the docs?
>>>>>>>>>>>>> >>>>>> Best
>>>>>>>>>>>>> >>>>>> -P.
>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>> >>>>>>> I would guess that users are still using some of these
>>>>>>>>>>>>> old releases. It is unclear from Beam website which releases are still
>>>>>>>>>>>>> supported or not. It probably makes sense to drop documentation for
>>>>>>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are in
>>>>>>>>>>>>> Github commit history.
>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete
>>>>>>>>>>>>> javadocs and pydocs for releases older than 1 year,
>>>>>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit
>>>>>>>>>>>>> history of the website repository, right? If they're not currently used in
>>>>>>>>>>>>> the website and they're in the commit history then I don't see a reason to
>>>>>>>>>>>>> save them.
>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by
>>>>>>>>>>>>> deleting the older javadoc and pydoc files for older versions. Is there a
>>>>>>>>>>>>> good reason to keep around this kind of documentation for older versions
>>>>>>>>>>>>> (say 1 year back)?
>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>> >>>>>> --
>>>>>>>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>
>>>
>>
>> --
>>
>>
>>
>>
>> Got feedback? tinyurl.com/swegner-feedback
>>
>

Re: Removing documentation for old Beam versions

Posted by Robert Bradshaw <ro...@google.com>.
OK, thanks. That link was very helpful. Of the three options we must use,
checking into git seems preferable than checking into svn let alone the
CMS. Keeping the same repo means that it's harder to generate the docs for
version X while head is checked out.

I'm in favor of moving forward with this in the short term, but we should
expore other options (like Flink has) for the longer term.



On Wed, Sep 26, 2018 at 3:53 PM Scott Wegner <sc...@apache.org> wrote:

> Yes. There are few options for publishing your ASF website, described
> here: https://www.apache.org/dev/project-site.html. We can publish from a
> Git repo, SVN, or a UI-based CMS interface.
>
> On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> I am also definitely in favor of a single repository. Perhaps I'm just
>> misunderstanding why the generated must be put in a git repository at
>> all--is it because that's the easiest way to serve them?
>>
>> On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner <sc...@apache.org> wrote:
>>
>>> Alan found the place where website publishing is configured [1], which
>>> has examples of project sites being configured with more than one git root.
>>> This is great for us because it allows us to leave generated
>>> javadocs/pydocs in the beam-site repository and publish website markdown
>>> content from the main repo.
>>>
>>> Alan has a PR ready to publish generated HTML in a post-commit job [2].
>>> Once that goes through the last step is to upgrade the publishing config.
>>>
>>> [1]
>>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
>>> [2] https://github.com/apache/beam/pull/6431
>>>
>>> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sw...@google.com> wrote:
>>>
>>>> > We could add a new default branch (master?) and keep all the
>>>> non-generated files (src/) there, and put generated files (content/) in the
>>>> asf-site branch (like we already do).
>>>>
>>>> I'm strongly in favor of having sources in a single repository. We have
>>>> significant process and infrastructure built up for the apache/beam repo
>>>> (for build, PR, CI, release, etc.) that we can take advantage of by putting
>>>> website sources in the same repo. The current beam-site repo PR automation
>>>> is flaky because it was custom-built and not given the same level of
>>>> attention as the main repo.
>>>>
>>>> The caveat to consolidating website sources in the main repo is that it
>>>> incentivizes putting the generated sources branch on the same repo. I've
>>>> documented a few of the reasons in the Appendix of the design doc [1]:
>>>>  - It's easier to maintain a single repository; easily apply existing
>>>> tooling/infrastructure
>>>> - Jenkins tooling for publishing generated HTML may not work cross-repo
>>>> [2]
>>>>
>>>> My preference is to move forward with the migration of sources to
>>>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>>>> I like the idea of separating the publishing/hosting of generated
>>>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>>>> migration.
>>>>
>>>> [1]
>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>>>
>>>> [2]
>>>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>>>
>>>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> Staying on beam-site SGTM. We could add a new default branch (master?)
>>>>> and keep all the non-generated files (src/) there, and put generated files
>>>>> (content/) in the asf-site branch (like we already do).
>>>>> That way there's no confusion as to which files you should update.
>>>>> (This is of course assuming we still place generated docs in git
>>>>> repos.)
>>>>>
>>>>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>>> My thought was to leave the asf-site branch in the beam-site
>>>>>> repository, add generated docs to that branch (until we have a better
>>>>>> solution), and have only sources in the beam repo.
>>>>>>
>>>>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>>>>> it would eliminate the need to place generated docs into git repos.
>>>>>>
>>>>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> I believe that beam.apache.org is populated from the asf-site
>>>>>>> branch of the apache/beam-site repo. (gitpubsub:
>>>>>>> https://www.apache.org/dev/project-site.html#intro)
>>>>>>> If we move the markdown-based docs to apache/beam, leave generated
>>>>>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>>>>>> then javadoc and pydoc will not get pushed to the website.
>>>>>>>
>>>>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>>>>> perhaps there an alternative way to push updates to beam.apache.org?
>>>>>>> (not requiring the asf-site branch)
>>>>>>>
>>>>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Scott,
>>>>>>>>
>>>>>>>> Thanks for bringing the discussion back here.
>>>>>>>>
>>>>>>>> I agree that we should separate the changes for hosting of
>>>>>>>> generated java/pydocs from the rest of website automation so that we can
>>>>>>>> make the switch and fix the contributor headache soon.
>>>>>>>>
>>>>>>>> But perhaps we can avoid adding 4m lines of generated code to the
>>>>>>>> main beam repository (and keep on adding with every release) if we continue
>>>>>>>> to serve the site from the old beam-site repo? (I left a comment the doc.)
>>>>>>>>
>>>>>>>> About trying buildbot, as mentioned earlier I would be happy to
>>>>>>>> help with it. I prefer a setup that keeps the docs separate from the web
>>>>>>>> site.
>>>>>>>>
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Re-opening this thread as it came up today in the discussion for
>>>>>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>>>>> Reliability improvements; design doc here:
>>>>>>>>> https://s.apache.org/beam-site-automation
>>>>>>>>>
>>>>>>>>> The current plan is to keep generated javadoc/pydoc sources only
>>>>>>>>> on the asf-site branch, which is necessary for the current githubpubsub
>>>>>>>>> publishing mechanism. This maintains our current approach, the only change
>>>>>>>>> being that we're moving the asf-site branch from the retiring
>>>>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>>>>
>>>>>>>>> The concern for committing generated content is the extra overhead
>>>>>>>>> during git fetch. I did some analysis to measure the impact [2], and found
>>>>>>>>> that fetching a week of source + generated content history from
>>>>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>>>>
>>>>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an
>>>>>>>>> external location like Flink does with buildbot, but that work is separable
>>>>>>>>> and shouldn't be a prerequisite for this effort. The goal of this work is
>>>>>>>>> to improve the reliability of automation for contributing website changes.
>>>>>>>>> At last measure, only about half of beam-site PR merges use Mergebot
>>>>>>>>> without experiencing some reliability issue [3].
>>>>>>>>>
>>>>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out
>>>>>>>>> of git. Thomas, would you have bandwidth to look into this?
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>>>>> [2]
>>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>>>>> [3]
>>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>>>>
>>>>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Udi,
>>>>>>>>>>
>>>>>>>>>> Good to know you will continue this work.
>>>>>>>>>>
>>>>>>>>>> Let me know if you want to try the buildbot route (which does not
>>>>>>>>>> require generated documentation to be checked into the repo). Happy to help
>>>>>>>>>> with that.
>>>>>>>>>>
>>>>>>>>>> Thomas
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm picking up the website migration. The plan is to not include
>>>>>>>>>>> generated files in the master branch.
>>>>>>>>>>>
>>>>>>>>>>> However, I've been told that even putting generated files a
>>>>>>>>>>> separate branch could blow up the git repository for all (e.g. make git
>>>>>>>>>>> pulls a lot longer?).
>>>>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Yes, I think the separation of generated code will need to
>>>>>>>>>>>> occur prior to completing the merge and switching the web site to the main
>>>>>>>>>>>> repo.
>>>>>>>>>>>> >
>>>>>>>>>>>> > There should be no reason to check generated documentation
>>>>>>>>>>>> into either of the repos/branches.
>>>>>>>>>>>>
>>>>>>>>>>>> Huge +1 to this. Thomas, would have time to set something like
>>>>>>>>>>>> this up
>>>>>>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>>>>>>
>>>>>>>>>>>> > Please see as an example how this was solved in Flink, using
>>>>>>>>>>>> the ASF buildbot infrastructure.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>>>>> >
>>>>>>>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>>>>> >
>>>>>>>>>>>> > The buildbot configuration is here (requires committer
>>>>>>>>>>>> access):
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>> > Thomas
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Last time I talked with Scott I brought this idea in. I
>>>>>>>>>>>> believe the plan was either to publish compiled site to website directly,
>>>>>>>>>>>> or keep it in separate storage from apache/beam repo.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> One of the main reasons not to check in compiled version of
>>>>>>>>>>>> website is that every developer will have to pull all the versions of
>>>>>>>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Regards,
>>>>>>>>>>>> >> --Mikhail
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Have feedback?
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so
>>>>>>>>>>>> tags are not necessary?
>>>>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>>>>>>>> release branch should have the relevant docs (although perhaps it's better
>>>>>>>>>>>> to put them in a different repo or storage system).
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>>>>>> generation be part of the website review process, as it takes up a lot of
>>>>>>>>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>>>>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> @Thomas: Migration work is in backlog and will be picked
>>>>>>>>>>>> up in near time.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> --Mikhail
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Have feedback?
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <
>>>>>>>>>>>> thw@apache.org> wrote:
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>>>>> entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no
>>>>>>>>>>>> longer check in generated documentation into the repository? Those can be
>>>>>>>>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>>>>>>>>> such as done in the Apex and Flink projects.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> I was told that Scott who was working in the beam-site
>>>>>>>>>>>> changes is on leave now and the migration is still pending (see note at
>>>>>>>>>>>> https://github.com/apache/beam/tree/master/website). Is anyone
>>>>>>>>>>>> else going to pick it up?
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>> >>>>> Thomas
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories
>>>>>>>>>>>> every time we make a release, so that people are able to dive in and find
>>>>>>>>>>>> the docs?
>>>>>>>>>>>> >>>>>> Best
>>>>>>>>>>>> >>>>>> -P.
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>>> I would guess that users are still using some of these
>>>>>>>>>>>> old releases. It is unclear from Beam website which releases are still
>>>>>>>>>>>> supported or not. It probably makes sense to drop documentation for
>>>>>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are in
>>>>>>>>>>>> Github commit history.
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete
>>>>>>>>>>>> javadocs and pydocs for releases older than 1 year,
>>>>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit
>>>>>>>>>>>> history of the website repository, right? If they're not currently used in
>>>>>>>>>>>> the website and they're in the commit history then I don't see a reason to
>>>>>>>>>>>> save them.
>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by
>>>>>>>>>>>> deleting the older javadoc and pydoc files for older versions. Is there a
>>>>>>>>>>>> good reason to keep around this kind of documentation for older versions
>>>>>>>>>>>> (say 1 year back)?
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>> --
>>>>>>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>
>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>> Got feedback? tinyurl.com/swegner-feedback
>>>
>>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>

Re: Removing documentation for old Beam versions

Posted by Scott Wegner <sc...@apache.org>.
Yes. There are few options for publishing your ASF website, described here:
https://www.apache.org/dev/project-site.html. We can publish from a Git
repo, SVN, or a UI-based CMS interface.

On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw <ro...@google.com> wrote:

> I am also definitely in favor of a single repository. Perhaps I'm just
> misunderstanding why the generated must be put in a git repository at
> all--is it because that's the easiest way to serve them?
>
> On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner <sc...@apache.org> wrote:
>
>> Alan found the place where website publishing is configured [1], which
>> has examples of project sites being configured with more than one git root.
>> This is great for us because it allows us to leave generated
>> javadocs/pydocs in the beam-site repository and publish website markdown
>> content from the main repo.
>>
>> Alan has a PR ready to publish generated HTML in a post-commit job [2].
>> Once that goes through the last step is to upgrade the publishing config.
>>
>> [1]
>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
>> [2] https://github.com/apache/beam/pull/6431
>>
>> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sw...@google.com> wrote:
>>
>>> > We could add a new default branch (master?) and keep all the
>>> non-generated files (src/) there, and put generated files (content/) in the
>>> asf-site branch (like we already do).
>>>
>>> I'm strongly in favor of having sources in a single repository. We have
>>> significant process and infrastructure built up for the apache/beam repo
>>> (for build, PR, CI, release, etc.) that we can take advantage of by putting
>>> website sources in the same repo. The current beam-site repo PR automation
>>> is flaky because it was custom-built and not given the same level of
>>> attention as the main repo.
>>>
>>> The caveat to consolidating website sources in the main repo is that it
>>> incentivizes putting the generated sources branch on the same repo. I've
>>> documented a few of the reasons in the Appendix of the design doc [1]:
>>>  - It's easier to maintain a single repository; easily apply existing
>>> tooling/infrastructure
>>> - Jenkins tooling for publishing generated HTML may not work cross-repo
>>> [2]
>>>
>>> My preference is to move forward with the migration of sources to
>>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>>> I like the idea of separating the publishing/hosting of generated
>>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>>> migration.
>>>
>>> [1]
>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>>
>>> [2]
>>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>>
>>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> Staying on beam-site SGTM. We could add a new default branch (master?)
>>>> and keep all the non-generated files (src/) there, and put generated files
>>>> (content/) in the asf-site branch (like we already do).
>>>> That way there's no confusion as to which files you should update.
>>>> (This is of course assuming we still place generated docs in git repos.)
>>>>
>>>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>>> My thought was to leave the asf-site branch in the beam-site
>>>>> repository, add generated docs to that branch (until we have a better
>>>>> solution), and have only sources in the beam repo.
>>>>>
>>>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>>>> it would eliminate the need to place generated docs into git repos.
>>>>>
>>>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> I believe that beam.apache.org is populated from the asf-site branch
>>>>>> of the apache/beam-site repo. (gitpubsub:
>>>>>> https://www.apache.org/dev/project-site.html#intro)
>>>>>> If we move the markdown-based docs to apache/beam, leave generated
>>>>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>>>>> then javadoc and pydoc will not get pushed to the website.
>>>>>>
>>>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>>>> perhaps there an alternative way to push updates to beam.apache.org?
>>>>>> (not requiring the asf-site branch)
>>>>>>
>>>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Scott,
>>>>>>>
>>>>>>> Thanks for bringing the discussion back here.
>>>>>>>
>>>>>>> I agree that we should separate the changes for hosting of generated
>>>>>>> java/pydocs from the rest of website automation so that we can make the
>>>>>>> switch and fix the contributor headache soon.
>>>>>>>
>>>>>>> But perhaps we can avoid adding 4m lines of generated code to the
>>>>>>> main beam repository (and keep on adding with every release) if we continue
>>>>>>> to serve the site from the old beam-site repo? (I left a comment the doc.)
>>>>>>>
>>>>>>> About trying buildbot, as mentioned earlier I would be happy to help
>>>>>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Re-opening this thread as it came up today in the discussion for
>>>>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>>>> Reliability improvements; design doc here:
>>>>>>>> https://s.apache.org/beam-site-automation
>>>>>>>>
>>>>>>>> The current plan is to keep generated javadoc/pydoc sources only on
>>>>>>>> the asf-site branch, which is necessary for the current githubpubsub
>>>>>>>> publishing mechanism. This maintains our current approach, the only change
>>>>>>>> being that we're moving the asf-site branch from the retiring
>>>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>>>
>>>>>>>> The concern for committing generated content is the extra overhead
>>>>>>>> during git fetch. I did some analysis to measure the impact [2], and found
>>>>>>>> that fetching a week of source + generated content history from
>>>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>>>
>>>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an
>>>>>>>> external location like Flink does with buildbot, but that work is separable
>>>>>>>> and shouldn't be a prerequisite for this effort. The goal of this work is
>>>>>>>> to improve the reliability of automation for contributing website changes.
>>>>>>>> At last measure, only about half of beam-site PR merges use Mergebot
>>>>>>>> without experiencing some reliability issue [3].
>>>>>>>>
>>>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out of
>>>>>>>> git. Thomas, would you have bandwidth to look into this?
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>>>>
>>>>>>>> [2]
>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>>>> [3]
>>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>>>
>>>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Udi,
>>>>>>>>>
>>>>>>>>> Good to know you will continue this work.
>>>>>>>>>
>>>>>>>>> Let me know if you want to try the buildbot route (which does not
>>>>>>>>> require generated documentation to be checked into the repo). Happy to help
>>>>>>>>> with that.
>>>>>>>>>
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I'm picking up the website migration. The plan is to not include
>>>>>>>>>> generated files in the master branch.
>>>>>>>>>>
>>>>>>>>>> However, I've been told that even putting generated files a
>>>>>>>>>> separate branch could blow up the git repository for all (e.g. make git
>>>>>>>>>> pulls a lot longer?).
>>>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Yes, I think the separation of generated code will need to
>>>>>>>>>>> occur prior to completing the merge and switching the web site to the main
>>>>>>>>>>> repo.
>>>>>>>>>>> >
>>>>>>>>>>> > There should be no reason to check generated documentation
>>>>>>>>>>> into either of the repos/branches.
>>>>>>>>>>>
>>>>>>>>>>> Huge +1 to this. Thomas, would have time to set something like
>>>>>>>>>>> this up
>>>>>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>>>>>
>>>>>>>>>>> > Please see as an example how this was solved in Flink, using
>>>>>>>>>>> the ASF buildbot infrastructure.
>>>>>>>>>>> >
>>>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>>>> >
>>>>>>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>>>> >
>>>>>>>>>>> > The buildbot configuration is here (requires committer access):
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks,
>>>>>>>>>>> > Thomas
>>>>>>>>>>> >
>>>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> Last time I talked with Scott I brought this idea in. I
>>>>>>>>>>> believe the plan was either to publish compiled site to website directly,
>>>>>>>>>>> or keep it in separate storage from apache/beam repo.
>>>>>>>>>>> >>
>>>>>>>>>>> >> One of the main reasons not to check in compiled version of
>>>>>>>>>>> website is that every developer will have to pull all the versions of
>>>>>>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>>>>>>> >>
>>>>>>>>>>> >> Regards,
>>>>>>>>>>> >> --Mikhail
>>>>>>>>>>> >>
>>>>>>>>>>> >> Have feedback?
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so
>>>>>>>>>>> tags are not necessary?
>>>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>>>>>>> release branch should have the relevant docs (although perhaps it's better
>>>>>>>>>>> to put them in a different repo or storage system).
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>>>>> generation be part of the website review process, as it takes up a lot of
>>>>>>>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>>>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> @Thomas: Migration work is in backlog and will be picked up
>>>>>>>>>>> in near time.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> --Mikhail
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Have feedback?
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <
>>>>>>>>>>> thw@apache.org> wrote:
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>>>> entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no
>>>>>>>>>>> longer check in generated documentation into the repository? Those can be
>>>>>>>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>>>>>>>> such as done in the Apex and Flink projects.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> I was told that Scott who was working in the beam-site
>>>>>>>>>>> changes is on leave now and the migration is still pending (see note at
>>>>>>>>>>> https://github.com/apache/beam/tree/master/website). Is anyone
>>>>>>>>>>> else going to pick it up?
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>> >>>>> Thomas
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories
>>>>>>>>>>> every time we make a release, so that people are able to dive in and find
>>>>>>>>>>> the docs?
>>>>>>>>>>> >>>>>> Best
>>>>>>>>>>> >>>>>> -P.
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> I would guess that users are still using some of these
>>>>>>>>>>> old releases. It is unclear from Beam website which releases are still
>>>>>>>>>>> supported or not. It probably makes sense to drop documentation for
>>>>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are in
>>>>>>>>>>> Github commit history.
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete javadocs
>>>>>>>>>>> and pydocs for releases older than 1 year,
>>>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit
>>>>>>>>>>> history of the website repository, right? If they're not currently used in
>>>>>>>>>>> the website and they're in the commit history then I don't see a reason to
>>>>>>>>>>> save them.
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting
>>>>>>>>>>> the older javadoc and pydoc files for older versions. Is there a good
>>>>>>>>>>> reason to keep around this kind of documentation for older versions (say 1
>>>>>>>>>>> year back)?
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>> --
>>>>>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>>
>>>>>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>> Got feedback? tinyurl.com/swegner-feedback
>>>
>>
>>
>> --
>>
>>
>>
>>
>> Got feedback? tinyurl.com/swegner-feedback
>>
>

-- 




Got feedback? tinyurl.com/swegner-feedback

Re: Removing documentation for old Beam versions

Posted by Robert Bradshaw <ro...@google.com>.
I am also definitely in favor of a single repository. Perhaps I'm just
misunderstanding why the generated must be put in a git repository at
all--is it because that's the easiest way to serve them?

On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner <sc...@apache.org> wrote:

> Alan found the place where website publishing is configured [1], which has
> examples of project sites being configured with more than one git root.
> This is great for us because it allows us to leave generated
> javadocs/pydocs in the beam-site repository and publish website markdown
> content from the main repo.
>
> Alan has a PR ready to publish generated HTML in a post-commit job [2].
> Once that goes through the last step is to upgrade the publishing config.
>
> [1]
> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
> [2] https://github.com/apache/beam/pull/6431
>
> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sw...@google.com> wrote:
>
>> > We could add a new default branch (master?) and keep all the
>> non-generated files (src/) there, and put generated files (content/) in the
>> asf-site branch (like we already do).
>>
>> I'm strongly in favor of having sources in a single repository. We have
>> significant process and infrastructure built up for the apache/beam repo
>> (for build, PR, CI, release, etc.) that we can take advantage of by putting
>> website sources in the same repo. The current beam-site repo PR automation
>> is flaky because it was custom-built and not given the same level of
>> attention as the main repo.
>>
>> The caveat to consolidating website sources in the main repo is that it
>> incentivizes putting the generated sources branch on the same repo. I've
>> documented a few of the reasons in the Appendix of the design doc [1]:
>>  - It's easier to maintain a single repository; easily apply existing
>> tooling/infrastructure
>> - Jenkins tooling for publishing generated HTML may not work cross-repo
>> [2]
>>
>> My preference is to move forward with the migration of sources to
>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>> I like the idea of separating the publishing/hosting of generated
>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>> migration.
>>
>> [1]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>
>> [2]
>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>
>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Staying on beam-site SGTM. We could add a new default branch (master?)
>>> and keep all the non-generated files (src/) there, and put generated files
>>> (content/) in the asf-site branch (like we already do).
>>> That way there's no confusion as to which files you should update.
>>> (This is of course assuming we still place generated docs in git repos.)
>>>
>>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <th...@apache.org> wrote:
>>>
>>>> My thought was to leave the asf-site branch in the beam-site
>>>> repository, add generated docs to that branch (until we have a better
>>>> solution), and have only sources in the beam repo.
>>>>
>>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>>> it would eliminate the need to place generated docs into git repos.
>>>>
>>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> I believe that beam.apache.org is populated from the asf-site branch
>>>>> of the apache/beam-site repo. (gitpubsub:
>>>>> https://www.apache.org/dev/project-site.html#intro)
>>>>> If we move the markdown-based docs to apache/beam, leave generated
>>>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>>>> then javadoc and pydoc will not get pushed to the website.
>>>>>
>>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>>> perhaps there an alternative way to push updates to beam.apache.org?
>>>>> (not requiring the asf-site branch)
>>>>>
>>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>>> Hi Scott,
>>>>>>
>>>>>> Thanks for bringing the discussion back here.
>>>>>>
>>>>>> I agree that we should separate the changes for hosting of generated
>>>>>> java/pydocs from the rest of website automation so that we can make the
>>>>>> switch and fix the contributor headache soon.
>>>>>>
>>>>>> But perhaps we can avoid adding 4m lines of generated code to the
>>>>>> main beam repository (and keep on adding with every release) if we continue
>>>>>> to serve the site from the old beam-site repo? (I left a comment the doc.)
>>>>>>
>>>>>> About trying buildbot, as mentioned earlier I would be happy to help
>>>>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Re-opening this thread as it came up today in the discussion for
>>>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>>> Reliability improvements; design doc here:
>>>>>>> https://s.apache.org/beam-site-automation
>>>>>>>
>>>>>>> The current plan is to keep generated javadoc/pydoc sources only on
>>>>>>> the asf-site branch, which is necessary for the current githubpubsub
>>>>>>> publishing mechanism. This maintains our current approach, the only change
>>>>>>> being that we're moving the asf-site branch from the retiring
>>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>>
>>>>>>> The concern for committing generated content is the extra overhead
>>>>>>> during git fetch. I did some analysis to measure the impact [2], and found
>>>>>>> that fetching a week of source + generated content history from
>>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>>
>>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>>>>>> location like Flink does with buildbot, but that work is separable and
>>>>>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>>>>>> improve the reliability of automation for contributing website changes. At
>>>>>>> last measure, only about half of beam-site PR merges use Mergebot without
>>>>>>> experiencing some reliability issue [3].
>>>>>>>
>>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out of
>>>>>>> git. Thomas, would you have bandwidth to look into this?
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>>> [2]
>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>>> [3]
>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>>
>>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Udi,
>>>>>>>>
>>>>>>>> Good to know you will continue this work.
>>>>>>>>
>>>>>>>> Let me know if you want to try the buildbot route (which does not
>>>>>>>> require generated documentation to be checked into the repo). Happy to help
>>>>>>>> with that.
>>>>>>>>
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm picking up the website migration. The plan is to not include
>>>>>>>>> generated files in the master branch.
>>>>>>>>>
>>>>>>>>> However, I've been told that even putting generated files a
>>>>>>>>> separate branch could blow up the git repository for all (e.g. make git
>>>>>>>>> pulls a lot longer?).
>>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>>
>>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Yes, I think the separation of generated code will need to
>>>>>>>>>> occur prior to completing the merge and switching the web site to the main
>>>>>>>>>> repo.
>>>>>>>>>> >
>>>>>>>>>> > There should be no reason to check generated documentation into
>>>>>>>>>> either of the repos/branches.
>>>>>>>>>>
>>>>>>>>>> Huge +1 to this. Thomas, would have time to set something like
>>>>>>>>>> this up
>>>>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>>>>
>>>>>>>>>> > Please see as an example how this was solved in Flink, using
>>>>>>>>>> the ASF buildbot infrastructure.
>>>>>>>>>> >
>>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>>> >
>>>>>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>>> >
>>>>>>>>>> > The buildbot configuration is here (requires committer access):
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > Thomas
>>>>>>>>>> >
>>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> Last time I talked with Scott I brought this idea in. I
>>>>>>>>>> believe the plan was either to publish compiled site to website directly,
>>>>>>>>>> or keep it in separate storage from apache/beam repo.
>>>>>>>>>> >>
>>>>>>>>>> >> One of the main reasons not to check in compiled version of
>>>>>>>>>> website is that every developer will have to pull all the versions of
>>>>>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>>>>>> >>
>>>>>>>>>> >> Regards,
>>>>>>>>>> >> --Mikhail
>>>>>>>>>> >>
>>>>>>>>>> >> Have feedback?
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so
>>>>>>>>>> tags are not necessary?
>>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>>>>>> release branch should have the relevant docs (although perhaps it's better
>>>>>>>>>> to put them in a different repo or storage system).
>>>>>>>>>> >>>
>>>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>>>> generation be part of the website review process, as it takes up a lot of
>>>>>>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> @Thomas: Migration work is in backlog and will be picked up
>>>>>>>>>> in near time.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> --Mikhail
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Have feedback?
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>>> entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no
>>>>>>>>>> longer check in generated documentation into the repository? Those can be
>>>>>>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>>>>>>> such as done in the Apex and Flink projects.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> I was told that Scott who was working in the beam-site
>>>>>>>>>> changes is on leave now and the migration is still pending (see note at
>>>>>>>>>> https://github.com/apache/beam/tree/master/website). Is anyone
>>>>>>>>>> else going to pick it up?
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>> >>>>> Thomas
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories
>>>>>>>>>> every time we make a release, so that people are able to dive in and find
>>>>>>>>>> the docs?
>>>>>>>>>> >>>>>> Best
>>>>>>>>>> >>>>>> -P.
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> I would guess that users are still using some of these
>>>>>>>>>> old releases. It is unclear from Beam website which releases are still
>>>>>>>>>> supported or not. It probably makes sense to drop documentation for
>>>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are in
>>>>>>>>>> Github commit history.
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete javadocs
>>>>>>>>>> and pydocs for releases older than 1 year,
>>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit history
>>>>>>>>>> of the website repository, right? If they're not currently used in the
>>>>>>>>>> website and they're in the commit history then I don't see a reason to save
>>>>>>>>>> them.
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting
>>>>>>>>>> the older javadoc and pydoc files for older versions. Is there a good
>>>>>>>>>> reason to keep around this kind of documentation for older versions (say 1
>>>>>>>>>> year back)?
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>> --
>>>>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>
>>>>>>
>>
>> --
>>
>>
>>
>>
>> Got feedback? tinyurl.com/swegner-feedback
>>
>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>

Re: Removing documentation for old Beam versions

Posted by Scott Wegner <sc...@apache.org>.
Alan found the place where website publishing is configured [1], which has
examples of project sites being configured with more than one git root.
This is great for us because it allows us to leave generated
javadocs/pydocs in the beam-site repository and publish website markdown
content from the main repo.

Alan has a PR ready to publish generated HTML in a post-commit job [2].
Once that goes through the last step is to upgrade the publishing config.

[1]
https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
[2] https://github.com/apache/beam/pull/6431

On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sw...@google.com> wrote:

> > We could add a new default branch (master?) and keep all the
> non-generated files (src/) there, and put generated files (content/) in the
> asf-site branch (like we already do).
>
> I'm strongly in favor of having sources in a single repository. We have
> significant process and infrastructure built up for the apache/beam repo
> (for build, PR, CI, release, etc.) that we can take advantage of by putting
> website sources in the same repo. The current beam-site repo PR automation
> is flaky because it was custom-built and not given the same level of
> attention as the main repo.
>
> The caveat to consolidating website sources in the main repo is that it
> incentivizes putting the generated sources branch on the same repo. I've
> documented a few of the reasons in the Appendix of the design doc [1]:
>  - It's easier to maintain a single repository; easily apply existing
> tooling/infrastructure
> - Jenkins tooling for publishing generated HTML may not work cross-repo [2]
>
> My preference is to move forward with the migration of sources to
> apache/beam [master], and website generated HTML to apache/beam [asf-site].
> I like the idea of separating the publishing/hosting of generated
> javadocs/pydocs since they add so much cruft, but it should not hold up the
> migration.
>
> [1]
> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>
> [2]
> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>
> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>
>> Staying on beam-site SGTM. We could add a new default branch (master?)
>> and keep all the non-generated files (src/) there, and put generated files
>> (content/) in the asf-site branch (like we already do).
>> That way there's no confusion as to which files you should update.
>> (This is of course assuming we still place generated docs in git repos.)
>>
>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <th...@apache.org> wrote:
>>
>>> My thought was to leave the asf-site branch in the beam-site repository,
>>> add generated docs to that branch (until we have a better solution), and
>>> have only sources in the beam repo.
>>>
>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>> it would eliminate the need to place generated docs into git repos.
>>>
>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> I believe that beam.apache.org is populated from the asf-site branch
>>>> of the apache/beam-site repo. (gitpubsub:
>>>> https://www.apache.org/dev/project-site.html#intro)
>>>> If we move the markdown-based docs to apache/beam, leave generated
>>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>>> then javadoc and pydoc will not get pushed to the website.
>>>>
>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>> perhaps there an alternative way to push updates to beam.apache.org?
>>>> (not requiring the asf-site branch)
>>>>
>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>>> Hi Scott,
>>>>>
>>>>> Thanks for bringing the discussion back here.
>>>>>
>>>>> I agree that we should separate the changes for hosting of generated
>>>>> java/pydocs from the rest of website automation so that we can make the
>>>>> switch and fix the contributor headache soon.
>>>>>
>>>>> But perhaps we can avoid adding 4m lines of generated code to the main
>>>>> beam repository (and keep on adding with every release) if we continue to
>>>>> serve the site from the old beam-site repo? (I left a comment the doc.)
>>>>>
>>>>> About trying buildbot, as mentioned earlier I would be happy to help
>>>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Re-opening this thread as it came up today in the discussion for
>>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>> Reliability improvements; design doc here:
>>>>>> https://s.apache.org/beam-site-automation
>>>>>>
>>>>>> The current plan is to keep generated javadoc/pydoc sources only on
>>>>>> the asf-site branch, which is necessary for the current githubpubsub
>>>>>> publishing mechanism. This maintains our current approach, the only change
>>>>>> being that we're moving the asf-site branch from the retiring
>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>
>>>>>> The concern for committing generated content is the extra overhead
>>>>>> during git fetch. I did some analysis to measure the impact [2], and found
>>>>>> that fetching a week of source + generated content history from
>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>
>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>>>>> location like Flink does with buildbot, but that work is separable and
>>>>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>>>>> improve the reliability of automation for contributing website changes. At
>>>>>> last measure, only about half of beam-site PR merges use Mergebot without
>>>>>> experiencing some reliability issue [3].
>>>>>>
>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out of
>>>>>> git. Thomas, would you have bandwidth to look into this?
>>>>>>
>>>>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>> [2]
>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>> [3]
>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>
>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Udi,
>>>>>>>
>>>>>>> Good to know you will continue this work.
>>>>>>>
>>>>>>> Let me know if you want to try the buildbot route (which does not
>>>>>>> require generated documentation to be checked into the repo). Happy to help
>>>>>>> with that.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> I'm picking up the website migration. The plan is to not include
>>>>>>>> generated files in the master branch.
>>>>>>>>
>>>>>>>> However, I've been told that even putting generated files a
>>>>>>>> separate branch could blow up the git repository for all (e.g. make git
>>>>>>>> pulls a lot longer?).
>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>
>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>
>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > Yes, I think the separation of generated code will need to occur
>>>>>>>>> prior to completing the merge and switching the web site to the main repo.
>>>>>>>>> >
>>>>>>>>> > There should be no reason to check generated documentation into
>>>>>>>>> either of the repos/branches.
>>>>>>>>>
>>>>>>>>> Huge +1 to this. Thomas, would have time to set something like
>>>>>>>>> this up
>>>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>>>
>>>>>>>>> > Please see as an example how this was solved in Flink, using the
>>>>>>>>> ASF buildbot infrastructure.
>>>>>>>>> >
>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>> >
>>>>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>> >
>>>>>>>>> > The buildbot configuration is here (requires committer access):
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> > Thomas
>>>>>>>>> >
>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Last time I talked with Scott I brought this idea in. I believe
>>>>>>>>> the plan was either to publish compiled site to website directly, or keep
>>>>>>>>> it in separate storage from apache/beam repo.
>>>>>>>>> >>
>>>>>>>>> >> One of the main reasons not to check in compiled version of
>>>>>>>>> website is that every developer will have to pull all the versions of
>>>>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>>>>> >>
>>>>>>>>> >> Regards,
>>>>>>>>> >> --Mikhail
>>>>>>>>> >>
>>>>>>>>> >> Have feedback?
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags
>>>>>>>>> are not necessary?
>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>>>>> release branch should have the relevant docs (although perhaps it's better
>>>>>>>>> to put them in a different repo or storage system).
>>>>>>>>> >>>
>>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>>> generation be part of the website review process, as it takes up a lot of
>>>>>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>>> >>>>
>>>>>>>>> >>>> @Thomas: Migration work is in backlog and will be picked up
>>>>>>>>> in near time.
>>>>>>>>> >>>>
>>>>>>>>> >>>> --Mikhail
>>>>>>>>> >>>>
>>>>>>>>> >>>> Have feedback?
>>>>>>>>> >>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>> entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no
>>>>>>>>> longer check in generated documentation into the repository? Those can be
>>>>>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>>>>>> such as done in the Apex and Flink projects.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> I was told that Scott who was working in the beam-site
>>>>>>>>> changes is on leave now and the migration is still pending (see note at
>>>>>>>>> https://github.com/apache/beam/tree/master/website). Is anyone
>>>>>>>>> else going to pick it up?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Thanks,
>>>>>>>>> >>>>> Thomas
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>>> pabloem@google.com> wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories every
>>>>>>>>> time we make a release, so that people are able to dive in and find the
>>>>>>>>> docs?
>>>>>>>>> >>>>>> Best
>>>>>>>>> >>>>>> -P.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>>> altay@google.com> wrote:
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> I would guess that users are still using some of these old
>>>>>>>>> releases. It is unclear from Beam website which releases are still
>>>>>>>>> supported or not. It probably makes sense to drop documentation for
>>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are in
>>>>>>>>> Github commit history.
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete javadocs
>>>>>>>>> and pydocs for releases older than 1 year,
>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit history
>>>>>>>>> of the website repository, right? If they're not currently used in the
>>>>>>>>> website and they're in the commit history then I don't see a reason to save
>>>>>>>>> them.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>>> ehudm@google.com> wrote:
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting
>>>>>>>>> the older javadoc and pydoc files for older versions. Is there a good
>>>>>>>>> reason to keep around this kind of documentation for older versions (say 1
>>>>>>>>> year back)?
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>> --
>>>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>
>>>>>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>


-- 




Got feedback? tinyurl.com/swegner-feedback

Re: Removing documentation for old Beam versions

Posted by Scott Wegner <sw...@google.com>.
> We could add a new default branch (master?) and keep all the
non-generated files (src/) there, and put generated files (content/) in the
asf-site branch (like we already do).

I'm strongly in favor of having sources in a single repository. We have
significant process and infrastructure built up for the apache/beam repo
(for build, PR, CI, release, etc.) that we can take advantage of by putting
website sources in the same repo. The current beam-site repo PR automation
is flaky because it was custom-built and not given the same level of
attention as the main repo.

The caveat to consolidating website sources in the main repo is that it
incentivizes putting the generated sources branch on the same repo. I've
documented a few of the reasons in the Appendix of the design doc [1]:
 - It's easier to maintain a single repository; easily apply existing
tooling/infrastructure
- Jenkins tooling for publishing generated HTML may not work cross-repo [2]

My preference is to move forward with the migration of sources to
apache/beam [master], and website generated HTML to apache/beam [asf-site].
I like the idea of separating the publishing/hosting of generated
javadocs/pydocs since they add so much cruft, but it should not hold up the
migration.

[1]
https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc

[2]
https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace

On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:

> Staying on beam-site SGTM. We could add a new default branch (master?) and
> keep all the non-generated files (src/) there, and put generated files
> (content/) in the asf-site branch (like we already do).
> That way there's no confusion as to which files you should update.
> (This is of course assuming we still place generated docs in git repos.)
>
> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <th...@apache.org> wrote:
>
>> My thought was to leave the asf-site branch in the beam-site repository,
>> add generated docs to that branch (until we have a better solution), and
>> have only sources in the beam repo.
>>
>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>> it would eliminate the need to place generated docs into git repos.
>>
>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>>
>>> I believe that beam.apache.org is populated from the asf-site branch of
>>> the apache/beam-site repo. (gitpubsub:
>>> https://www.apache.org/dev/project-site.html#intro)
>>> If we move the markdown-based docs to apache/beam, leave generated
>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>> then javadoc and pydoc will not get pushed to the website.
>>>
>>> Is there some place where we can push javadoc and pydoc files? Or
>>> perhaps there an alternative way to push updates to beam.apache.org?
>>> (not requiring the asf-site branch)
>>>
>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org> wrote:
>>>
>>>> Hi Scott,
>>>>
>>>> Thanks for bringing the discussion back here.
>>>>
>>>> I agree that we should separate the changes for hosting of generated
>>>> java/pydocs from the rest of website automation so that we can make the
>>>> switch and fix the contributor headache soon.
>>>>
>>>> But perhaps we can avoid adding 4m lines of generated code to the main
>>>> beam repository (and keep on adding with every release) if we continue to
>>>> serve the site from the old beam-site repo? (I left a comment the doc.)
>>>>
>>>> About trying buildbot, as mentioned earlier I would be happy to help
>>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>>
>>>> Thomas
>>>>
>>>>
>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org> wrote:
>>>>
>>>>> Re-opening this thread as it came up today in the discussion for
>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>> Reliability improvements; design doc here:
>>>>> https://s.apache.org/beam-site-automation
>>>>>
>>>>> The current plan is to keep generated javadoc/pydoc sources only on
>>>>> the asf-site branch, which is necessary for the current githubpubsub
>>>>> publishing mechanism. This maintains our current approach, the only change
>>>>> being that we're moving the asf-site branch from the retiring
>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>
>>>>> The concern for committing generated content is the extra overhead
>>>>> during git fetch. I did some analysis to measure the impact [2], and found
>>>>> that fetching a week of source + generated content history from
>>>>> apache/beam-site took 0.39 seconds.
>>>>>
>>>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>>>> location like Flink does with buildbot, but that work is separable and
>>>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>>>> improve the reliability of automation for contributing website changes. At
>>>>> last measure, only about half of beam-site PR merges use Mergebot without
>>>>> experiencing some reliability issue [3].
>>>>>
>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out of
>>>>> git. Thomas, would you have bandwidth to look into this?
>>>>>
>>>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>> [2]
>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>> [3]
>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>
>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>>> Hi Udi,
>>>>>>
>>>>>> Good to know you will continue this work.
>>>>>>
>>>>>> Let me know if you want to try the buildbot route (which does not
>>>>>> require generated documentation to be checked into the repo). Happy to help
>>>>>> with that.
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> I'm picking up the website migration. The plan is to not include
>>>>>>> generated files in the master branch.
>>>>>>>
>>>>>>> However, I've been told that even putting generated files a separate
>>>>>>> branch could blow up the git repository for all (e.g. make git pulls a lot
>>>>>>> longer?).
>>>>>>> Not sure if this is a real issue or not.
>>>>>>>
>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <ro...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org> wrote:
>>>>>>>> >
>>>>>>>> > Yes, I think the separation of generated code will need to occur
>>>>>>>> prior to completing the merge and switching the web site to the main repo.
>>>>>>>> >
>>>>>>>> > There should be no reason to check generated documentation into
>>>>>>>> either of the repos/branches.
>>>>>>>>
>>>>>>>> Huge +1 to this. Thomas, would have time to set something like this
>>>>>>>> up
>>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>>
>>>>>>>> > Please see as an example how this was solved in Flink, using the
>>>>>>>> ASF buildbot infrastructure.
>>>>>>>> >
>>>>>>>> > Documentation per version/release, for example:
>>>>>>>> >
>>>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>> >
>>>>>>>> > The buildbot configuration is here (requires committer access):
>>>>>>>> >
>>>>>>>> >
>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Thomas
>>>>>>>> >
>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>> migryz@google.com> wrote:
>>>>>>>> >>
>>>>>>>> >> Last time I talked with Scott I brought this idea in. I believe
>>>>>>>> the plan was either to publish compiled site to website directly, or keep
>>>>>>>> it in separate storage from apache/beam repo.
>>>>>>>> >>
>>>>>>>> >> One of the main reasons not to check in compiled version of
>>>>>>>> website is that every developer will have to pull all the versions of
>>>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>>>> >>
>>>>>>>> >> Regards,
>>>>>>>> >> --Mikhail
>>>>>>>> >>
>>>>>>>> >> Have feedback?
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com>
>>>>>>>> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags
>>>>>>>> are not necessary?
>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>>>> release branch should have the relevant docs (although perhaps it's better
>>>>>>>> to put them in a different repo or storage system).
>>>>>>>> >>>
>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>> generation be part of the website review process, as it takes up a lot of
>>>>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>> migryz@google.com> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>> >>>>
>>>>>>>> >>>> @Thomas: Migration work is in backlog and will be picked up in
>>>>>>>> near time.
>>>>>>>> >>>>
>>>>>>>> >>>> --Mikhail
>>>>>>>> >>>>
>>>>>>>> >>>> Have feedback?
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org>
>>>>>>>> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the entries
>>>>>>>> from https://beam.apache.org/get-started/downloads/)
>>>>>>>> >>>>>
>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no longer
>>>>>>>> check in generated documentation into the repository? Those can be
>>>>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>>>>> such as done in the Apex and Flink projects.
>>>>>>>> >>>>>
>>>>>>>> >>>>> I was told that Scott who was working in the beam-site
>>>>>>>> changes is on leave now and the migration is still pending (see note at
>>>>>>>> https://github.com/apache/beam/tree/master/website). Is anyone
>>>>>>>> else going to pick it up?
>>>>>>>> >>>>>
>>>>>>>> >>>>> Thanks,
>>>>>>>> >>>>> Thomas
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>> pabloem@google.com> wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories every
>>>>>>>> time we make a release, so that people are able to dive in and find the
>>>>>>>> docs?
>>>>>>>> >>>>>> Best
>>>>>>>> >>>>>> -P.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>> altay@google.com> wrote:
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> I would guess that users are still using some of these old
>>>>>>>> releases. It is unclear from Beam website which releases are still
>>>>>>>> supported or not. It probably makes sense to drop documentation for
>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>> ehudm@google.com> wrote:
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> The older docs are not directly linked to and are in
>>>>>>>> Github commit history.
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> If there are no objections I'm going to delete javadocs
>>>>>>>> and pydocs for releases older than 1 year,
>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit history
>>>>>>>> of the website repository, right? If they're not currently used in the
>>>>>>>> website and they're in the commit history then I don't see a reason to save
>>>>>>>> them.
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>> ehudm@google.com> wrote:
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting
>>>>>>>> the older javadoc and pydoc files for older versions. Is there a good
>>>>>>>> reason to keep around this kind of documentation for older versions (say 1
>>>>>>>> year back)?
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>> --
>>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>
>>>>

-- 




Got feedback? tinyurl.com/swegner-feedback

Re: Removing documentation for old Beam versions

Posted by Udi Meiri <eh...@google.com>.
Staying on beam-site SGTM. We could add a new default branch (master?) and
keep all the non-generated files (src/) there, and put generated files
(content/) in the asf-site branch (like we already do).
That way there's no confusion as to which files you should update.
(This is of course assuming we still place generated docs in git repos.)

On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <th...@apache.org> wrote:

> My thought was to leave the asf-site branch in the beam-site repository,
> add generated docs to that branch (until we have a better solution), and
> have only sources in the beam repo.
>
> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
> it would eliminate the need to place generated docs into git repos.
>
> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>
>> I believe that beam.apache.org is populated from the asf-site branch of
>> the apache/beam-site repo. (gitpubsub:
>> https://www.apache.org/dev/project-site.html#intro)
>> If we move the markdown-based docs to apache/beam, leave generated
>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>> then javadoc and pydoc will not get pushed to the website.
>>
>> Is there some place where we can push javadoc and pydoc files? Or perhaps
>> there an alternative way to push updates to beam.apache.org? (not
>> requiring the asf-site branch)
>>
>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org> wrote:
>>
>>> Hi Scott,
>>>
>>> Thanks for bringing the discussion back here.
>>>
>>> I agree that we should separate the changes for hosting of generated
>>> java/pydocs from the rest of website automation so that we can make the
>>> switch and fix the contributor headache soon.
>>>
>>> But perhaps we can avoid adding 4m lines of generated code to the main
>>> beam repository (and keep on adding with every release) if we continue to
>>> serve the site from the old beam-site repo? (I left a comment the doc.)
>>>
>>> About trying buildbot, as mentioned earlier I would be happy to help
>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>
>>> Thomas
>>>
>>>
>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org> wrote:
>>>
>>>> Re-opening this thread as it came up today in the discussion for
>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>> Reliability improvements; design doc here:
>>>> https://s.apache.org/beam-site-automation
>>>>
>>>> The current plan is to keep generated javadoc/pydoc sources only on the
>>>> asf-site branch, which is necessary for the current githubpubsub publishing
>>>> mechanism. This maintains our current approach, the only change being that
>>>> we're moving the asf-site branch from the retiring apache/beam-site
>>>> repository into a new apache/beam repo branch.
>>>>
>>>> The concern for committing generated content is the extra overhead
>>>> during git fetch. I did some analysis to measure the impact [2], and found
>>>> that fetching a week of source + generated content history from
>>>> apache/beam-site took 0.39 seconds.
>>>>
>>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>>> location like Flink does with buildbot, but that work is separable and
>>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>>> improve the reliability of automation for contributing website changes. At
>>>> last measure, only about half of beam-site PR merges use Mergebot without
>>>> experiencing some reliability issue [3].
>>>>
>>>> I've opened BEAM-5459 [4] to track moving our generated docs out of
>>>> git. Thomas, would you have bandwidth to look into this?
>>>>
>>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>> [2]
>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>> [3]
>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>
>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>>> Hi Udi,
>>>>>
>>>>> Good to know you will continue this work.
>>>>>
>>>>> Let me know if you want to try the buildbot route (which does not
>>>>> require generated documentation to be checked into the repo). Happy to help
>>>>> with that.
>>>>>
>>>>> Thomas
>>>>>
>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> I'm picking up the website migration. The plan is to not include
>>>>>> generated files in the master branch.
>>>>>>
>>>>>> However, I've been told that even putting generated files a separate
>>>>>> branch could blow up the git repository for all (e.g. make git pulls a lot
>>>>>> longer?).
>>>>>> Not sure if this is a real issue or not.
>>>>>>
>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <ro...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org> wrote:
>>>>>>> >
>>>>>>> > Yes, I think the separation of generated code will need to occur
>>>>>>> prior to completing the merge and switching the web site to the main repo.
>>>>>>> >
>>>>>>> > There should be no reason to check generated documentation into
>>>>>>> either of the repos/branches.
>>>>>>>
>>>>>>> Huge +1 to this. Thomas, would have time to set something like this
>>>>>>> up
>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>
>>>>>>> > Please see as an example how this was solved in Flink, using the
>>>>>>> ASF buildbot infrastructure.
>>>>>>> >
>>>>>>> > Documentation per version/release, for example:
>>>>>>> >
>>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>> >
>>>>>>> > The buildbot configuration is here (requires committer access):
>>>>>>> >
>>>>>>> >
>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Thomas
>>>>>>> >
>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>> migryz@google.com> wrote:
>>>>>>> >>
>>>>>>> >> Last time I talked with Scott I brought this idea in. I believe
>>>>>>> the plan was either to publish compiled site to website directly, or keep
>>>>>>> it in separate storage from apache/beam repo.
>>>>>>> >>
>>>>>>> >> One of the main reasons not to check in compiled version of
>>>>>>> website is that every developer will have to pull all the versions of
>>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>>> >>
>>>>>>> >> Regards,
>>>>>>> >> --Mikhail
>>>>>>> >>
>>>>>>> >> Have feedback?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags
>>>>>>> are not necessary?
>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>>> release branch should have the relevant docs (although perhaps it's better
>>>>>>> to put them in a different repo or storage system).
>>>>>>> >>>
>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>> generation be part of the website review process, as it takes up a lot of
>>>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>> migryz@google.com> wrote:
>>>>>>> >>>>
>>>>>>> >>>> +1 For removing old documentation.
>>>>>>> >>>>
>>>>>>> >>>> @Thomas: Migration work is in backlog and will be picked up in
>>>>>>> near time.
>>>>>>> >>>>
>>>>>>> >>>> --Mikhail
>>>>>>> >>>>
>>>>>>> >>>> Have feedback?
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org>
>>>>>>> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the entries
>>>>>>> from https://beam.apache.org/get-started/downloads/)
>>>>>>> >>>>>
>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no longer
>>>>>>> check in generated documentation into the repository? Those can be
>>>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>>>> such as done in the Apex and Flink projects.
>>>>>>> >>>>>
>>>>>>> >>>>> I was told that Scott who was working in the beam-site changes
>>>>>>> is on leave now and the migration is still pending (see note at
>>>>>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>>>>>> going to pick it up?
>>>>>>> >>>>>
>>>>>>> >>>>> Thanks,
>>>>>>> >>>>> Thomas
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>> pabloem@google.com> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories every
>>>>>>> time we make a release, so that people are able to dive in and find the
>>>>>>> docs?
>>>>>>> >>>>>> Best
>>>>>>> >>>>>> -P.
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> I would guess that users are still using some of these old
>>>>>>> releases. It is unclear from Beam website which releases are still
>>>>>>> supported or not. It probably makes sense to drop documentation for
>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com>
>>>>>>> wrote:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> The older docs are not directly linked to and are in Github
>>>>>>> commit history.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> If there are no objections I'm going to delete javadocs and
>>>>>>> pydocs for releases older than 1 year,
>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>> danoliveira@google.com> wrote:
>>>>>>> >>>>>>>>>
>>>>>>> >>>>>>>>> The older docs should be recorded in the commit history of
>>>>>>> the website repository, right? If they're not currently used in the website
>>>>>>> and they're in the commit history then I don't see a reason to save them.
>>>>>>> >>>>>>>>>
>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>> ehudm@google.com> wrote:
>>>>>>> >>>>>>>>>>
>>>>>>> >>>>>>>>>> Hi all,
>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>> >>>>>>>>>>
>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting the
>>>>>>> older javadoc and pydoc files for older versions. Is there a good reason to
>>>>>>> keep around this kind of documentation for older versions (say 1 year back)?
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>> --
>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>
>>>

Re: Removing documentation for old Beam versions

Posted by Thomas Weise <th...@apache.org>.
My thought was to leave the asf-site branch in the beam-site repository,
add generated docs to that branch (until we have a better solution), and
have only sources in the beam repo.

Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 - it would
eliminate the need to place generated docs into git repos.

On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:

> I believe that beam.apache.org is populated from the asf-site branch of
> the apache/beam-site repo. (gitpubsub:
> https://www.apache.org/dev/project-site.html#intro)
> If we move the markdown-based docs to apache/beam, leave generated javadoc
> and pydoc in apache/beam-site, and point gitpubsub to apache/beam, then
> javadoc and pydoc will not get pushed to the website.
>
> Is there some place where we can push javadoc and pydoc files? Or perhaps
> there an alternative way to push updates to beam.apache.org? (not
> requiring the asf-site branch)
>
> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org> wrote:
>
>> Hi Scott,
>>
>> Thanks for bringing the discussion back here.
>>
>> I agree that we should separate the changes for hosting of generated
>> java/pydocs from the rest of website automation so that we can make the
>> switch and fix the contributor headache soon.
>>
>> But perhaps we can avoid adding 4m lines of generated code to the main
>> beam repository (and keep on adding with every release) if we continue to
>> serve the site from the old beam-site repo? (I left a comment the doc.)
>>
>> About trying buildbot, as mentioned earlier I would be happy to help with
>> it. I prefer a setup that keeps the docs separate from the web site.
>>
>> Thomas
>>
>>
>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org> wrote:
>>
>>> Re-opening this thread as it came up today in the discussion for PR#6458
>>> [1]. This PR is part of the work for Beam-Site Automation Reliability
>>> improvements; design doc here: https://s.apache.org/beam-site-automation
>>>
>>> The current plan is to keep generated javadoc/pydoc sources only on the
>>> asf-site branch, which is necessary for the current githubpubsub publishing
>>> mechanism. This maintains our current approach, the only change being that
>>> we're moving the asf-site branch from the retiring apache/beam-site
>>> repository into a new apache/beam repo branch.
>>>
>>> The concern for committing generated content is the extra overhead
>>> during git fetch. I did some analysis to measure the impact [2], and found
>>> that fetching a week of source + generated content history from
>>> apache/beam-site took 0.39 seconds.
>>>
>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>> location like Flink does with buildbot, but that work is separable and
>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>> improve the reliability of automation for contributing website changes. At
>>> last measure, only about half of beam-site PR merges use Mergebot without
>>> experiencing some reliability issue [3].
>>>
>>> I've opened BEAM-5459 [4] to track moving our generated docs out of git.
>>> Thomas, would you have bandwidth to look into this?
>>>
>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>> [2]
>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>> [3]
>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>
>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org> wrote:
>>>
>>>> Hi Udi,
>>>>
>>>> Good to know you will continue this work.
>>>>
>>>> Let me know if you want to try the buildbot route (which does not
>>>> require generated documentation to be checked into the repo). Happy to help
>>>> with that.
>>>>
>>>> Thomas
>>>>
>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> I'm picking up the website migration. The plan is to not include
>>>>> generated files in the master branch.
>>>>>
>>>>> However, I've been told that even putting generated files a separate
>>>>> branch could blow up the git repository for all (e.g. make git pulls a lot
>>>>> longer?).
>>>>> Not sure if this is a real issue or not.
>>>>>
>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>>
>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org> wrote:
>>>>>> >
>>>>>> > Yes, I think the separation of generated code will need to occur
>>>>>> prior to completing the merge and switching the web site to the main repo.
>>>>>> >
>>>>>> > There should be no reason to check generated documentation into
>>>>>> either of the repos/branches.
>>>>>>
>>>>>> Huge +1 to this. Thomas, would have time to set something like this up
>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>
>>>>>> > Please see as an example how this was solved in Flink, using the
>>>>>> ASF buildbot infrastructure.
>>>>>> >
>>>>>> > Documentation per version/release, for example:
>>>>>> >
>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>> >
>>>>>> > The buildbot configuration is here (requires committer access):
>>>>>> >
>>>>>> >
>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Thomas
>>>>>> >
>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mi...@google.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Last time I talked with Scott I brought this idea in. I believe
>>>>>> the plan was either to publish compiled site to website directly, or keep
>>>>>> it in separate storage from apache/beam repo.
>>>>>> >>
>>>>>> >> One of the main reasons not to check in compiled version of
>>>>>> website is that every developer will have to pull all the versions of
>>>>>> website every time they clone repo, which is not that good of an idea to do.
>>>>>> >>
>>>>>> >> Regards,
>>>>>> >> --Mikhail
>>>>>> >>
>>>>>> >> Have feedback?
>>>>>> >>
>>>>>> >>
>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>>>>>> >>>
>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags
>>>>>> are not necessary?
>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>> release branch should have the relevant docs (although perhaps it's better
>>>>>> to put them in a different repo or storage system).
>>>>>> >>>
>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>> generation be part of the website review process, as it takes up a lot of
>>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>> migryz@google.com> wrote:
>>>>>> >>>>
>>>>>> >>>> +1 For removing old documentation.
>>>>>> >>>>
>>>>>> >>>> @Thomas: Migration work is in backlog and will be picked up in
>>>>>> near time.
>>>>>> >>>>
>>>>>> >>>> --Mikhail
>>>>>> >>>>
>>>>>> >>>> Have feedback?
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org>
>>>>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the entries
>>>>>> from https://beam.apache.org/get-started/downloads/)
>>>>>> >>>>>
>>>>>> >>>>> Isn't it part of the beam-site changes that we will no longer
>>>>>> check in generated documentation into the repository? Those can be
>>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>>> such as done in the Apex and Flink projects.
>>>>>> >>>>>
>>>>>> >>>>> I was told that Scott who was working in the beam-site changes
>>>>>> is on leave now and the migration is still pending (see note at
>>>>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>>>>> going to pick it up?
>>>>>> >>>>>
>>>>>> >>>>> Thanks,
>>>>>> >>>>> Thomas
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>> pabloem@google.com> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories every
>>>>>> time we make a release, so that people are able to dive in and find the
>>>>>> docs?
>>>>>> >>>>>> Best
>>>>>> >>>>>> -P.
>>>>>> >>>>>>
>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com>
>>>>>> wrote:
>>>>>> >>>>>>>
>>>>>> >>>>>>> I would guess that users are still using some of these old
>>>>>> releases. It is unclear from Beam website which releases are still
>>>>>> supported or not. It probably makes sense to drop documentation for
>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>> >>>>>>>
>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com>
>>>>>> wrote:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> The older docs are not directly linked to and are in Github
>>>>>> commit history.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> If there are no objections I'm going to delete javadocs and
>>>>>> pydocs for releases older than 1 year,
>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>> danoliveira@google.com> wrote:
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> The older docs should be recorded in the commit history of
>>>>>> the website repository, right? If they're not currently used in the website
>>>>>> and they're in the commit history then I don't see a reason to save them.
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com>
>>>>>> wrote:
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> Hi all,
>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting the
>>>>>> older javadoc and pydoc files for older versions. Is there a good reason to
>>>>>> keep around this kind of documentation for older versions (say 1 year back)?
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>> --
>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>
>>>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>> Got feedback? tinyurl.com/swegner-feedback
>>>
>>

Re: Removing documentation for old Beam versions

Posted by Udi Meiri <eh...@google.com>.
I believe that beam.apache.org is populated from the asf-site branch of the
apache/beam-site repo. (gitpubsub:
https://www.apache.org/dev/project-site.html#intro)
If we move the markdown-based docs to apache/beam, leave generated javadoc
and pydoc in apache/beam-site, and point gitpubsub to apache/beam, then
javadoc and pydoc will not get pushed to the website.

Is there some place where we can push javadoc and pydoc files? Or perhaps
there an alternative way to push updates to beam.apache.org? (not requiring
the asf-site branch)

On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <th...@apache.org> wrote:

> Hi Scott,
>
> Thanks for bringing the discussion back here.
>
> I agree that we should separate the changes for hosting of generated
> java/pydocs from the rest of website automation so that we can make the
> switch and fix the contributor headache soon.
>
> But perhaps we can avoid adding 4m lines of generated code to the main
> beam repository (and keep on adding with every release) if we continue to
> serve the site from the old beam-site repo? (I left a comment the doc.)
>
> About trying buildbot, as mentioned earlier I would be happy to help with
> it. I prefer a setup that keeps the docs separate from the web site.
>
> Thomas
>
>
> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org> wrote:
>
>> Re-opening this thread as it came up today in the discussion for PR#6458
>> [1]. This PR is part of the work for Beam-Site Automation Reliability
>> improvements; design doc here: https://s.apache.org/beam-site-automation
>>
>> The current plan is to keep generated javadoc/pydoc sources only on the
>> asf-site branch, which is necessary for the current githubpubsub publishing
>> mechanism. This maintains our current approach, the only change being that
>> we're moving the asf-site branch from the retiring apache/beam-site
>> repository into a new apache/beam repo branch.
>>
>> The concern for committing generated content is the extra overhead during
>> git fetch. I did some analysis to measure the impact [2], and found that
>> fetching a week of source + generated content history from apache/beam-site
>> took 0.39 seconds.
>>
>> I like the idea of publishing javadoc/pydoc snapshots to an external
>> location like Flink does with buildbot, but that work is separable and
>> shouldn't be a prerequisite for this effort. The goal of this work is to
>> improve the reliability of automation for contributing website changes. At
>> last measure, only about half of beam-site PR merges use Mergebot without
>> experiencing some reliability issue [3].
>>
>> I've opened BEAM-5459 [4] to track moving our generated docs out of git.
>> Thomas, would you have bandwidth to look into this?
>>
>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>> [2]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>> [3]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>
>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org> wrote:
>>
>>> Hi Udi,
>>>
>>> Good to know you will continue this work.
>>>
>>> Let me know if you want to try the buildbot route (which does not
>>> require generated documentation to be checked into the repo). Happy to help
>>> with that.
>>>
>>> Thomas
>>>
>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> I'm picking up the website migration. The plan is to not include
>>>> generated files in the master branch.
>>>>
>>>> However, I've been told that even putting generated files a separate
>>>> branch could blow up the git repository for all (e.g. make git pulls a lot
>>>> longer?).
>>>> Not sure if this is a real issue or not.
>>>>
>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>>
>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org> wrote:
>>>>> >
>>>>> > Yes, I think the separation of generated code will need to occur
>>>>> prior to completing the merge and switching the web site to the main repo.
>>>>> >
>>>>> > There should be no reason to check generated documentation into
>>>>> either of the repos/branches.
>>>>>
>>>>> Huge +1 to this. Thomas, would have time to set something like this up
>>>>> for Beam? If not, could anyone else pick this up?
>>>>>
>>>>> > Please see as an example how this was solved in Flink, using the ASF
>>>>> buildbot infrastructure.
>>>>> >
>>>>> > Documentation per version/release, for example:
>>>>> >
>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>> >
>>>>> > The buildbot configuration is here (requires committer access):
>>>>> >
>>>>> >
>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>> >
>>>>> > Thanks,
>>>>> > Thomas
>>>>> >
>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mi...@google.com>
>>>>> wrote:
>>>>> >>
>>>>> >> Last time I talked with Scott I brought this idea in. I believe the
>>>>> plan was either to publish compiled site to website directly, or keep it in
>>>>> separate storage from apache/beam repo.
>>>>> >>
>>>>> >> One of the main reasons not to check in compiled version of website
>>>>> is that every developer will have to pull all the versions of website every
>>>>> time they clone repo, which is not that good of an idea to do.
>>>>> >>
>>>>> >> Regards,
>>>>> >> --Mikhail
>>>>> >>
>>>>> >> Have feedback?
>>>>> >>
>>>>> >>
>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>>>>> >>>
>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>>>>> not necessary?
>>>>> >>> Also, once apache/beam-site is merged with apache/beam the release
>>>>> branch should have the relevant docs (although perhaps it's better to put
>>>>> them in a different repo or storage system).
>>>>> >>>
>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>> generation be part of the website review process, as it takes up a lot of
>>>>> time when changes are staged (10s of thousands of files), especially when a
>>>>> PR is updated and existing staged files need to be deleted.
>>>>> >>>
>>>>> >>>
>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>> migryz@google.com> wrote:
>>>>> >>>>
>>>>> >>>> +1 For removing old documentation.
>>>>> >>>>
>>>>> >>>> @Thomas: Migration work is in backlog and will be picked up in
>>>>> near time.
>>>>> >>>>
>>>>> >>>> --Mikhail
>>>>> >>>>
>>>>> >>>> Have feedback?
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org>
>>>>> wrote:
>>>>> >>>>>
>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the entries
>>>>> from https://beam.apache.org/get-started/downloads/)
>>>>> >>>>>
>>>>> >>>>> Isn't it part of the beam-site changes that we will no longer
>>>>> check in generated documentation into the repository? Those can be
>>>>> generated and deployed independently (when a commit to a branch occurs),
>>>>> such as done in the Apex and Flink projects.
>>>>> >>>>>
>>>>> >>>>> I was told that Scott who was working in the beam-site changes
>>>>> is on leave now and the migration is still pending (see note at
>>>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>>>> going to pick it up?
>>>>> >>>>>
>>>>> >>>>> Thanks,
>>>>> >>>>> Thomas
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>> pabloem@google.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Is it worth adding a tag / branch to the repositories every
>>>>> time we make a release, so that people are able to dive in and find the
>>>>> docs?
>>>>> >>>>>> Best
>>>>> >>>>>> -P.
>>>>> >>>>>>
>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com>
>>>>> wrote:
>>>>> >>>>>>>
>>>>> >>>>>>> I would guess that users are still using some of these old
>>>>> releases. It is unclear from Beam website which releases are still
>>>>> supported or not. It probably makes sense to drop documentation for
>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>> >>>>>>>
>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com>
>>>>> wrote:
>>>>> >>>>>>>>
>>>>> >>>>>>>> The older docs are not directly linked to and are in Github
>>>>> commit history.
>>>>> >>>>>>>>
>>>>> >>>>>>>> If there are no objections I'm going to delete javadocs and
>>>>> pydocs for releases older than 1 year,
>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>> >>>>>>>>
>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>> danoliveira@google.com> wrote:
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> The older docs should be recorded in the commit history of
>>>>> the website repository, right? If they're not currently used in the website
>>>>> and they're in the commit history then I don't see a reason to save them.
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com>
>>>>> wrote:
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Hi all,
>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting the
>>>>> older javadoc and pydoc files for older versions. Is there a good reason to
>>>>> keep around this kind of documentation for older versions (say 1 year back)?
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>
>>>>
>>
>> --
>>
>>
>>
>>
>> Got feedback? tinyurl.com/swegner-feedback
>>
>

Re: Removing documentation for old Beam versions

Posted by Thomas Weise <th...@apache.org>.
Hi Scott,

Thanks for bringing the discussion back here.

I agree that we should separate the changes for hosting of generated
java/pydocs from the rest of website automation so that we can make the
switch and fix the contributor headache soon.

But perhaps we can avoid adding 4m lines of generated code to the main beam
repository (and keep on adding with every release) if we continue to serve
the site from the old beam-site repo? (I left a comment the doc.)

About trying buildbot, as mentioned earlier I would be happy to help with
it. I prefer a setup that keeps the docs separate from the web site.

Thomas


On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org> wrote:

> Re-opening this thread as it came up today in the discussion for PR#6458
> [1]. This PR is part of the work for Beam-Site Automation Reliability
> improvements; design doc here: https://s.apache.org/beam-site-automation
>
> The current plan is to keep generated javadoc/pydoc sources only on the
> asf-site branch, which is necessary for the current githubpubsub publishing
> mechanism. This maintains our current approach, the only change being that
> we're moving the asf-site branch from the retiring apache/beam-site
> repository into a new apache/beam repo branch.
>
> The concern for committing generated content is the extra overhead during
> git fetch. I did some analysis to measure the impact [2], and found that
> fetching a week of source + generated content history from apache/beam-site
> took 0.39 seconds.
>
> I like the idea of publishing javadoc/pydoc snapshots to an external
> location like Flink does with buildbot, but that work is separable and
> shouldn't be a prerequisite for this effort. The goal of this work is to
> improve the reliability of automation for contributing website changes. At
> last measure, only about half of beam-site PR merges use Mergebot without
> experiencing some reliability issue [3].
>
> I've opened BEAM-5459 [4] to track moving our generated docs out of git.
> Thomas, would you have bandwidth to look into this?
>
> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
> [2]
> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
> [3]
> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
> [4] https://issues.apache.org/jira/browse/BEAM-5459
>
> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org> wrote:
>
>> Hi Udi,
>>
>> Good to know you will continue this work.
>>
>> Let me know if you want to try the buildbot route (which does not require
>> generated documentation to be checked into the repo). Happy to help with
>> that.
>>
>> Thomas
>>
>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:
>>
>>> I'm picking up the website migration. The plan is to not include
>>> generated files in the master branch.
>>>
>>> However, I've been told that even putting generated files a separate
>>> branch could blow up the git repository for all (e.g. make git pulls a lot
>>> longer?).
>>> Not sure if this is a real issue or not.
>>>
>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org> wrote:
>>>> >
>>>> > Yes, I think the separation of generated code will need to occur
>>>> prior to completing the merge and switching the web site to the main repo.
>>>> >
>>>> > There should be no reason to check generated documentation into
>>>> either of the repos/branches.
>>>>
>>>> Huge +1 to this. Thomas, would have time to set something like this up
>>>> for Beam? If not, could anyone else pick this up?
>>>>
>>>> > Please see as an example how this was solved in Flink, using the ASF
>>>> buildbot infrastructure.
>>>> >
>>>> > Documentation per version/release, for example:
>>>> >
>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>> >
>>>> > The buildbot configuration is here (requires committer access):
>>>> >
>>>> >
>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>> >
>>>> > Thanks,
>>>> > Thomas
>>>> >
>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mi...@google.com>
>>>> wrote:
>>>> >>
>>>> >> Last time I talked with Scott I brought this idea in. I believe the
>>>> plan was either to publish compiled site to website directly, or keep it in
>>>> separate storage from apache/beam repo.
>>>> >>
>>>> >> One of the main reasons not to check in compiled version of website
>>>> is that every developer will have to pull all the versions of website every
>>>> time they clone repo, which is not that good of an idea to do.
>>>> >>
>>>> >> Regards,
>>>> >> --Mikhail
>>>> >>
>>>> >> Have feedback?
>>>> >>
>>>> >>
>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>>>> >>>
>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>>>> not necessary?
>>>> >>> Also, once apache/beam-site is merged with apache/beam the release
>>>> branch should have the relevant docs (although perhaps it's better to put
>>>> them in a different repo or storage system).
>>>> >>>
>>>> >>> Thomas, I would very much like to not have javadoc/pydoc generation
>>>> be part of the website review process, as it takes up a lot of time when
>>>> changes are staged (10s of thousands of files), especially when a PR is
>>>> updated and existing staged files need to be deleted.
>>>> >>>
>>>> >>>
>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com>
>>>> wrote:
>>>> >>>>
>>>> >>>> +1 For removing old documentation.
>>>> >>>>
>>>> >>>> @Thomas: Migration work is in backlog and will be picked up in
>>>> near time.
>>>> >>>>
>>>> >>>> --Mikhail
>>>> >>>>
>>>> >>>> Have feedback?
>>>> >>>>
>>>> >>>>
>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org>
>>>> wrote:
>>>> >>>>>
>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the entries
>>>> from https://beam.apache.org/get-started/downloads/)
>>>> >>>>>
>>>> >>>>> Isn't it part of the beam-site changes that we will no longer
>>>> check in generated documentation into the repository? Those can be
>>>> generated and deployed independently (when a commit to a branch occurs),
>>>> such as done in the Apex and Flink projects.
>>>> >>>>>
>>>> >>>>> I was told that Scott who was working in the beam-site changes is
>>>> on leave now and the migration is still pending (see note at
>>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>>> going to pick it up?
>>>> >>>>>
>>>> >>>>> Thanks,
>>>> >>>>> Thomas
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com>
>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> Is it worth adding a tag / branch to the repositories every time
>>>> we make a release, so that people are able to dive in and find the docs?
>>>> >>>>>> Best
>>>> >>>>>> -P.
>>>> >>>>>>
>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com>
>>>> wrote:
>>>> >>>>>>>
>>>> >>>>>>> I would guess that users are still using some of these old
>>>> releases. It is unclear from Beam website which releases are still
>>>> supported or not. It probably makes sense to drop documentation for
>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>> can work on updating the Beam website to clarify the state of each release.
>>>> >>>>>>>
>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com>
>>>> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> The older docs are not directly linked to and are in Github
>>>> commit history.
>>>> >>>>>>>>
>>>> >>>>>>>> If there are no objections I'm going to delete javadocs and
>>>> pydocs for releases older than 1 year,
>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>> >>>>>>>>
>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>> danoliveira@google.com> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>> The older docs should be recorded in the commit history of
>>>> the website repository, right? If they're not currently used in the website
>>>> and they're in the commit history then I don't see a reason to save them.
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com>
>>>> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Hi all,
>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting the
>>>> older javadoc and pydoc files for older versions. Is there a good reason to
>>>> keep around this kind of documentation for older versions (say 1 year back)?
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>> --
>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>> <https://goto.google.com/pabloem-feedback>
>>>>
>>>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>

Re: Removing documentation for old Beam versions

Posted by Scott Wegner <sc...@apache.org>.
Re-opening this thread as it came up today in the discussion for PR#6458
[1]. This PR is part of the work for Beam-Site Automation Reliability
improvements; design doc here: https://s.apache.org/beam-site-automation

The current plan is to keep generated javadoc/pydoc sources only on the
asf-site branch, which is necessary for the current githubpubsub publishing
mechanism. This maintains our current approach, the only change being that
we're moving the asf-site branch from the retiring apache/beam-site
repository into a new apache/beam repo branch.

The concern for committing generated content is the extra overhead during
git fetch. I did some analysis to measure the impact [2], and found that
fetching a week of source + generated content history from apache/beam-site
took 0.39 seconds.

I like the idea of publishing javadoc/pydoc snapshots to an external
location like Flink does with buildbot, but that work is separable and
shouldn't be a prerequisite for this effort. The goal of this work is to
improve the reliability of automation for contributing website changes. At
last measure, only about half of beam-site PR merges use Mergebot without
experiencing some reliability issue [3].

I've opened BEAM-5459 [4] to track moving our generated docs out of git.
Thomas, would you have bandwidth to look into this?

[1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
[2]
https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
[3]
https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
[4] https://issues.apache.org/jira/browse/BEAM-5459

On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <th...@apache.org> wrote:

> Hi Udi,
>
> Good to know you will continue this work.
>
> Let me know if you want to try the buildbot route (which does not require
> generated documentation to be checked into the repo). Happy to help with
> that.
>
> Thomas
>
> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:
>
>> I'm picking up the website migration. The plan is to not include
>> generated files in the master branch.
>>
>> However, I've been told that even putting generated files a separate
>> branch could blow up the git repository for all (e.g. make git pulls a lot
>> longer?).
>> Not sure if this is a real issue or not.
>>
>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org> wrote:
>>> >
>>> > Yes, I think the separation of generated code will need to occur prior
>>> to completing the merge and switching the web site to the main repo.
>>> >
>>> > There should be no reason to check generated documentation into either
>>> of the repos/branches.
>>>
>>> Huge +1 to this. Thomas, would have time to set something like this up
>>> for Beam? If not, could anyone else pick this up?
>>>
>>> > Please see as an example how this was solved in Flink, using the ASF
>>> buildbot infrastructure.
>>> >
>>> > Documentation per version/release, for example:
>>> >
>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>> >
>>> > The buildbot configuration is here (requires committer access):
>>> >
>>> >
>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>> >
>>> > Thanks,
>>> > Thomas
>>> >
>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mi...@google.com>
>>> wrote:
>>> >>
>>> >> Last time I talked with Scott I brought this idea in. I believe the
>>> plan was either to publish compiled site to website directly, or keep it in
>>> separate storage from apache/beam repo.
>>> >>
>>> >> One of the main reasons not to check in compiled version of website
>>> is that every developer will have to pull all the versions of website every
>>> time they clone repo, which is not that good of an idea to do.
>>> >>
>>> >> Regards,
>>> >> --Mikhail
>>> >>
>>> >> Have feedback?
>>> >>
>>> >>
>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>>> >>>
>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>>> not necessary?
>>> >>> Also, once apache/beam-site is merged with apache/beam the release
>>> branch should have the relevant docs (although perhaps it's better to put
>>> them in a different repo or storage system).
>>> >>>
>>> >>> Thomas, I would very much like to not have javadoc/pydoc generation
>>> be part of the website review process, as it takes up a lot of time when
>>> changes are staged (10s of thousands of files), especially when a PR is
>>> updated and existing staged files need to be deleted.
>>> >>>
>>> >>>
>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com>
>>> wrote:
>>> >>>>
>>> >>>> +1 For removing old documentation.
>>> >>>>
>>> >>>> @Thomas: Migration work is in backlog and will be picked up in near
>>> time.
>>> >>>>
>>> >>>> --Mikhail
>>> >>>>
>>> >>>> Have feedback?
>>> >>>>
>>> >>>>
>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org>
>>> wrote:
>>> >>>>>
>>> >>>>> +1 for removing pre 2.0 documentation (as well as the entries from
>>> https://beam.apache.org/get-started/downloads/)
>>> >>>>>
>>> >>>>> Isn't it part of the beam-site changes that we will no longer
>>> check in generated documentation into the repository? Those can be
>>> generated and deployed independently (when a commit to a branch occurs),
>>> such as done in the Apex and Flink projects.
>>> >>>>>
>>> >>>>> I was told that Scott who was working in the beam-site changes is
>>> on leave now and the migration is still pending (see note at
>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>> going to pick it up?
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> Thomas
>>> >>>>>
>>> >>>>>
>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com>
>>> wrote:
>>> >>>>>>
>>> >>>>>> Is it worth adding a tag / branch to the repositories every time
>>> we make a release, so that people are able to dive in and find the docs?
>>> >>>>>> Best
>>> >>>>>> -P.
>>> >>>>>>
>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com>
>>> wrote:
>>> >>>>>>>
>>> >>>>>>> I would guess that users are still using some of these old
>>> releases. It is unclear from Beam website which releases are still
>>> supported or not. It probably makes sense to drop documentation for
>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>> can work on updating the Beam website to clarify the state of each release.
>>> >>>>>>>
>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com>
>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> The older docs are not directly linked to and are in Github
>>> commit history.
>>> >>>>>>>>
>>> >>>>>>>> If there are no objections I'm going to delete javadocs and
>>> pydocs for releases older than 1 year,
>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>> >>>>>>>>
>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>> danoliveira@google.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> The older docs should be recorded in the commit history of the
>>> website repository, right? If they're not currently used in the website and
>>> they're in the commit history then I don't see a reason to save them.
>>> >>>>>>>>>
>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com>
>>> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Hi all,
>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>> trying to deletes 22k files and then copy 22k files (warning large file).
>>> >>>>>>>>>>
>>> >>>>>>>>>> It seems that we could save a lot of time by deleting the
>>> older javadoc and pydoc files for older versions. Is there a good reason to
>>> keep around this kind of documentation for older versions (say 1 year back)?
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>> --
>>> >>>>>> Got feedback? go/pabloem-feedback
>>> <https://goto.google.com/pabloem-feedback>
>>>
>>

-- 




Got feedback? tinyurl.com/swegner-feedback

Re: Removing documentation for old Beam versions

Posted by Thomas Weise <th...@apache.org>.
Hi Udi,

Good to know you will continue this work.

Let me know if you want to try the buildbot route (which does not require
generated documentation to be checked into the repo). Happy to help with
that.

Thomas

On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:

> I'm picking up the website migration. The plan is to not include generated
> files in the master branch.
>
> However, I've been told that even putting generated files a separate
> branch could blow up the git repository for all (e.g. make git pulls a lot
> longer?).
> Not sure if this is a real issue or not.
>
> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org> wrote:
>> >
>> > Yes, I think the separation of generated code will need to occur prior
>> to completing the merge and switching the web site to the main repo.
>> >
>> > There should be no reason to check generated documentation into either
>> of the repos/branches.
>>
>> Huge +1 to this. Thomas, would have time to set something like this up
>> for Beam? If not, could anyone else pick this up?
>>
>> > Please see as an example how this was solved in Flink, using the ASF
>> buildbot infrastructure.
>> >
>> > Documentation per version/release, for example:
>> >
>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>> >
>> > The buildbot configuration is here (requires committer access):
>> >
>> >
>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>> >
>> > Thanks,
>> > Thomas
>> >
>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>> >>
>> >> Last time I talked with Scott I brought this idea in. I believe the
>> plan was either to publish compiled site to website directly, or keep it in
>> separate storage from apache/beam repo.
>> >>
>> >> One of the main reasons not to check in compiled version of website is
>> that every developer will have to pull all the versions of website every
>> time they clone repo, which is not that good of an idea to do.
>> >>
>> >> Regards,
>> >> --Mikhail
>> >>
>> >> Have feedback?
>> >>
>> >>
>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>> >>>
>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>> not necessary?
>> >>> Also, once apache/beam-site is merged with apache/beam the release
>> branch should have the relevant docs (although perhaps it's better to put
>> them in a different repo or storage system).
>> >>>
>> >>> Thomas, I would very much like to not have javadoc/pydoc generation
>> be part of the website review process, as it takes up a lot of time when
>> changes are staged (10s of thousands of files), especially when a PR is
>> updated and existing staged files need to be deleted.
>> >>>
>> >>>
>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>> >>>>
>> >>>> +1 For removing old documentation.
>> >>>>
>> >>>> @Thomas: Migration work is in backlog and will be picked up in near
>> time.
>> >>>>
>> >>>> --Mikhail
>> >>>>
>> >>>> Have feedback?
>> >>>>
>> >>>>
>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:
>> >>>>>
>> >>>>> +1 for removing pre 2.0 documentation (as well as the entries from
>> https://beam.apache.org/get-started/downloads/)
>> >>>>>
>> >>>>> Isn't it part of the beam-site changes that we will no longer check
>> in generated documentation into the repository? Those can be generated and
>> deployed independently (when a commit to a branch occurs), such as done in
>> the Apex and Flink projects.
>> >>>>>
>> >>>>> I was told that Scott who was working in the beam-site changes is
>> on leave now and the migration is still pending (see note at
>> https://github.com/apache/beam/tree/master/website). Is anyone else
>> going to pick it up?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Thomas
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com>
>> wrote:
>> >>>>>>
>> >>>>>> Is it worth adding a tag / branch to the repositories every time
>> we make a release, so that people are able to dive in and find the docs?
>> >>>>>> Best
>> >>>>>> -P.
>> >>>>>>
>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com>
>> wrote:
>> >>>>>>>
>> >>>>>>> I would guess that users are still using some of these old
>> releases. It is unclear from Beam website which releases are still
>> supported or not. It probably makes sense to drop documentation for
>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>> can work on updating the Beam website to clarify the state of each release.
>> >>>>>>>
>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com>
>> wrote:
>> >>>>>>>>
>> >>>>>>>> The older docs are not directly linked to and are in Github
>> commit history.
>> >>>>>>>>
>> >>>>>>>> If there are no objections I'm going to delete javadocs and
>> pydocs for releases older than 1 year,
>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>> >>>>>>>>
>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>> danoliveira@google.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>> The older docs should be recorded in the commit history of the
>> website repository, right? If they're not currently used in the website and
>> they're in the commit history then I don't see a reason to save them.
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com>
>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Hi all,
>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>> trying to deletes 22k files and then copy 22k files (warning large file).
>> >>>>>>>>>>
>> >>>>>>>>>> It seems that we could save a lot of time by deleting the
>> older javadoc and pydoc files for older versions. Is there a good reason to
>> keep around this kind of documentation for older versions (say 1 year back)?
>> >>>>>>>
>> >>>>>>>
>> >>>>>> --
>> >>>>>> Got feedback? go/pabloem-feedback
>> <https://goto.google.com/pabloem-feedback>
>>
>

Re: Removing documentation for old Beam versions

Posted by Andrew Pilloud <ap...@google.com>.
Git is really efficient at things it can perform diffs on. Generated source
code tends to be fine as long as it has reasonably short lines. It becomes
an issue when you are checking in binaries, images, and compressed files
(jars for example).

On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:

> I'm picking up the website migration. The plan is to not include generated
> files in the master branch.
>
> However, I've been told that even putting generated files a separate
> branch could blow up the git repository for all (e.g. make git pulls a lot
> longer?).
> Not sure if this is a real issue or not.
>
> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org> wrote:
>> >
>> > Yes, I think the separation of generated code will need to occur prior
>> to completing the merge and switching the web site to the main repo.
>> >
>> > There should be no reason to check generated documentation into either
>> of the repos/branches.
>>
>> Huge +1 to this. Thomas, would have time to set something like this up
>> for Beam? If not, could anyone else pick this up?
>>
>> > Please see as an example how this was solved in Flink, using the ASF
>> buildbot infrastructure.
>> >
>> > Documentation per version/release, for example:
>> >
>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>> >
>> > The buildbot configuration is here (requires committer access):
>> >
>> >
>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>> >
>> > Thanks,
>> > Thomas
>> >
>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>> >>
>> >> Last time I talked with Scott I brought this idea in. I believe the
>> plan was either to publish compiled site to website directly, or keep it in
>> separate storage from apache/beam repo.
>> >>
>> >> One of the main reasons not to check in compiled version of website is
>> that every developer will have to pull all the versions of website every
>> time they clone repo, which is not that good of an idea to do.
>> >>
>> >> Regards,
>> >> --Mikhail
>> >>
>> >> Have feedback?
>> >>
>> >>
>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>> >>>
>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>> not necessary?
>> >>> Also, once apache/beam-site is merged with apache/beam the release
>> branch should have the relevant docs (although perhaps it's better to put
>> them in a different repo or storage system).
>> >>>
>> >>> Thomas, I would very much like to not have javadoc/pydoc generation
>> be part of the website review process, as it takes up a lot of time when
>> changes are staged (10s of thousands of files), especially when a PR is
>> updated and existing staged files need to be deleted.
>> >>>
>> >>>
>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>> >>>>
>> >>>> +1 For removing old documentation.
>> >>>>
>> >>>> @Thomas: Migration work is in backlog and will be picked up in near
>> time.
>> >>>>
>> >>>> --Mikhail
>> >>>>
>> >>>> Have feedback?
>> >>>>
>> >>>>
>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:
>> >>>>>
>> >>>>> +1 for removing pre 2.0 documentation (as well as the entries from
>> https://beam.apache.org/get-started/downloads/)
>> >>>>>
>> >>>>> Isn't it part of the beam-site changes that we will no longer check
>> in generated documentation into the repository? Those can be generated and
>> deployed independently (when a commit to a branch occurs), such as done in
>> the Apex and Flink projects.
>> >>>>>
>> >>>>> I was told that Scott who was working in the beam-site changes is
>> on leave now and the migration is still pending (see note at
>> https://github.com/apache/beam/tree/master/website). Is anyone else
>> going to pick it up?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Thomas
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com>
>> wrote:
>> >>>>>>
>> >>>>>> Is it worth adding a tag / branch to the repositories every time
>> we make a release, so that people are able to dive in and find the docs?
>> >>>>>> Best
>> >>>>>> -P.
>> >>>>>>
>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com>
>> wrote:
>> >>>>>>>
>> >>>>>>> I would guess that users are still using some of these old
>> releases. It is unclear from Beam website which releases are still
>> supported or not. It probably makes sense to drop documentation for
>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>> can work on updating the Beam website to clarify the state of each release.
>> >>>>>>>
>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com>
>> wrote:
>> >>>>>>>>
>> >>>>>>>> The older docs are not directly linked to and are in Github
>> commit history.
>> >>>>>>>>
>> >>>>>>>> If there are no objections I'm going to delete javadocs and
>> pydocs for releases older than 1 year,
>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>> >>>>>>>>
>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>> danoliveira@google.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>> The older docs should be recorded in the commit history of the
>> website repository, right? If they're not currently used in the website and
>> they're in the commit history then I don't see a reason to save them.
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com>
>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Hi all,
>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>> trying to deletes 22k files and then copy 22k files (warning large file).
>> >>>>>>>>>>
>> >>>>>>>>>> It seems that we could save a lot of time by deleting the
>> older javadoc and pydoc files for older versions. Is there a good reason to
>> keep around this kind of documentation for older versions (say 1 year back)?
>> >>>>>>>
>> >>>>>>>
>> >>>>>> --
>> >>>>>> Got feedback? go/pabloem-feedback
>> <https://goto.google.com/pabloem-feedback>
>>
>

Re: Removing documentation for old Beam versions

Posted by Udi Meiri <eh...@google.com>.
I'm picking up the website migration. The plan is to not include generated
files in the master branch.

However, I've been told that even putting generated files a separate branch
could blow up the git repository for all (e.g. make git pulls a lot
longer?).
Not sure if this is a real issue or not.

On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <ro...@google.com> wrote:

> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org> wrote:
> >
> > Yes, I think the separation of generated code will need to occur prior
> to completing the merge and switching the web site to the main repo.
> >
> > There should be no reason to check generated documentation into either
> of the repos/branches.
>
> Huge +1 to this. Thomas, would have time to set something like this up
> for Beam? If not, could anyone else pick this up?
>
> > Please see as an example how this was solved in Flink, using the ASF
> buildbot infrastructure.
> >
> > Documentation per version/release, for example:
> >
> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
> >
> > The buildbot configuration is here (requires committer access):
> >
> >
> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
> >
> > Thanks,
> > Thomas
> >
> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mi...@google.com>
> wrote:
> >>
> >> Last time I talked with Scott I brought this idea in. I believe the
> plan was either to publish compiled site to website directly, or keep it in
> separate storage from apache/beam repo.
> >>
> >> One of the main reasons not to check in compiled version of website is
> that every developer will have to pull all the versions of website every
> time they clone repo, which is not that good of an idea to do.
> >>
> >> Regards,
> >> --Mikhail
> >>
> >> Have feedback?
> >>
> >>
> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
> >>>
> >>> Pablo, the docs are generated into versioned paths, e.g.,
> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are not
> necessary?
> >>> Also, once apache/beam-site is merged with apache/beam the release
> branch should have the relevant docs (although perhaps it's better to put
> them in a different repo or storage system).
> >>>
> >>> Thomas, I would very much like to not have javadoc/pydoc generation be
> part of the website review process, as it takes up a lot of time when
> changes are staged (10s of thousands of files), especially when a PR is
> updated and existing staged files need to be deleted.
> >>>
> >>>
> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com>
> wrote:
> >>>>
> >>>> +1 For removing old documentation.
> >>>>
> >>>> @Thomas: Migration work is in backlog and will be picked up in near
> time.
> >>>>
> >>>> --Mikhail
> >>>>
> >>>> Have feedback?
> >>>>
> >>>>
> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:
> >>>>>
> >>>>> +1 for removing pre 2.0 documentation (as well as the entries from
> https://beam.apache.org/get-started/downloads/)
> >>>>>
> >>>>> Isn't it part of the beam-site changes that we will no longer check
> in generated documentation into the repository? Those can be generated and
> deployed independently (when a commit to a branch occurs), such as done in
> the Apex and Flink projects.
> >>>>>
> >>>>> I was told that Scott who was working in the beam-site changes is on
> leave now and the migration is still pending (see note at
> https://github.com/apache/beam/tree/master/website). Is anyone else going
> to pick it up?
> >>>>>
> >>>>> Thanks,
> >>>>> Thomas
> >>>>>
> >>>>>
> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com>
> wrote:
> >>>>>>
> >>>>>> Is it worth adding a tag / branch to the repositories every time we
> make a release, so that people are able to dive in and find the docs?
> >>>>>> Best
> >>>>>> -P.
> >>>>>>
> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com>
> wrote:
> >>>>>>>
> >>>>>>> I would guess that users are still using some of these old
> releases. It is unclear from Beam website which releases are still
> supported or not. It probably makes sense to drop documentation for
> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
> can work on updating the Beam website to clarify the state of each release.
> >>>>>>>
> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com>
> wrote:
> >>>>>>>>
> >>>>>>>> The older docs are not directly linked to and are in Github
> commit history.
> >>>>>>>>
> >>>>>>>> If there are no objections I'm going to delete javadocs and
> pydocs for releases older than 1 year,
> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
> >>>>>>>>
> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
> danoliveira@google.com> wrote:
> >>>>>>>>>
> >>>>>>>>> The older docs should be recorded in the commit history of the
> website repository, right? If they're not currently used in the website and
> they're in the commit history then I don't see a reason to save them.
> >>>>>>>>>
> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com>
> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>> I'm writing a PR for apache/beam-site and
> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
> trying to deletes 22k files and then copy 22k files (warning large file).
> >>>>>>>>>>
> >>>>>>>>>> It seems that we could save a lot of time by deleting the older
> javadoc and pydoc files for older versions. Is there a good reason to keep
> around this kind of documentation for older versions (say 1 year back)?
> >>>>>>>
> >>>>>>>
> >>>>>> --
> >>>>>> Got feedback? go/pabloem-feedback
> <https://goto.google.com/pabloem-feedback>
>

Re: Removing documentation for old Beam versions

Posted by Robert Bradshaw <ro...@google.com>.
On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <th...@apache.org> wrote:
>
> Yes, I think the separation of generated code will need to occur prior to completing the merge and switching the web site to the main repo.
>
> There should be no reason to check generated documentation into either of the repos/branches.

Huge +1 to this. Thomas, would have time to set something like this up
for Beam? If not, could anyone else pick this up?

> Please see as an example how this was solved in Flink, using the ASF buildbot infrastructure.
>
> Documentation per version/release, for example:
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>
> The buildbot configuration is here (requires committer access):
>
> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>
> Thanks,
> Thomas
>
> On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mi...@google.com> wrote:
>>
>> Last time I talked with Scott I brought this idea in. I believe the plan was either to publish compiled site to website directly, or keep it in separate storage from apache/beam repo.
>>
>> One of the main reasons not to check in compiled version of website is that every developer will have to pull all the versions of website every time they clone repo, which is not that good of an idea to do.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback?
>>
>>
>> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>>>
>>> Pablo, the docs are generated into versioned paths, e.g., https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are not necessary?
>>> Also, once apache/beam-site is merged with apache/beam the release branch should have the relevant docs (although perhaps it's better to put them in a different repo or storage system).
>>>
>>> Thomas, I would very much like to not have javadoc/pydoc generation be part of the website review process, as it takes up a lot of time when changes are staged (10s of thousands of files), especially when a PR is updated and existing staged files need to be deleted.
>>>
>>>
>>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com> wrote:
>>>>
>>>> +1 For removing old documentation.
>>>>
>>>> @Thomas: Migration work is in backlog and will be picked up in near time.
>>>>
>>>> --Mikhail
>>>>
>>>> Have feedback?
>>>>
>>>>
>>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>> +1 for removing pre 2.0 documentation (as well as the entries from https://beam.apache.org/get-started/downloads/)
>>>>>
>>>>> Isn't it part of the beam-site changes that we will no longer check in generated documentation into the repository? Those can be generated and deployed independently (when a commit to a branch occurs), such as done in the Apex and Flink projects.
>>>>>
>>>>> I was told that Scott who was working in the beam-site changes is on leave now and the migration is still pending (see note at https://github.com/apache/beam/tree/master/website). Is anyone else going to pick it up?
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>>
>>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com> wrote:
>>>>>>
>>>>>> Is it worth adding a tag / branch to the repositories every time we make a release, so that people are able to dive in and find the docs?
>>>>>> Best
>>>>>> -P.
>>>>>>
>>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>
>>>>>>> I would guess that users are still using some of these old releases. It is unclear from Beam website which releases are still supported or not. It probably makes sense to drop documentation for releases < 2.0. (I would suggest keeping docs for 2.0). For the future I can work on updating the Beam website to clarify the state of each release.
>>>>>>>
>>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>>>>>>>>
>>>>>>>> The older docs are not directly linked to and are in Github commit history.
>>>>>>>>
>>>>>>>> If there are no objections I'm going to delete javadocs and pydocs for releases older than 1 year,
>>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>
>>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <da...@google.com> wrote:
>>>>>>>>>
>>>>>>>>> The older docs should be recorded in the commit history of the website repository, right? If they're not currently used in the website and they're in the commit history then I don't see a reason to save them.
>>>>>>>>>
>>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's trying to deletes 22k files and then copy 22k files (warning large file).
>>>>>>>>>>
>>>>>>>>>> It seems that we could save a lot of time by deleting the older javadoc and pydoc files for older versions. Is there a good reason to keep around this kind of documentation for older versions (say 1 year back)?
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Got feedback? go/pabloem-feedback

Re: Removing documentation for old Beam versions

Posted by Thomas Weise <th...@apache.org>.
Yes, I think the separation of generated code will need to occur prior to
completing the merge and switching the web site to the main repo.

There should be no reason to check generated documentation into either of
the repos/branches. Please see as an example how this was solved in Flink,
using the ASF buildbot <https://ci.apache.org/buildbot.html> infrastructure.

Documentation per version/release, for example:

https://ci.apache.org/projects/flink/flink-docs-release-1.5/

The buildbot configuration is here (requires committer access):

https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf

Thanks,
Thomas

On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mi...@google.com> wrote:

> Last time I talked with Scott I brought this idea in. I believe the plan
> was either to publish compiled site to website directly, or keep it in
> separate storage from apache/beam repo.
>
> One of the main reasons not to check in compiled version of website is
> that every developer will have to pull all the versions of website every
> time they clone repo, which is not that good of an idea to do.
>
> Regards,
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>
>> Pablo, the docs are generated into versioned paths, e.g.,
>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>> not necessary?
>> Also, once apache/beam-site is merged with apache/beam the release branch
>> should have the relevant docs (although perhaps it's better to put them in
>> a different repo or storage system).
>>
>> Thomas, I would very much like to not have javadoc/pydoc generation be
>> part of the website review process, as it takes up a lot of time when
>> changes are staged (10s of thousands of files), especially when a PR is
>> updated and existing staged files need to be deleted.
>>
>>
>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>>
>>> +1 For removing old documentation.
>>>
>>> @Thomas: Migration work is in backlog and will be picked up in near time.
>>>
>>> --Mikhail
>>>
>>> Have feedback <http://go/migryz-feedback>?
>>>
>>>
>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:
>>>
>>>> +1 for removing pre 2.0 documentation (as well as the entries from
>>>> https://beam.apache.org/get-started/downloads/)
>>>>
>>>> Isn't it part of the beam-site changes that we will no longer check in
>>>> generated documentation into the repository? Those can be generated and
>>>> deployed independently (when a commit to a branch occurs), such as done in
>>>> the Apex and Flink projects.
>>>>
>>>> I was told that Scott who was working in the beam-site changes is on
>>>> leave now and the migration is still pending (see note at
>>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>>> going to pick it up?
>>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>>
>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com>
>>>> wrote:
>>>>
>>>>> Is it worth adding a tag / branch to the repositories every time we
>>>>> make a release, so that people are able to dive in and find the docs?
>>>>> Best
>>>>> -P.
>>>>>
>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> I would guess that users are still using some of these old releases.
>>>>>> It is unclear from Beam website which releases are still supported or not.
>>>>>> It probably makes sense to drop documentation for releases < 2.0. (I would
>>>>>> suggest keeping docs for 2.0). For the future I can work on updating the
>>>>>> Beam website to clarify the state of each release.
>>>>>>
>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> The older docs are not directly linked to and are in Github commit
>>>>>>> history.
>>>>>>>
>>>>>>> If there are no objections I'm going to delete javadocs and pydocs
>>>>>>> for releases older than 1 year,
>>>>>>> meaning 2.0.0 and older (going by the dates here
>>>>>>> <https://beam.apache.org/get-started/downloads/>).
>>>>>>>
>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>> danoliveira@google.com> wrote:
>>>>>>>
>>>>>>>> The older docs should be recorded in the commit history of the
>>>>>>>> website repository, right? If they're not currently used in the website and
>>>>>>>> they're in the commit history then I don't see a reason to save them.
>>>>>>>>
>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning
>>>>>>>>> large file
>>>>>>>>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>>>>>>>>> ).
>>>>>>>>>
>>>>>>>>> It seems that we could save a lot of time by deleting the older
>>>>>>>>> javadoc and pydoc files for older versions. Is there a good reason to keep
>>>>>>>>> around this kind of documentation for older versions (say 1 year back)?
>>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>> Got feedback? go/pabloem-feedback
>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>
>>>>

Re: Removing documentation for old Beam versions

Posted by Udi Meiri <eh...@google.com>.
[image: pr-520.png]
(trying that image again)

On Thu, Aug 2, 2018 at 7:00 PM Udi Meiri <eh...@google.com> wrote:

> Alright, created https://github.com/apache/beam-site/pull/520
> [image: pr-520.png]
> Reduces staging upload from 500M down to 270M, and halves the number of
> files from ~22k to 11k.
>
>
>
> On Thu, Aug 2, 2018 at 6:58 PM Pablo Estrada <pa...@google.com> wrote:
>
>> I believe tags will be necessarily because for anyone looking for old
>> docs that have been removed, they will need to browse back in history, not
>> just browse the tree of directories.
>> -P.
>>
>> On Thu, Aug 2, 2018, 6:46 PM Mikhail Gryzykhin <mi...@google.com> wrote:
>>
>>> Last time I talked with Scott I brought this idea in. I believe the plan
>>> was either to publish compiled site to website directly, or keep it in
>>> separate storage from apache/beam repo.
>>>
>>> One of the main reasons not to check in compiled version of website is
>>> that every developer will have to pull all the versions of website every
>>> time they clone repo, which is not that good of an idea to do.
>>>
>>> Regards,
>>> --Mikhail
>>>
>>> Have feedback <http://go/migryz-feedback>?
>>>
>>>
>>> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> Pablo, the docs are generated into versioned paths, e.g.,
>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>>>> not necessary?
>>>> Also, once apache/beam-site is merged with apache/beam the release
>>>> branch should have the relevant docs (although perhaps it's better to put
>>>> them in a different repo or storage system).
>>>>
>>>> Thomas, I would very much like to not have javadoc/pydoc generation be
>>>> part of the website review process, as it takes up a lot of time when
>>>> changes are staged (10s of thousands of files), especially when a PR is
>>>> updated and existing staged files need to be deleted.
>>>>
>>>>
>>>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com>
>>>> wrote:
>>>>
>>>>> +1 For removing old documentation.
>>>>>
>>>>> @Thomas: Migration work is in backlog and will be picked up in near
>>>>> time.
>>>>>
>>>>> --Mikhail
>>>>>
>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>
>>>>>
>>>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>>> +1 for removing pre 2.0 documentation (as well as the entries from
>>>>>> https://beam.apache.org/get-started/downloads/)
>>>>>>
>>>>>> Isn't it part of the beam-site changes that we will no longer check
>>>>>> in generated documentation into the repository? Those can be generated and
>>>>>> deployed independently (when a commit to a branch occurs), such as done in
>>>>>> the Apex and Flink projects.
>>>>>>
>>>>>> I was told that Scott who was working in the beam-site changes is on
>>>>>> leave now and the migration is still pending (see note at
>>>>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>>>>> going to pick it up?
>>>>>>
>>>>>> Thanks,
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Is it worth adding a tag / branch to the repositories every time we
>>>>>>> make a release, so that people are able to dive in and find the docs?
>>>>>>> Best
>>>>>>> -P.
>>>>>>>
>>>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I would guess that users are still using some of these old
>>>>>>>> releases. It is unclear from Beam website which releases are still
>>>>>>>> supported or not. It probably makes sense to drop documentation for
>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the future I
>>>>>>>> can work on updating the Beam website to clarify the state of each release.
>>>>>>>>
>>>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The older docs are not directly linked to and are in Github commit
>>>>>>>>> history.
>>>>>>>>>
>>>>>>>>> If there are no objections I'm going to delete javadocs and pydocs
>>>>>>>>> for releases older than 1 year,
>>>>>>>>> meaning 2.0.0 and older (going by the dates here
>>>>>>>>> <https://beam.apache.org/get-started/downloads/>).
>>>>>>>>>
>>>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> The older docs should be recorded in the commit history of the
>>>>>>>>>> website repository, right? If they're not currently used in the website and
>>>>>>>>>> they're in the commit history then I don't see a reason to save them.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning
>>>>>>>>>>> large file
>>>>>>>>>>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>>>>>>>>>>> ).
>>>>>>>>>>>
>>>>>>>>>>> It seems that we could save a lot of time by deleting the older
>>>>>>>>>>> javadoc and pydoc files for older versions. Is there a good reason to keep
>>>>>>>>>>> around this kind of documentation for older versions (say 1 year back)?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>> Got feedback? go/pabloem-feedback
>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>
>>>>>> --
>> Got feedback? go/pabloem-feedback
>> <https://goto.google.com/pabloem-feedback>
>>
>

Re: Removing documentation for old Beam versions

Posted by Udi Meiri <eh...@google.com>.
Alright, created https://github.com/apache/beam-site/pull/520
[image: pr-520.png]
Reduces staging upload from 500M down to 270M, and halves the number of
files from ~22k to 11k.



On Thu, Aug 2, 2018 at 6:58 PM Pablo Estrada <pa...@google.com> wrote:

> I believe tags will be necessarily because for anyone looking for old docs
> that have been removed, they will need to browse back in history, not just
> browse the tree of directories.
> -P.
>
> On Thu, Aug 2, 2018, 6:46 PM Mikhail Gryzykhin <mi...@google.com> wrote:
>
>> Last time I talked with Scott I brought this idea in. I believe the plan
>> was either to publish compiled site to website directly, or keep it in
>> separate storage from apache/beam repo.
>>
>> One of the main reasons not to check in compiled version of website is
>> that every developer will have to pull all the versions of website every
>> time they clone repo, which is not that good of an idea to do.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Pablo, the docs are generated into versioned paths, e.g.,
>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>>> not necessary?
>>> Also, once apache/beam-site is merged with apache/beam the release
>>> branch should have the relevant docs (although perhaps it's better to put
>>> them in a different repo or storage system).
>>>
>>> Thomas, I would very much like to not have javadoc/pydoc generation be
>>> part of the website review process, as it takes up a lot of time when
>>> changes are staged (10s of thousands of files), especially when a PR is
>>> updated and existing staged files need to be deleted.
>>>
>>>
>>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com>
>>> wrote:
>>>
>>>> +1 For removing old documentation.
>>>>
>>>> @Thomas: Migration work is in backlog and will be picked up in near
>>>> time.
>>>>
>>>> --Mikhail
>>>>
>>>> Have feedback <http://go/migryz-feedback>?
>>>>
>>>>
>>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>>> +1 for removing pre 2.0 documentation (as well as the entries from
>>>>> https://beam.apache.org/get-started/downloads/)
>>>>>
>>>>> Isn't it part of the beam-site changes that we will no longer check in
>>>>> generated documentation into the repository? Those can be generated and
>>>>> deployed independently (when a commit to a branch occurs), such as done in
>>>>> the Apex and Flink projects.
>>>>>
>>>>> I was told that Scott who was working in the beam-site changes is on
>>>>> leave now and the migration is still pending (see note at
>>>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>>>> going to pick it up?
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>>
>>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Is it worth adding a tag / branch to the repositories every time we
>>>>>> make a release, so that people are able to dive in and find the docs?
>>>>>> Best
>>>>>> -P.
>>>>>>
>>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>
>>>>>>> I would guess that users are still using some of these old releases.
>>>>>>> It is unclear from Beam website which releases are still supported or not.
>>>>>>> It probably makes sense to drop documentation for releases < 2.0. (I would
>>>>>>> suggest keeping docs for 2.0). For the future I can work on updating the
>>>>>>> Beam website to clarify the state of each release.
>>>>>>>
>>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> The older docs are not directly linked to and are in Github commit
>>>>>>>> history.
>>>>>>>>
>>>>>>>> If there are no objections I'm going to delete javadocs and pydocs
>>>>>>>> for releases older than 1 year,
>>>>>>>> meaning 2.0.0 and older (going by the dates here
>>>>>>>> <https://beam.apache.org/get-started/downloads/>).
>>>>>>>>
>>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>> danoliveira@google.com> wrote:
>>>>>>>>
>>>>>>>>> The older docs should be recorded in the commit history of the
>>>>>>>>> website repository, right? If they're not currently used in the website and
>>>>>>>>> they're in the commit history then I don't see a reason to save them.
>>>>>>>>>
>>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning
>>>>>>>>>> large file
>>>>>>>>>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>>>>>>>>>> ).
>>>>>>>>>>
>>>>>>>>>> It seems that we could save a lot of time by deleting the older
>>>>>>>>>> javadoc and pydoc files for older versions. Is there a good reason to keep
>>>>>>>>>> around this kind of documentation for older versions (say 1 year back)?
>>>>>>>>>>
>>>>>>>>>
>>>>>>> --
>>>>>> Got feedback? go/pabloem-feedback
>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>
>>>>> --
> Got feedback? go/pabloem-feedback
> <https://goto.google.com/pabloem-feedback>
>

Re: Removing documentation for old Beam versions

Posted by Pablo Estrada <pa...@google.com>.
I believe tags will be necessarily because for anyone looking for old docs
that have been removed, they will need to browse back in history, not just
browse the tree of directories.
-P.

On Thu, Aug 2, 2018, 6:46 PM Mikhail Gryzykhin <mi...@google.com> wrote:

> Last time I talked with Scott I brought this idea in. I believe the plan
> was either to publish compiled site to website directly, or keep it in
> separate storage from apache/beam repo.
>
> One of the main reasons not to check in compiled version of website is
> that every developer will have to pull all the versions of website every
> time they clone repo, which is not that good of an idea to do.
>
> Regards,
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>
>> Pablo, the docs are generated into versioned paths, e.g.,
>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>> not necessary?
>> Also, once apache/beam-site is merged with apache/beam the release branch
>> should have the relevant docs (although perhaps it's better to put them in
>> a different repo or storage system).
>>
>> Thomas, I would very much like to not have javadoc/pydoc generation be
>> part of the website review process, as it takes up a lot of time when
>> changes are staged (10s of thousands of files), especially when a PR is
>> updated and existing staged files need to be deleted.
>>
>>
>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>>
>>> +1 For removing old documentation.
>>>
>>> @Thomas: Migration work is in backlog and will be picked up in near time.
>>>
>>> --Mikhail
>>>
>>> Have feedback <http://go/migryz-feedback>?
>>>
>>>
>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:
>>>
>>>> +1 for removing pre 2.0 documentation (as well as the entries from
>>>> https://beam.apache.org/get-started/downloads/)
>>>>
>>>> Isn't it part of the beam-site changes that we will no longer check in
>>>> generated documentation into the repository? Those can be generated and
>>>> deployed independently (when a commit to a branch occurs), such as done in
>>>> the Apex and Flink projects.
>>>>
>>>> I was told that Scott who was working in the beam-site changes is on
>>>> leave now and the migration is still pending (see note at
>>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>>> going to pick it up?
>>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>>
>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com>
>>>> wrote:
>>>>
>>>>> Is it worth adding a tag / branch to the repositories every time we
>>>>> make a release, so that people are able to dive in and find the docs?
>>>>> Best
>>>>> -P.
>>>>>
>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> I would guess that users are still using some of these old releases.
>>>>>> It is unclear from Beam website which releases are still supported or not.
>>>>>> It probably makes sense to drop documentation for releases < 2.0. (I would
>>>>>> suggest keeping docs for 2.0). For the future I can work on updating the
>>>>>> Beam website to clarify the state of each release.
>>>>>>
>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> The older docs are not directly linked to and are in Github commit
>>>>>>> history.
>>>>>>>
>>>>>>> If there are no objections I'm going to delete javadocs and pydocs
>>>>>>> for releases older than 1 year,
>>>>>>> meaning 2.0.0 and older (going by the dates here
>>>>>>> <https://beam.apache.org/get-started/downloads/>).
>>>>>>>
>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>> danoliveira@google.com> wrote:
>>>>>>>
>>>>>>>> The older docs should be recorded in the commit history of the
>>>>>>>> website repository, right? If they're not currently used in the website and
>>>>>>>> they're in the commit history then I don't see a reason to save them.
>>>>>>>>
>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning
>>>>>>>>> large file
>>>>>>>>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>>>>>>>>> ).
>>>>>>>>>
>>>>>>>>> It seems that we could save a lot of time by deleting the older
>>>>>>>>> javadoc and pydoc files for older versions. Is there a good reason to keep
>>>>>>>>> around this kind of documentation for older versions (say 1 year back)?
>>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>> Got feedback? go/pabloem-feedback
>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>
>>>> --
Got feedback? go/pabloem-feedback

Re: Removing documentation for old Beam versions

Posted by Mikhail Gryzykhin <mi...@google.com>.
Last time I talked with Scott I brought this idea in. I believe the plan
was either to publish compiled site to website directly, or keep it in
separate storage from apache/beam repo.

One of the main reasons not to check in compiled version of website is that
every developer will have to pull all the versions of website every time
they clone repo, which is not that good of an idea to do.

Regards,
--Mikhail

Have feedback <http://go/migryz-feedback>?


On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:

> Pablo, the docs are generated into versioned paths, e.g.,
> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are not
> necessary?
> Also, once apache/beam-site is merged with apache/beam the release branch
> should have the relevant docs (although perhaps it's better to put them in
> a different repo or storage system).
>
> Thomas, I would very much like to not have javadoc/pydoc generation be
> part of the website review process, as it takes up a lot of time when
> changes are staged (10s of thousands of files), especially when a PR is
> updated and existing staged files need to be deleted.
>
>
> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com>
> wrote:
>
>> +1 For removing old documentation.
>>
>> @Thomas: Migration work is in backlog and will be picked up in near time.
>>
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:
>>
>>> +1 for removing pre 2.0 documentation (as well as the entries from
>>> https://beam.apache.org/get-started/downloads/)
>>>
>>> Isn't it part of the beam-site changes that we will no longer check in
>>> generated documentation into the repository? Those can be generated and
>>> deployed independently (when a commit to a branch occurs), such as done in
>>> the Apex and Flink projects.
>>>
>>> I was told that Scott who was working in the beam-site changes is on
>>> leave now and the migration is still pending (see note at
>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>> going to pick it up?
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com>
>>> wrote:
>>>
>>>> Is it worth adding a tag / branch to the repositories every time we
>>>> make a release, so that people are able to dive in and find the docs?
>>>> Best
>>>> -P.
>>>>
>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> I would guess that users are still using some of these old releases.
>>>>> It is unclear from Beam website which releases are still supported or not.
>>>>> It probably makes sense to drop documentation for releases < 2.0. (I would
>>>>> suggest keeping docs for 2.0). For the future I can work on updating the
>>>>> Beam website to clarify the state of each release.
>>>>>
>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> The older docs are not directly linked to and are in Github commit
>>>>>> history.
>>>>>>
>>>>>> If there are no objections I'm going to delete javadocs and pydocs
>>>>>> for releases older than 1 year,
>>>>>> meaning 2.0.0 and older (going by the dates here
>>>>>> <https://beam.apache.org/get-started/downloads/>).
>>>>>>
>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>> danoliveira@google.com> wrote:
>>>>>>
>>>>>>> The older docs should be recorded in the commit history of the
>>>>>>> website repository, right? If they're not currently used in the website and
>>>>>>> they're in the commit history then I don't see a reason to save them.
>>>>>>>
>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large
>>>>>>>> file
>>>>>>>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>>>>>>>> ).
>>>>>>>>
>>>>>>>> It seems that we could save a lot of time by deleting the older
>>>>>>>> javadoc and pydoc files for older versions. Is there a good reason to keep
>>>>>>>> around this kind of documentation for older versions (say 1 year back)?
>>>>>>>>
>>>>>>>
>>>>> --
>>>> Got feedback? go/pabloem-feedback
>>>> <https://goto.google.com/pabloem-feedback>
>>>>
>>>

Re: Removing documentation for old Beam versions

Posted by Udi Meiri <eh...@google.com>.
Pablo, the docs are generated into versioned paths, e.g.,
https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are not
necessary?
Also, once apache/beam-site is merged with apache/beam the release branch
should have the relevant docs (although perhaps it's better to put them in
a different repo or storage system).

Thomas, I would very much like to not have javadoc/pydoc generation be part
of the website review process, as it takes up a lot of time when changes
are staged (10s of thousands of files), especially when a PR is updated and
existing staged files need to be deleted.


On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mi...@google.com> wrote:

> +1 For removing old documentation.
>
> @Thomas: Migration work is in backlog and will be picked up in near time.
>
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:
>
>> +1 for removing pre 2.0 documentation (as well as the entries from
>> https://beam.apache.org/get-started/downloads/)
>>
>> Isn't it part of the beam-site changes that we will no longer check in
>> generated documentation into the repository? Those can be generated and
>> deployed independently (when a commit to a branch occurs), such as done in
>> the Apex and Flink projects.
>>
>> I was told that Scott who was working in the beam-site changes is on
>> leave now and the migration is still pending (see note at
>> https://github.com/apache/beam/tree/master/website). Is anyone else
>> going to pick it up?
>>
>> Thanks,
>> Thomas
>>
>>
>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com> wrote:
>>
>>> Is it worth adding a tag / branch to the repositories every time we make
>>> a release, so that people are able to dive in and find the docs?
>>> Best
>>> -P.
>>>
>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> I would guess that users are still using some of these old releases. It
>>>> is unclear from Beam website which releases are still supported or not. It
>>>> probably makes sense to drop documentation for releases < 2.0. (I would
>>>> suggest keeping docs for 2.0). For the future I can work on updating the
>>>> Beam website to clarify the state of each release.
>>>>
>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> The older docs are not directly linked to and are in Github commit
>>>>> history.
>>>>>
>>>>> If there are no objections I'm going to delete javadocs and pydocs for
>>>>> releases older than 1 year,
>>>>> meaning 2.0.0 and older (going by the dates here
>>>>> <https://beam.apache.org/get-started/downloads/>).
>>>>>
>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>> danoliveira@google.com> wrote:
>>>>>
>>>>>> The older docs should be recorded in the commit history of the
>>>>>> website repository, right? If they're not currently used in the website and
>>>>>> they're in the commit history then I don't see a reason to save them.
>>>>>>
>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>>> trying to deletes 22k files and then copy 22k files (warning large
>>>>>>> file
>>>>>>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>>>>>>> ).
>>>>>>>
>>>>>>> It seems that we could save a lot of time by deleting the older
>>>>>>> javadoc and pydoc files for older versions. Is there a good reason to keep
>>>>>>> around this kind of documentation for older versions (say 1 year back)?
>>>>>>>
>>>>>>
>>>> --
>>> Got feedback? go/pabloem-feedback
>>> <https://goto.google.com/pabloem-feedback>
>>>
>>

Re: Removing documentation for old Beam versions

Posted by Mikhail Gryzykhin <mi...@google.com>.
+1 For removing old documentation.

@Thomas: Migration work is in backlog and will be picked up in near time.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <th...@apache.org> wrote:

> +1 for removing pre 2.0 documentation (as well as the entries from
> https://beam.apache.org/get-started/downloads/)
>
> Isn't it part of the beam-site changes that we will no longer check in
> generated documentation into the repository? Those can be generated and
> deployed independently (when a commit to a branch occurs), such as done in
> the Apex and Flink projects.
>
> I was told that Scott who was working in the beam-site changes is on leave
> now and the migration is still pending (see note at
> https://github.com/apache/beam/tree/master/website). Is anyone else going
> to pick it up?
>
> Thanks,
> Thomas
>
>
> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com> wrote:
>
>> Is it worth adding a tag / branch to the repositories every time we make
>> a release, so that people are able to dive in and find the docs?
>> Best
>> -P.
>>
>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> I would guess that users are still using some of these old releases. It
>>> is unclear from Beam website which releases are still supported or not. It
>>> probably makes sense to drop documentation for releases < 2.0. (I would
>>> suggest keeping docs for 2.0). For the future I can work on updating the
>>> Beam website to clarify the state of each release.
>>>
>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>>>
>>>> The older docs are not directly linked to and are in Github commit
>>>> history.
>>>>
>>>> If there are no objections I'm going to delete javadocs and pydocs for
>>>> releases older than 1 year,
>>>> meaning 2.0.0 and older (going by the dates here
>>>> <https://beam.apache.org/get-started/downloads/>).
>>>>
>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <da...@google.com>
>>>> wrote:
>>>>
>>>>> The older docs should be recorded in the commit history of the website
>>>>> repository, right? If they're not currently used in the website and they're
>>>>> in the commit history then I don't see a reason to save them.
>>>>>
>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>> I'm writing a PR for apache/beam-site and
>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>>>>> trying to deletes 22k files and then copy 22k files (warning large
>>>>>> file
>>>>>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>>>>>> ).
>>>>>>
>>>>>> It seems that we could save a lot of time by deleting the older
>>>>>> javadoc and pydoc files for older versions. Is there a good reason to keep
>>>>>> around this kind of documentation for older versions (say 1 year back)?
>>>>>>
>>>>>
>>> --
>> Got feedback? go/pabloem-feedback
>> <https://goto.google.com/pabloem-feedback>
>>
>

Re: Removing documentation for old Beam versions

Posted by Thomas Weise <th...@apache.org>.
+1 for removing pre 2.0 documentation (as well as the entries from
https://beam.apache.org/get-started/downloads/)

Isn't it part of the beam-site changes that we will no longer check in
generated documentation into the repository? Those can be generated and
deployed independently (when a commit to a branch occurs), such as done in
the Apex and Flink projects.

I was told that Scott who was working in the beam-site changes is on leave
now and the migration is still pending (see note at
https://github.com/apache/beam/tree/master/website). Is anyone else going
to pick it up?

Thanks,
Thomas


On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pa...@google.com> wrote:

> Is it worth adding a tag / branch to the repositories every time we make a
> release, so that people are able to dive in and find the docs?
> Best
> -P.
>
> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:
>
>> I would guess that users are still using some of these old releases. It
>> is unclear from Beam website which releases are still supported or not. It
>> probably makes sense to drop documentation for releases < 2.0. (I would
>> suggest keeping docs for 2.0). For the future I can work on updating the
>> Beam website to clarify the state of each release.
>>
>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>>
>>> The older docs are not directly linked to and are in Github commit
>>> history.
>>>
>>> If there are no objections I'm going to delete javadocs and pydocs for
>>> releases older than 1 year,
>>> meaning 2.0.0 and older (going by the dates here
>>> <https://beam.apache.org/get-started/downloads/>).
>>>
>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <da...@google.com>
>>> wrote:
>>>
>>>> The older docs should be recorded in the commit history of the website
>>>> repository, right? If they're not currently used in the website and they're
>>>> in the commit history then I don't see a reason to save them.
>>>>
>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> Hi all,
>>>>> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage
>>>>> is timing out after 100 minutes, because it's trying to deletes 22k files
>>>>> and then copy 22k files (warning large file
>>>>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>>>>> ).
>>>>>
>>>>> It seems that we could save a lot of time by deleting the older
>>>>> javadoc and pydoc files for older versions. Is there a good reason to keep
>>>>> around this kind of documentation for older versions (say 1 year back)?
>>>>>
>>>>
>> --
> Got feedback? go/pabloem-feedback
>

Re: Removing documentation for old Beam versions

Posted by Pablo Estrada <pa...@google.com>.
Is it worth adding a tag / branch to the repositories every time we make a
release, so that people are able to dive in and find the docs?
Best
-P.

On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:

> I would guess that users are still using some of these old releases. It is
> unclear from Beam website which releases are still supported or not. It
> probably makes sense to drop documentation for releases < 2.0. (I would
> suggest keeping docs for 2.0). For the future I can work on updating the
> Beam website to clarify the state of each release.
>
> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>
>> The older docs are not directly linked to and are in Github commit
>> history.
>>
>> If there are no objections I'm going to delete javadocs and pydocs for
>> releases older than 1 year,
>> meaning 2.0.0 and older (going by the dates here
>> <https://beam.apache.org/get-started/downloads/>).
>>
>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <da...@google.com>
>> wrote:
>>
>>> The older docs should be recorded in the commit history of the website
>>> repository, right? If they're not currently used in the website and they're
>>> in the commit history then I don't see a reason to save them.
>>>
>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> Hi all,
>>>> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage
>>>> is timing out after 100 minutes, because it's trying to deletes 22k files
>>>> and then copy 22k files (warning large file
>>>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>>>> ).
>>>>
>>>> It seems that we could save a lot of time by deleting the older javadoc
>>>> and pydoc files for older versions. Is there a good reason to keep around
>>>> this kind of documentation for older versions (say 1 year back)?
>>>>
>>>
> --
Got feedback? go/pabloem-feedback

Re: Removing documentation for old Beam versions

Posted by Ahmet Altay <al...@google.com>.
I would guess that users are still using some of these old releases. It is
unclear from Beam website which releases are still supported or not. It
probably makes sense to drop documentation for releases < 2.0. (I would
suggest keeping docs for 2.0). For the future I can work on updating the
Beam website to clarify the state of each release.

On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:

> The older docs are not directly linked to and are in Github commit history.
>
> If there are no objections I'm going to delete javadocs and pydocs for
> releases older than 1 year,
> meaning 2.0.0 and older (going by the dates here
> <https://beam.apache.org/get-started/downloads/>).
>
> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <da...@google.com>
> wrote:
>
>> The older docs should be recorded in the commit history of the website
>> repository, right? If they're not currently used in the website and they're
>> in the commit history then I don't see a reason to save them.
>>
>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Hi all,
>>> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage
>>> is timing out after 100 minutes, because it's trying to deletes 22k files
>>> and then copy 22k files (warning large file
>>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>>> ).
>>>
>>> It seems that we could save a lot of time by deleting the older javadoc
>>> and pydoc files for older versions. Is there a good reason to keep around
>>> this kind of documentation for older versions (say 1 year back)?
>>>
>>

Re: Removing documentation for old Beam versions

Posted by Udi Meiri <eh...@google.com>.
The older docs are not directly linked to and are in Github commit history.

If there are no objections I'm going to delete javadocs and pydocs for
releases older than 1 year,
meaning 2.0.0 and older (going by the dates here
<https://beam.apache.org/get-started/downloads/>).

On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <da...@google.com>
wrote:

> The older docs should be recorded in the commit history of the website
> repository, right? If they're not currently used in the website and they're
> in the commit history then I don't see a reason to save them.
>
> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>
>> Hi all,
>> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage is
>> timing out after 100 minutes, because it's trying to deletes 22k files and
>> then copy 22k files (warning large file
>> <https://builds.apache.org/job/beam_PreCommit_Website_Stage/1276/consoleText>
>> ).
>>
>> It seems that we could save a lot of time by deleting the older javadoc
>> and pydoc files for older versions. Is there a good reason to keep around
>> this kind of documentation for older versions (say 1 year back)?
>>
>