Posted to dev@beam.apache.org by Michał Walenia <mi...@polidea.com> on 2020/03/05 10:46:56 UTC

No space left on device - beam-jenkins 1 and 7

Hi there,
it seems we have a problem with Jenkins workers again. Nodes 1 and 7 both
fail jobs with "No space left on device".
Who is the best person to contact in these cases (someone with access
permissions to the workers)?

I also noticed that such errors are becoming more and more frequent
recently and I'd like to discuss how this can be remedied. Can a cleanup
task be automated on Jenkins somehow?

Regards
Michal

-- 

Michał Walenia
Polidea <https://www.polidea.com/> | Software Engineer

M: +48 791 432 002
E: michal.walenia@polidea.com

Unique Tech
Check out our projects! <https://www.polidea.com/our-work>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Robert Bradshaw <ro...@google.com>.
I prefer attempting B first as well.

Can we also ensure we're setting TMPDIR (or similar) to avoid writing to
/tmp?
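
For instance (a sketch of the idea only; the actual job configuration may
differ), each job's shell steps could begin with:

  export TMPDIR="$WORKSPACE/tmp"
  mkdir -p "$TMPDIR"

so that code using the standard temp-file APIs writes inside the workspace
and gets wiped along with it.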

On Mon, Jul 27, 2020 at 5:38 PM Tyson Hamilton <ty...@google.com> wrote:

> Here is a summary of how I understand things:
>
>   - /tmp and /var/lib/docker are the culprits for filling up disks
>   - inventory Jenkins job runs every 12 hours and runs a docker prune to
> clean up images older than 24hr
>   - crontab on each machine cleans up /tmp files older than three days
> weekly
>
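> For reference, the two mechanisms above presumably boil down to commands
> roughly like these (a sketch; the exact flags on the workers may differ):
>
>   docker system prune --all --force --filter "until=24h"
>   find /tmp -type f -atime +3 -delete
>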
> This doesn't seem to be working since we're still running out of disk
> periodically and requiring manual intervention. Knobs and options we have
> available:
>
>   1. increase frequency of deleting files
>   2. decrease the number of days required to delete a file (e.g. older
> than 2 days)
>
> The execution methods we have available are:
>
>   A. cron
>     - pro: runs even if a job gets stuck in Jenkins due to full disk
>     - con: config baked into VM which is tough to update, not discoverable
> or documented well
>   B. inventory job
>     - pro: easy to update, runs every 12h already
>     - con: could get stuck if Jenkins agent runs out of disk or is
> otherwise stuck, tied to all other inventory job frequency
>   C. configure startup scripts for the VMs that set up the cron job
> anytime the VM is restarted
>     - pro: similar to A. and easy to update
>     - con: similar to A.
>
> Between the three I prefer B. because it is consistent with other
> inventory jobs. If it turns out that stuck jobs often prevent the inventory
> job from being scheduled, we could further investigate C to avoid having to
> rebuild the VM images repeatedly.
>
> Any objections or comments? If not, we'll go forward with B. and reduce
> the date check from 3 days to 2 days.
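>
> Concretely, that would mean the inventory job running something like this
> sketch (assuming the same find-based cleanup the crontab uses):
>
>   find /tmp -type f -atime +2 -delete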
>
>
> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
> > Tests may not be doing docker cleanup. Inventory job runs a docker prune
> > every 12 hours for images older than 24 hrs [1]. Randomly looking at one
> of
> > the recent runs [2], it cleaned up a long list of containers consuming
> > 30+GB space. That should be just 12 hours worth of containers.
> >
> > [1]
> >
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
> > [2]
> >
> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
> >
> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <ty...@google.com>
> wrote:
> >
> > > Yes, these are on the same volume in the /var/lib/docker directory. I'm
> > > unsure if they clean up leftover images.
> > >
> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:
> > >
> > >> I forgot Docker images:
> > >>
> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
> > >> TYPE                TOTAL               ACTIVE              SIZE
> > >>        RECLAIMABLE
> > >> Images              88                  9                   125.4GB
> > >>       124.2GB (99%)
> > >> Containers          40                  4                   7.927GB
> > >>       7.871GB (99%)
> > >> Local Volumes       47                  0                   3.165GB
> > >>       3.165GB (100%)
> > >> Build Cache         0                   0                   0B
> > >>        0B
> > >>
> > >> There are about 90 images on that machine, with all but 1 less than 48
> > >> hours old.
> > >> I think the docker test jobs need to try harder at cleaning up their
> > >> leftover images. (assuming they're already doing it?)
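> > >>
> > >> One way to make that cleanup reliable (a sketch; the label name here is
> > >> made up) would be to label CI-built images and prune by that label:
> > >>
> > >>   docker build --label beam-ci=true -t <image> .
> > >>   docker image prune --all --force --filter "label=beam-ci=true"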
> > >>
> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:
> > >>
> > >>> The additional slots (@3 directories) take up even more space now
> than
> > >>> before.
> > >>>
> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which
> could
> > >>> help by cleaning up workspaces after a run (just started a seed job).
> > >>>
> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <ty...@google.com>
> > >>> wrote:
> > >>>
> > >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
> > >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
> > >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
> > >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
> > >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
> > >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
> > >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
> > >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
> > >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
> > >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
> > >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
> > >>>> 6.2G    beam_PreCommit_Python_Commit
> > >>>> 7.5G    beam_PreCommit_Python_Commit@2
> > >>>> 7.5G    beam_PreCommit_Python_Cron
> > >>>> 1012M   beam_PreCommit_PythonDocker_Commit
> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
> > >>>> 1002M   beam_PreCommit_PythonDocker_Cron
> > >>>> 877M    beam_PreCommit_PythonFormatter_Commit
> > >>>> 988M    beam_PreCommit_PythonFormatter_Cron
> > >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
> > >>>> 1.7G    beam_PreCommit_PythonLint_Commit
> > >>>> 2.1G    beam_PreCommit_PythonLint_Cron
> > >>>> 7.5G    beam_PreCommit_Python_Phrase
> > >>>> 346M    beam_PreCommit_RAT_Commit
> > >>>> 341M    beam_PreCommit_RAT_Cron
> > >>>> 338M    beam_PreCommit_Spotless_Commit
> > >>>> 339M    beam_PreCommit_Spotless_Cron
> > >>>> 5.5G    beam_PreCommit_SQL_Commit
> > >>>> 5.5G    beam_PreCommit_SQL_Cron
> > >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
> > >>>> 750M    beam_PreCommit_Website_Commit
> > >>>> 750M    beam_PreCommit_Website_Commit@2
> > >>>> 750M    beam_PreCommit_Website_Cron
> > >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
> > >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
> > >>>> 336M    beam_Prober_CommunityMetrics
> > >>>> 693M    beam_python_mongoio_load_test
> > >>>> 339M    beam_SeedJob
> > >>>> 333M    beam_SeedJob_Standalone
> > >>>> 334M    beam_sonarqube_report
> > >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
> > >>>> 175G    total
> > >>>>
> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <tysonjh@google.com
> >
> > >>>> wrote:
> > >>>>
> > >>>>> Ya looks like something in the workspaces is taking up room:
> > >>>>>
> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
> > >>>>> 191G    .
> > >>>>> 191G    total
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <
> tysonjh@google.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
> > >>>>>>
> > >>>>>> Filesystem      Size  Used Avail Use% Mounted on
> > >>>>>> /dev/sda1       485G  482G  2.9G 100% /
> > >>>>>>
> > >>>>>> However, after cleaning up /tmp with the crontab command, there is
> > >>>>>> only 8G of usage yet it still remains 100% full:
> > >>>>>>
> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
> > >>>>>> 8.0G    /tmp
> > >>>>>> 8.0G    total
> > >>>>>>
> > >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
> > >>>>>> directory. When I run a du on that, it takes really long. I'll
> let it keep
> > >>>>>> running for a while to see if it ever returns a result but so far
> this
> > >>>>>> seems suspect.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <
> tysonjh@google.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are the
> > >>>>>>> workspaces, or what are they named?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com>
> wrote:
> > >>>>>>>
> > >>>>>>>> I'm curious what you find. Was it /tmp or the workspaces using
> > >>>>>>>> up the space?
> > >>>>>>>>
> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <
> tysonjh@google.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't work.
> > >>>>>>>>> I'll clean up manually on the machine using the cron command.
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
> > >>>>>>>>> tysonjh@google.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Something isn't working with the current set up because node
> 15
> > >>>>>>>>>> appears to be out of space and is currently 'offline'
> according to Jenkins.
> > >>>>>>>>>> Can someone run the cleanup job? The machine is full,
> > >>>>>>>>>>
> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
> > >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
> > >>>>>>>>>> udev             52G     0   52G   0% /dev
> > >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
> > >>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
> > >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
> > >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
> > >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
> > >>>>>>>>>>
> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort
> -rhk
> > >>>>>>>>>> 1,1 | head -n 20
> > >>>>>>>>>> 20G     2020-07-24 17:52        .
> > >>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
> > >>>>>>>>>> 517M    2020-07-22 17:31
> > >>>>>>>>>>
> ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
> > >>>>>>>>>> 517M    2020-07-22 17:31
> > >>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
> > >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
> > >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
> > >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
> > >>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
> > >>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
> > >>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
> > >>>>>>>>>> 236M    2020-07-21 20:25
> > >>>>>>>>>>  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
> > >>>>>>>>>> 236M    2020-07-21 20:21
> > >>>>>>>>>>  ./beam-pipeline-tempV85xeK/tmppbQHB3
> > >>>>>>>>>> 236M    2020-07-21 20:15
> > >>>>>>>>>>  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
> > >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
> > >>>>>>>>>> 105M    2020-07-23 00:17
> ./beam-artifact1374651823280819755
> > >>>>>>>>>> 105M    2020-07-23 00:16
> ./beam-artifact5050755582921936972
> > >>>>>>>>>> 105M    2020-07-23 00:16
> ./beam-artifact1834064452502646289
> > >>>>>>>>>> 105M    2020-07-23 00:15
> ./beam-artifact682561790267074916
> > >>>>>>>>>> 105M    2020-07-23 00:15
> ./beam-artifact4691304965824489394
> > >>>>>>>>>> 105M    2020-07-23 00:14
> ./beam-artifact4050383819822604421
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
> > >>>>>>>>>> robertwb@google.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
> > >>>>>>>>>>> tysonjh@google.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache
> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the
> workspace [1]:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some basic
> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on the
> build nodes.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it
> gets
> > >>>>>>>>>>>>    cleaned up when job workspaces expire.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>> Tests should be (able to be) written to use the standard
> > >>>>>>>>>>> temporary file mechanisms, and the environment set up on
> Jenkins such that
> > >>>>>>>>>>> that falls into the respective workspaces. Ideally this
> should be as simple
> > >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable (and
> making sure it
> > >>>>>>>>>>> exists/is writable).
> > >>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or
> > >>>>>>>>>>>>    finish.
> > >>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous
> builds.
> > >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
> > >>>>>>>>>>>>    artifacts.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> [1]:
> > >>>>>>>>>>>>
> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
> > >>>>>>>>>>>> kenn@apache.org> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Those file listings look like the result of using standard
> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
> > >>>>>>>>>>>>> tysonjh@google.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into
> two examples:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . |
> sort
> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
> > >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:48
> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:46
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:44
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:42
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:39
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:35
> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48
> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35
> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
> > >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . |
> sort
> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
> > >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
> > >>>>>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
> > >>>>>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
> > >>>>>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
> > >>>>>>>>>>>>>> 236M    2020-07-19 12:14
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
> > >>>>>>>>>>>>>> 236M    2020-07-19 12:11
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
> > >>>>>>>>>>>>>> 236M    2020-07-19 12:05
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
> > >>>>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
> > >>>>>>>>>>>>>> 988K    2020-07-12 12:00
> ./junit642086915811430564/beam
> > >>>>>>>>>>>>>> 984K    2020-07-12 12:00
> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
> > >>>>>>>>>>>>>> 980K    2020-07-12 12:00
> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <
> ehudm@google.com>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
> > >>>>>>>>>>>>>>> kenn@apache.org> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing
> something,
> > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect
> TMPDIR to point
> > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped
> by Jenkins, and I
> > >>>>>>>>>>>>>>>> would expect code to always create temp files via APIs
> that respect this.
> > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability
> to set this up? Do
> > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably find
> by setting TMPDIR to
> > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write
> permission to /tmp,
> > >>>>>>>>>>>>>>>> etc)?
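> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> A quick way to run that experiment (a sketch; throwaway
> > >>>>>>>>>>>>>>>> machine only, and revert the chmod afterwards):
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>   export TMPDIR="$HOME/jobtmp" && mkdir -p "$TMPDIR"
> > >>>>>>>>>>>>>>>>   sudo chmod a-w /tmp
> > >>>>>>>>>>>>>>>>   ./gradlew test   # anything hardcoding /tmp now fails loudly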
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Kenn
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
> > >>>>>>>>>>>>>>>> altay@google.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for
> > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful jobs.
> Alternatively, we
> > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src
> directories.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
> > >>>>>>>>>>>>>>>>> scripts to the inventory job (
> > >>>>>>>>>>>>>>>>>
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51
> ).
> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the
> source tree, adjust
> > >>>>>>>>>>>>>>>>> frequencies and clean up code with PRs. I do not know
> how internal cron
> > >>>>>>>>>>>>>>>>> scripts are created, maintained, and how would they be
> recreated for new
> > >>>>>>>>>>>>>>>>> worker instances.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
> > >>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hey,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort
> solution for some strange
> > >>>>>>>>>>>>>>>>>> cases.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker
> with
> > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once a
> week and deletes all
> > >>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed for
> at least three days.
> > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the
> running jobs on the
> > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in
> use), and also to be
> > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine.
> The cleanup will always
> > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are
> blocking the machine.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left" errors
> > >>>>>>>>>>>>>>>>>> may be a consequence of growing workspace directory
> rather than /tmp. I
> > >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g. currently, on
> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size is
> 158 GB while /tmp is
> > >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk size
> to hold workspaces for
> > >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will
> execute each job) or also
> > >>>>>>>>>>>>>>>>>> clear the workspaces in some way.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>>> Damian
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
> > >>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead
> to
> > >>>>>>>>>>>>>>>>>>> test failures
> > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there
> is
> > >>>>>>>>>>>>>>>>>>> the notion of
> > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are running?
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> -Max
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in
> Jenkins:
> > >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
> > >>>>>>>>>>>>>>>>>>> >
> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing
> some
> > >>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests
> currently, perhaps we should
> > >>>>>>>>>>>>>>>>>>> schedule this job with cron?
> > >>>>>>>>>>>>>>>>>>> >
> > >>>>>>>>>>>>>>>>>>> >
> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <
> > >>>>>>>>>>>>>>>>>>> heejong@google.com> wrote:
> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
> > >>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>>
> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
> > >>>>>>>>>>>>>>>>>>> )
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
> > >>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by
> jenkins
> > >>>>>>>>>>>>>>>>>>> older than 3 days.
> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
> > >>>>>>>>>>>>>>>>>>> jenkins-12, which has not
> > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not
> > >>>>>>>>>>>>>>>>>>> scheduling:
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
> > >>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a
> one
> > >>>>>>>>>>>>>>>>>>> time cleanup. I agree
> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this
> > >>>>>>>>>>>>>>>>>>> task or address the root
> > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
> > >>>>>>>>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
> > >>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
> > >>>>>>>>>>>>>>>>>>> >>>> wrote:
> > >>>>>>>>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers
> > >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these
> cases
> > >>>>>>>>>>>>>>>>>>> (someone with access
> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers)?
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming
> more
> > >>>>>>>>>>>>>>>>>>> and more frequent
> > >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how this can be
> > >>>>>>>>>>>>>>>>>>> remedied. Can a cleanup
> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Regards
> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> --
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software
> > >>>>>>>>>>>>>>>>>>> Engineer
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002
> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
> > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> >
>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Damian Gadomski <da...@polidea.com>.
I did some research on the temporary directories. It seems that there's no
one unified way of telling applications to use a specific path, nor any
guarantee that all of them will use the dedicated custom directory. Badly
behaving apps can always hardcode `/tmp`, e.g. Java ;)

But, we should be able to handle most of the cases by setting the TMPDIR
env variable (and also the less popular `TMP` and `TEMP`) and passing the
Java property `java.io.tmpdir` to the builds.
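
A minimal sketch of what that could look like in a job's shell steps (not
tested; forked test JVMs may additionally need the property passed through
the build scripts):

  export TMPDIR="$WORKSPACE/tmp" TMP="$WORKSPACE/tmp" TEMP="$WORKSPACE/tmp"
  mkdir -p "$TMPDIR"
  ./gradlew test -Djava.io.tmpdir="$TMPDIR"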

There's even a plugin [1] that perfectly fits our needs, but it has been
unmaintained for 3 years and is not available in the Jenkins plugin
repository, so I'm not sure we want to use it anyway. Alternatively, we can
add the envs, the property, and the creation of the directory manually to
the DSL scripts.

[1] https://github.com/acrolinx/tmpdir-jenkins-plugin

On Wed, Jul 29, 2020 at 12:58 AM Kenneth Knowles <ke...@apache.org> wrote:

> Cool. If it is /home/jenkins it should be just fine. Thanks for checking!
>
> Kenn
>
> On Tue, Jul 28, 2020 at 10:23 AM Damian Gadomski <
> damian.gadomski@polidea.com> wrote:
>
>> Sorry, mistake while copying, [1] should be:
>> [1]
>> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L63
>>
>>
>> On Tue, Jul 28, 2020 at 7:21 PM Damian Gadomski <
>> damian.gadomski@polidea.com> wrote:
>>
>>> That's interesting. I didn't check that myself but all the Jenkins jobs
>>> are configured to wipe the workspace just before the actual build happens
>>> [1]
>>> <https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6>.
>>> Git SCM plugin is used for that and it enables the option called "Wipe out
>>> repository and force clone". Docs state that it "deletes the contents of
>>> the workspace before build and before checkout" [2]
>>> <https://plugins.jenkins.io/git/>. Therefore I assume that removing
>>> workspace just after the build won't change anything.
>>>
>>> The ./.gradle/caches/modules-2/files-2.1 dir is indeed present on the
>>> worker machines but it's rather in /home/jenkins dir.
>>>
>>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
>>> 11G .
>>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
>>> caches/modules-2/files-2.1
>>> 2.3G caches/modules-2/files-2.1
>>>
>>> I can't find that directory structure inside workspaces.
>>>
>>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>>> sudo find -name "files-2.1"
>>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6
>>> [2] https://plugins.jenkins.io/git/
>>>
>>> On Tue, Jul 28, 2020 at 5:47 PM Kenneth Knowles <ke...@apache.org> wrote:
>>>
>>>> Just checking - will this wipe out dependency cache? That will slow
>>>> things down and significantly increase flakiness. If I recall correctly,
>>>> the default Jenkins layout was:
>>>>
>>>>     /home/jenkins/jenkins-slave/workspace/$jobname
>>>>     /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>>>>     /home/jenkins/jenkins-slave/workspace/$jobname/.git
>>>>
>>>> Where you can see that it did a `git clone` right into the root
>>>> workspace directory, adjacent to .m2. This was not hygienic. One important
>>>> thing was that `git clean` would wipe the maven cache with every build. So
>>>> in https://github.com/apache/beam/pull/3976 we changed it to
>>>>
>>>>     /home/jenkins/jenkins-slave/workspace/$jobname
>>>>     /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>>>>     /home/jenkins/jenkins-slave/workspace/$jobname/src/.git
>>>>
>>>> Now the .m2 directory survives and we do not constantly see flakes
>>>> re-downloading deps that are immutable. This does, of course, use disk
>>>> space.
>>>>
>>>> That was in the maven days. Gradle is the same except for $HOME/.m2 is
>>>> replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
>>>> the same way so we will be wiping out the dependencies? If so, can you
>>>> address this issue? Everything in that directory should be immutable and
>>>> just a cache to avoid pointless re-download.
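>>>>
>>>> A cleanup that spares the caches could look roughly like this sketch,
>>>> wiping everything in the workspace root except the cache directories:
>>>>
>>>>   find "$WORKSPACE" -mindepth 1 -maxdepth 1 \
>>>>     ! -name '.m2' ! -name '.gradle' -exec rm -rf {} +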
>>>>
>>>> Kenn
>>>>
>>>> On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <
>>>> damian.gadomski@polidea.com> wrote:
>>>>
>>>>> Agree with Udi, workspaces seem to be the third culprit, not yet
>>>>> addressed in any way (until PR#12326
>>>>> <https://github.com/apache/beam/pull/12326> is merged). I feel that
>>>>> it'll solve the issue of filling up the disks for a long time ;)
>>>>>
>>>>> I'm also OK with moving /tmp cleanup to option B, and will happily
>>>>> investigate on proper TMPDIR config.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> What about the workspaces, which can take up 175GB in some cases (see
>>>>>> above)?
>>>>>> I'm working on getting them cleaned up automatically:
>>>>>> https://github.com/apache/beam/pull/12326
>>>>>>
>>>>>> My opinion is that we would get more mileage out of fixing the jobs
>>>>>> that leave behind files in /tmp and images/containers in Docker.
>>>>>> This would also help keep development machines clean.
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <ty...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Here is a summary of how I understand things:
>>>>>>>
>>>>>>>   - /tmp and /var/lib/docker are the culprits for filling up disks
>>>>>>>   - inventory Jenkins job runs every 12 hours and runs a docker
>>>>>>> prune to clean up images older than 24hr
>>>>>>>   - crontab on each machine cleans up /tmp files older than three
>>>>>>> days weekly
>>>>>>>
>>>>>>> This doesn't seem to be working since we're still running out of
>>>>>>> disk periodically and requiring manual intervention. Knobs and options we
>>>>>>> have available:
>>>>>>>
>>>>>>>   1. increase frequency of deleting files
>>>>>>>   2. decrease the number of days required to delete a file (e.g.
>>>>>>> older than 2 days)
>>>>>>>
>>>>>>> The execution methods we have available are:
>>>>>>>
>>>>>>>   A. cron
>>>>>>>     - pro: runs even if a job gets stuck in Jenkins due to full disk
>>>>>>>     - con: config baked into VM which is tough to update, not
>>>>>>> discoverable or documented well
>>>>>>>   B. inventory job
>>>>>>>     - pro: easy to update, runs every 12h already
>>>>>>>     - con: could get stuck if Jenkins agent runs out of disk or is
>>>>>>> otherwise stuck, tied to all other inventory job frequency
>>>>>>>   C. configure startup scripts for the VMs that set up the cron job
>>>>>>> anytime the VM is restarted
>>>>>>>     - pro: similar to A. and easy to update
>>>>>>>     - con: similar to A.
>>>>>>>
>>>>>>> Between the three I prefer B. because it is consistent with other
>>>>>>> inventory jobs. If it turns out that stuck jobs often prevent the
>>>>>>> inventory job from being scheduled, we could further investigate C to
>>>>>>> avoid having to rebuild the VM images repeatedly.
>>>>>>>
>>>>>>> Any objections or comments? If not, we'll go forward with B. and
>>>>>>> reduce the date check from 3 days to 2 days.
>>>>>>>
>>>>>>>
>>>>>>> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
>>>>>>> > Tests may not be doing docker cleanup. Inventory job runs a docker
>>>>>>> prune
>>>>>>> > every 12 hours for images older than 24 hrs [1]. Randomly looking
>>>>>>> at one of
>>>>>>> > the recent runs [2], it cleaned up a long list of containers
>>>>>>> consuming
>>>>>>> > 30+GB space. That should be just 12 hours worth of containers.
>>>>>>> >
>>>>>>> > [1]
>>>>>>> >
>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
>>>>>>> > [2]
>>>>>>> >
>>>>>>> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
>>>>>>> >
>>>>>>> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <ty...@google.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > > Yes, these are on the same volume in the /var/lib/docker
>>>>>>> directory. I'm
>>>>>>> > > unsure if they clean up leftover images.
>>>>>>> > >
>>>>>>> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com>
>>>>>>> wrote:
>>>>>>> > >
>>>>>>> > >> I forgot Docker images:
>>>>>>> > >>
>>>>>>> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>>>>>>> > >> TYPE                TOTAL               ACTIVE              SIZE
>>>>>>> > >>        RECLAIMABLE
>>>>>>> > >> Images              88                  9
>>>>>>>  125.4GB
>>>>>>> > >>       124.2GB (99%)
>>>>>>> > >> Containers          40                  4
>>>>>>>  7.927GB
>>>>>>> > >>       7.871GB (99%)
>>>>>>> > >> Local Volumes       47                  0
>>>>>>>  3.165GB
>>>>>>> > >>       3.165GB (100%)
>>>>>>> > >> Build Cache         0                   0                   0B
>>>>>>> > >>        0B
>>>>>>> > >>
>>>>>>> > >> There are about 90 images on that machine, with all but 1 less
>>>>>>> than 48
>>>>>>> > >> hours old.
>>>>>>> > >> I think the docker test jobs need to try harder at cleaning up
>>>>>>> their
>>>>>>> > >> leftover images. (assuming they're already doing it?)
>>>>>>> > >>
>>>>>>> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com>
>>>>>>> wrote:
>>>>>>> > >>
>>>>>>> > >>> The additional slots (@3 directories) take up even more space
>>>>>>> now than
>>>>>>> > >>> before.
>>>>>>> > >>>
>>>>>>> > >>> I'm testing out https://github.com/apache/beam/pull/12326
>>>>>>> which could
>>>>>>> > >>> help by cleaning up workspaces after a run (just started a
>>>>>>> seed job).
>>>>>>> > >>>
>>>>>>> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <
>>>>>>> tysonjh@google.com>
>>>>>>> > >>> wrote:
>>>>>>> > >>>
>>>>>>> > >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>>>>>>> > >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>>>>>>> > >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>>>>>>> > >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>>>>>>> > >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>>>>>>> > >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>>>>>>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
>>>>>>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>>>>>>> > >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>>>>>>> > >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
>>>>>>> > >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>>>>>>> > >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>>>>>>> > >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>>>>>>> > >>>> 6.2G    beam_PreCommit_Python_Commit
>>>>>>> > >>>> 7.5G    beam_PreCommit_Python_Commit@2
>>>>>>> > >>>> 7.5G    beam_PreCommit_Python_Cron
>>>>>>> > >>>> 1012M   beam_PreCommit_PythonDocker_Commit
>>>>>>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>>>>>>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>>>>>>> > >>>> 1002M   beam_PreCommit_PythonDocker_Cron
>>>>>>> > >>>> 877M    beam_PreCommit_PythonFormatter_Commit
>>>>>>> > >>>> 988M    beam_PreCommit_PythonFormatter_Cron
>>>>>>> > >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
>>>>>>> > >>>> 1.7G    beam_PreCommit_PythonLint_Commit
>>>>>>> > >>>> 2.1G    beam_PreCommit_PythonLint_Cron
>>>>>>> > >>>> 7.5G    beam_PreCommit_Python_Phrase
>>>>>>> > >>>> 346M    beam_PreCommit_RAT_Commit
>>>>>>> > >>>> 341M    beam_PreCommit_RAT_Cron
>>>>>>> > >>>> 338M    beam_PreCommit_Spotless_Commit
>>>>>>> > >>>> 339M    beam_PreCommit_Spotless_Cron
>>>>>>> > >>>> 5.5G    beam_PreCommit_SQL_Commit
>>>>>>> > >>>> 5.5G    beam_PreCommit_SQL_Cron
>>>>>>> > >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>>>>>>> > >>>> 750M    beam_PreCommit_Website_Commit
>>>>>>> > >>>> 750M    beam_PreCommit_Website_Commit@2
>>>>>>> > >>>> 750M    beam_PreCommit_Website_Cron
>>>>>>> > >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>>>>>>> > >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>>>>>>> > >>>> 336M    beam_Prober_CommunityMetrics
>>>>>>> > >>>> 693M    beam_python_mongoio_load_test
>>>>>>> > >>>> 339M    beam_SeedJob
>>>>>>> > >>>> 333M    beam_SeedJob_Standalone
>>>>>>> > >>>> 334M    beam_sonarqube_report
>>>>>>> > >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>>>>>>> > >>>> 175G    total
>>>>>>> > >>>>
>>>>>>> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <
>>>>>>> tysonjh@google.com>
>>>>>>> > >>>> wrote:
>>>>>>> > >>>>
>>>>>>> > >>>>> Ya looks like something in the workspaces is taking up room:
>>>>>>> > >>>>>
>>>>>>> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>>>>>>> > >>>>> 191G    .
>>>>>>> > >>>>> 191G    total
>>>>>>> > >>>>>
>>>>>>> > >>>>>
>>>>>>> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <
>>>>>>> tysonjh@google.com>
>>>>>>> > >>>>> wrote:
>>>>>>> > >>>>>
>>>>>>> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>> > >>>>>> /dev/sda1       485G  482G  2.9G 100% /
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> however after cleaning up tmp with the crontab command,
>>>>>>> there is only
>>>>>>> > >>>>>> 8G usage yet it still remains 100% full:
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>>>>>> > >>>>>> 8.0G    /tmp
>>>>>>> > >>>>>> 8.0G    total
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> The workspaces are in the
>>>>>>> /home/jenkins/jenkins-slave/workspace
>>>>>>> > >>>>>> directory. When I run a du on that, it takes really long.
>>>>>>> I'll let it keep
>>>>>>> > >>>>>> running for a while to see if it ever returns a result but
>>>>>>> so far this
>>>>>>> > >>>>>> seems suspect.
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <
>>>>>>> tysonjh@google.com>
>>>>>>> > >>>>>> wrote:
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where
>>>>>>> are the
>>>>>>> > >>>>>>> workspaces, or what are the named?
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <
>>>>>>> ehudm@google.com> wrote:
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>>> I'm curious to what you find. Was it /tmp or the
>>>>>>> workspaces using
>>>>>>> > >>>>>>>> up the space?
>>>>>>> > >>>>>>>>
>>>>>>> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <
>>>>>>> tysonjh@google.com>
>>>>>>> > >>>>>>>> wrote:
>>>>>>> > >>>>>>>>
>>>>>>> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that
>>>>>>> won't work.
>>>>>>> > >>>>>>>>> I'll clean up manually on the machine using the cron
>>>>>>> command.
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
>>>>>>> > >>>>>>>>> tysonjh@google.com> wrote:
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>>> Something isn't working with the current set up because
>>>>>>> node 15
>>>>>>> > >>>>>>>>>> appears to be out of space and is currently 'offline'
>>>>>>> according to Jenkins.
>>>>>>> > >>>>>>>>>> Can someone run the cleanup job? The machine is full,
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>>>>> > >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>> > >>>>>>>>>> udev             52G     0   52G   0% /dev
>>>>>>> > >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
>>>>>>> > >>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>>>>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>>>>> > >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>>>>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>>>>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>>>>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . |
>>>>>>> sort -rhk
>>>>>>> > >>>>>>>>>> 1,1 | head -n 20
>>>>>>> > >>>>>>>>>> 20G     2020-07-24 17:52        .
>>>>>>> > >>>>>>>>>> 580M    2020-07-22 17:31
>>>>>>> ./junit1031982597110125586
>>>>>>> > >>>>>>>>>> 517M    2020-07-22 17:31
>>>>>>> > >>>>>>>>>>
>>>>>>> ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>>>>> > >>>>>>>>>> 517M    2020-07-22 17:31
>>>>>>> > >>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
>>>>>>> > >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>>>>> > >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>>>>> > >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>>>>> > >>>>>>>>>> 242M    2020-07-21 20:25
>>>>>>> ./beam-pipeline-tempmByU6T
>>>>>>> > >>>>>>>>>> 242M    2020-07-21 20:21
>>>>>>> ./beam-pipeline-tempV85xeK
>>>>>>> > >>>>>>>>>> 242M    2020-07-21 20:15
>>>>>>> ./beam-pipeline-temp7dJROJ
>>>>>>> > >>>>>>>>>> 236M    2020-07-21 20:25
>>>>>>> > >>>>>>>>>>  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>>>>> > >>>>>>>>>> 236M    2020-07-21 20:21
>>>>>>> > >>>>>>>>>>  ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>>>>> > >>>>>>>>>> 236M    2020-07-21 20:15
>>>>>>> > >>>>>>>>>>  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>>>>> > >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:17
>>>>>>> ./beam-artifact1374651823280819755
>>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:16
>>>>>>> ./beam-artifact5050755582921936972
>>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:16
>>>>>>> ./beam-artifact1834064452502646289
>>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:15
>>>>>>> ./beam-artifact682561790267074916
>>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:15
>>>>>>> ./beam-artifact4691304965824489394
>>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:14
>>>>>>> ./beam-artifact4050383819822604421
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>>>>>>> > >>>>>>>>>> robertwb@google.com> wrote:
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>>>>>>> > >>>>>>>>>>> tysonjh@google.com> wrote:
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the
>>>>>>> Apache
>>>>>>> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside
>>>>>>> the workspace [1]:
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some
>>>>>>> basic
>>>>>>> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on
>>>>>>> the build nodes.
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That
>>>>>>> way it gets
>>>>>>> > >>>>>>>>>>>>    cleaned up when job workspaces expire.
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>> Tests should be (able to be) written to use the
>>>>>>> standard
>>>>>>> > >>>>>>>>>>> temporary file mechanisms, and the environment set up
>>>>>>> on Jenkins such that
>>>>>>> > >>>>>>>>>>> that falls into the respective workspaces. Ideally
>>>>>>> this should be as simple
>>>>>>> > >>>>>>>>>>> as setting the TMPDIR (or similar) environment
>>>>>>> variable (and making sure it
>>>>>>> > >>>>>>>>>>> exists/is writable).
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start
>>>>>>> or
>>>>>>> > >>>>>>>>>>>>    finish.
>>>>>>> > >>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10
>>>>>>> previous builds.
>>>>>>> > >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10
>>>>>>> previous
>>>>>>> > >>>>>>>>>>>>    artifacts.
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> [1]:
>>>>>>> > >>>>>>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
>>>>>>> > >>>>>>>>>>>> kenn@apache.org> wrote:
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>> Those file listings look like the result of using
>>>>>>> standard
>>>>>>> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
>>>>>>> > >>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>>>>>>> > >>>>>>>>>>>>> tysonjh@google.com> wrote:
>>>>>>> > >>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use
>>>>>>> unique
>>>>>>> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look
>>>>>>> into two examples:
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time
>>>>>>> . | sort
>>>>>>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>>> > >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:48
>>>>>>> ./beam-pipeline-temp3ybuY4
>>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:46
>>>>>>> ./beam-pipeline-tempuxjiPT
>>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:44
>>>>>>> ./beam-pipeline-tempVpg1ME
>>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:42
>>>>>>> ./beam-pipeline-tempJ4EpyB
>>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:39
>>>>>>> ./beam-pipeline-tempepea7Q
>>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:35
>>>>>>> ./beam-pipeline-temp79qot2
>>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:48
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:46
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:44
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:42
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:39
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:35
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>>> > >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10
>>>>>>> ./pip-install-q9l227ef
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time
>>>>>>> . | sort
>>>>>>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>>> > >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:14
>>>>>>> ./beam-pipeline-tempUTXqlM
>>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:11
>>>>>>> ./beam-pipeline-tempx3Yno3
>>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:05
>>>>>>> ./beam-pipeline-tempyCrMYq
>>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:14
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:11
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:05
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05
>>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>>> > >>>>>>>>>>>>>> 992K    2020-07-12 12:00
>>>>>>> ./junit642086915811430564
>>>>>>> > >>>>>>>>>>>>>> 988K    2020-07-12 12:00
>>>>>>> ./junit642086915811430564/beam
>>>>>>> > >>>>>>>>>>>>>> 984K    2020-07-12 12:00
>>>>>>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
>>>>>>> > >>>>>>>>>>>>>> 980K    2020-07-12 12:00
>>>>>>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <
>>>>>>> ehudm@google.com>
>>>>>>> > >>>>>>>>>>>>>> wrote:
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>> > >>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>>>>>>> > >>>>>>>>>>>>>>> kenn@apache.org> wrote:
>>>>>>> > >>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing
>>>>>>> something,
>>>>>>> > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would
>>>>>>> expect TMPDIR to point
>>>>>>> > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be
>>>>>>> wiped by Jenkins, and I
>>>>>>> > >>>>>>>>>>>>>>>> would expect code to always create temp files via
>>>>>>> APIs that respect this.
>>>>>>> > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the
>>>>>>> ability to set this up? Do
>>>>>>> > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably
>>>>>>> find by setting TMPDIR to
>>>>>>> > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without
>>>>>>> write permission to /tmp,
>>>>>>> > >>>>>>>>>>>>>>>> etc)
>>>>>>> > >>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>> Kenn
>>>>>>> > >>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>>>>>>> > >>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>> > >>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>>> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue
>>>>>>> previously (
>>>>>>> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865)
>>>>>>> for
>>>>>>> > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful
>>>>>>> jobs. Alternatively, we
>>>>>>> > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src
>>>>>>> directories.
>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from
>>>>>>> internal cron
>>>>>>> > >>>>>>>>>>>>>>>>> scripts to the inventory job (
>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51
>>>>>>> ).
>>>>>>> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part
>>>>>>> of the source tree, adjust
>>>>>>> > >>>>>>>>>>>>>>>>> frequencies and clean up code with PRs. I do
>>>>>>> not know how internal cron
>>>>>>> > >>>>>>>>>>>>>>>>> scripts are created, maintained, and how would
>>>>>>> they be recreated for new
>>>>>>> > >>>>>>>>>>>>>>>>> worker instances.
>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>> > >>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> Hey,
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the
>>>>>>> growing /tmp
>>>>>>> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by
>>>>>>> Tyson:
>>>>>>> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally
>>>>>>> not
>>>>>>> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort
>>>>>>> solution for some strange
>>>>>>> > >>>>>>>>>>>>>>>>>> cases.
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every
>>>>>>> worker with
>>>>>>> > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed
>>>>>>> once a week and deletes all
>>>>>>> > >>>>>>>>>>>>>>>>>> the files (and only files) that were not
>>>>>>> accessed for at least three days.
>>>>>>> > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for
>>>>>>> the running jobs on the
>>>>>>> > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still
>>>>>>> in use), and also to be
>>>>>>> > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the
>>>>>>> machine. The cleanup will always
>>>>>>> > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs
>>>>>>> are blocking the machine.
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left"
>>>>>>> errors
>>>>>>> > >>>>>>>>>>>>>>>>>> may be a consequence of growing workspace
>>>>>>> directory rather than /tmp. I
>>>>>>> > >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g.
>>>>>>> currently, on
>>>>>>> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory
>>>>>>> size is 158 GB while /tmp is
>>>>>>> > >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk
>>>>>>> size to hold workspaces for
>>>>>>> > >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will
>>>>>>> execute each job) or clear
>>>>>>> > >>>>>>>>>>>>>>>>>> also the workspaces in some way.
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> Regards,
>>>>>>> > >>>>>>>>>>>>>>>>>> Damian
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian
>>>>>>> Michels <
>>>>>>> > >>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it
>>>>>>> won't lead to
>>>>>>> > >>>>>>>>>>>>>>>>>>> test failures
>>>>>>> > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe
>>>>>>> there is
>>>>>>> > >>>>>>>>>>>>>>>>>>> the notion of
>>>>>>> > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are
>>>>>>> running?
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> -Max
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in
>>>>>>> Jenkins:
>>>>>>> > >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm
>>>>>>> seeing some
>>>>>>> > >>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests
>>>>>>> currently, perhaps we should
>>>>>>> > >>>>>>>>>>>>>>>>>>> schedule this job with cron?
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <
>>>>>>> > >>>>>>>>>>>>>>>>>>> heejong@google.com> wrote:
>>>>>>> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors
>>>>>>> on
>>>>>>> > >>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>> > >>>>>>>>>>>>>>>>>>> )
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold
>>>>>>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned
>>>>>>> by jenkins
>>>>>>> > >>>>>>>>>>>>>>>>>>> older than 3 days.
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors
>>>>>>> except
>>>>>>> > >>>>>>>>>>>>>>>>>>> jenkins-12, which has not
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13
>>>>>>> days. Not
>>>>>>> > >>>>>>>>>>>>>>>>>>> scheduling:
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-12/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-1/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-2/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-3/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-4/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-5/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-6/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-7/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-8/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-9/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-10/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-11/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-13/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-14/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-15/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-16/builds&sa=D
>>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet
>>>>>>> Altay <
>>>>>>> > >>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is
>>>>>>> doing a one
>>>>>>> > >>>>>>>>>>>>>>>>>>> time cleanup. I agree
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to
>>>>>>> automate this
>>>>>>> > >>>>>>>>>>>>>>>>>>> task or address the root
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał
>>>>>>> Walenia <
>>>>>>> > >>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>> wrote:
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins
>>>>>>> workers
>>>>>>> > >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on
>>>>>>> device".
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in
>>>>>>> these cases
>>>>>>> > >>>>>>>>>>>>>>>>>>> (someone with access
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers).
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are
>>>>>>> becoming more
>>>>>>> > >>>>>>>>>>>>>>>>>>> and more frequent
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how can
>>>>>>> this be
>>>>>>> > >>>>>>>>>>>>>>>>>>> remedied. Can a cleanup
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Regards
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> --
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> |
>>>>>>> Software
>>>>>>> > >>>>>>>>>>>>>>>>>>> Engineer
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>>>>>>> > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> >
>>>>>>>
>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Kenneth Knowles <ke...@apache.org>.
Cool. If it is /home/jenkins it should be just fine. Thanks for checking!

Kenn

On Tue, Jul 28, 2020 at 10:23 AM Damian Gadomski <
damian.gadomski@polidea.com> wrote:

> Sorry, mistake while copying, [1] should be:
> [1]
> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L63
>
>
> On Tue, Jul 28, 2020 at 7:21 PM Damian Gadomski <
> damian.gadomski@polidea.com> wrote:
>
>> That's interesting. I didn't check that myself but all the Jenkins jobs
>> are configured to wipe the workspace just before the actual build happens
>> [1]
>> <https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6>.
>> Git SCM plugin is used for that and it enables the option called "Wipe out
>> repository and force clone". Docs state that it "deletes the contents of
>> the workspace before build and before checkout" [2]
>> <https://plugins.jenkins.io/git/>. Therefore I assume that removing
>> the workspace just after the build won't change anything.
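>>
>> For reference, the rough job-DSL shape of that option is something like
>> this (a sketch only; the extension names are my assumption, [1] has the
>> actual config):
>>
>>     // Sketch: wipe the workspace, then force-clone into src/.
>>     job('beam_Example') {
>>       scm {
>>         git {
>>           remote { github('apache/beam') }
>>           branch('master')
>>           extensions {
>>             relativeTargetDirectory('src')  // keep caches out of the repo dir
>>             wipeOutWorkspace()  // "Wipe out repository and force clone"
>>           }
>>         }
>>       }
>>     }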
>>
>> The ./.gradle/caches/modules-2/files-2.1 dir is indeed present on the
>> worker machines, but under /home/jenkins rather than in the workspaces.
>>
>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
>> 11G .
>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
>> caches/modules-2/files-2.1
>> 2.3G caches/modules-2/files-2.1
>>
>> I can't find that directory structure inside workspaces.
>>
>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>> sudo find -name "files-2.1"
>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>>
>> [1]
>> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6
>> [2] https://plugins.jenkins.io/git/
>>
>> On Tue, Jul 28, 2020 at 5:47 PM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> Just checking - will this wipe out dependency cache? That will slow
>>> things down and significantly increase flakiness. If I recall correctly,
>>> the default Jenkins layout was:
>>>
>>>     /home/jenkins/jenkins-slave/workspace/$jobname
>>>     /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>>>     /home/jenkins/jenkins-slave/workspace/$jobname/.git
>>>
>>> Where you can see that it did a `git clone` right into the root
>>> workspace directory, adjacent to .m2. This was not hygienic. One important
>>> thing was that `git clean` would wipe the maven cache with every build. So
>>> in https://github.com/apache/beam/pull/3976 we changed it to
>>>
>>>     /home/jenkins/jenkins-slave/workspace/$jobname
>>>     /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>>>     /home/jenkins/jenkins-slave/workspace/$jobname/src/.git
>>>
>>> Now the .m2 directory survives and we do not constantly see flakes
>>> re-downloading deps that are immutable. This does, of course, use disk
>>> space.
>>>
>>> That was in the maven days. Gradle is the same except that $HOME/.m2 is
>>> replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
>>> the same way so we will be wiping out the dependencies? If so, can you
>>> address this issue? Everything in that directory should be immutable and
>>> just a cache to avoid pointless re-download.
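>>>
>>> To make that concrete: a safe sweep (paths assumed, just a sketch) would
>>> only remove stale workspace roots and never descend into the cache, e.g.
>>>
>>>     // Sketch: remove per-job workspace dirs untouched for 3+ days;
>>>     // $HOME/.gradle sits outside this tree and is left alone.
>>>     steps {
>>>       shell('find "$HOME/jenkins-slave/workspace" -mindepth 1 -maxdepth 1 ' +
>>>             '-mtime +2 -exec rm -rf {} +')
>>>     }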
>>>
>>> Kenn
>>>
>>> On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <
>>> damian.gadomski@polidea.com> wrote:
>>>
>>>> Agree with Udi, workspaces seem to be the third culprit, not yet
>>>> addressed in any way (until PR#12326
>>>> <https://github.com/apache/beam/pull/12326> is merged). I feel that
>>>> it'll solve the issue of filling up the disks for a long time ;)
>>>>
>>>> I'm also OK with moving /tmp cleanup to option B, and will happily
>>>> investigate a proper TMPDIR config.
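>>>>
>>>> A first sketch for the TMPDIR part (untested, the variable handling is an
>>>> assumption) would be to point it into the per-build workspace:
>>>>
>>>>     // Sketch: temp files land in the workspace and die with it.
>>>>     environmentVariables {
>>>>       env('TMPDIR', '${WORKSPACE}/tmp')
>>>>     }
>>>>     steps {
>>>>       shell('mkdir -p "$WORKSPACE/tmp"')  // ensure the dir exists first
>>>>     }
>>>>
>>>> With that, anything using the standard temp-file APIs stops writing to /tmp.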
>>>>
>>>>
>>>>
>>>> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> What about the workspaces, which can take up 175GB in some cases (see
>>>>> above)?
>>>>> I'm working on getting them cleaned up automatically:
>>>>> https://github.com/apache/beam/pull/12326
>>>>>
>>>>> My opinion is that we would get more mileage out of fixing the jobs
>>>>> that leave behind files in /tmp and images/containers in Docker.
>>>>> This would also help keep development machines clean.
>>>>>
>>>>>
>>>>> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <ty...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Here is a summary of how I understand things,
>>>>>>
>>>>>>   - /tmp and /var/lib/docker are the culprits for filling up disks
>>>>>>   - inventory Jenkins job runs every 12 hours and runs a docker prune
>>>>>> to clean up images older than 24hr
>>>>>>   - crontab on each machine cleans up /tmp files older than three
>>>>>> days weekly
>>>>>>
>>>>>> This doesn't seem to be working since we're still running out of disk
>>>>>> periodically and requiring manual intervention. Knobs and options we have
>>>>>> available:
>>>>>>
>>>>>>   1. increase frequency of deleting files
>>>>>>   2. decrease the number of days required to delete a file (e.g.
>>>>>> older than 2 days)
>>>>>>
>>>>>> The execution methods we have available are:
>>>>>>
>>>>>>   A. cron
>>>>>>     - pro: runs even if a job gets stuck in Jenkins due to full disk
>>>>>>     - con: config baked into VM which is tough to update, not
>>>>>> discoverable or documented well
>>>>>>   B. inventory job
>>>>>>     - pro: easy to update, runs every 12h already
>>>>>>     - con: could get stuck if Jenkins agent runs out of disk or is
>>>>>> otherwise stuck, tied to all other inventory job frequency
>>>>>>   C. configure startup scripts for the VMs that set up the cron job
>>>>>> anytime the VM is restarted
>>>>>>     - pro: similar to A. and easy to update
>>>>>>     - con: similar to A.
>>>>>>
>>>>>> Between the three I prefer B. because it is consistent with other
>>>>>> inventory jobs. If it turns out that stuck jobs often prevent the
>>>>>> inventory job from being scheduled, we could further investigate C to avoid having to
>>>>>> rebuild the VM images repeatedly.
>>>>>>
>>>>>> Any objections or comments? If not, we'll go forward with B. and
>>>>>> reduce the date check from 3 days to 2 days.
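>>>>>>
>>>>>> Concretely, B. would mean adding something along these lines to the
>>>>>> inventory job definition (a sketch; the exact find flags are an
>>>>>> assumption, not a final patch):
>>>>>>
>>>>>>     // Sketch: sweep jenkins-owned /tmp files not accessed for 2+ days,
>>>>>>     // as part of the existing 12h inventory run.
>>>>>>     steps {
>>>>>>       shell('find /tmp -user "$(whoami)" -type f -atime +1 -delete || true')
>>>>>>     }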
>>>>>>
>>>>>>
>>>>>> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
>>>>>> > Tests may not be doing docker cleanup. Inventory job runs a docker
>>>>>> prune
>>>>>> > every 12 hours for images older than 24 hrs [1]. Randomly looking
>>>>>> at one of
>>>>>> > the recent runs [2], it cleaned up a long list of containers
>>>>>> consuming
>>>>>> > 30+GB space. That should be just 12 hours worth of containers.
>>>>>> >
>>>>>> > [1]
>>>>>> >
>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
>>>>>> > [2]
>>>>>> >
>>>>>> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
>>>>>> >
>>>>>> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <ty...@google.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > > Yes, these are on the same volume in the /var/lib/docker
>>>>>> directory. I'm
>>>>>> > > unsure if they clean up leftover images.
>>>>>> > >
>>>>>> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com>
>>>>>> wrote:
>>>>>> > >
>>>>>> > >> I forgot Docker images:
>>>>>> > >>
>>>>>> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>>>>>> > >> TYPE                TOTAL     ACTIVE    SIZE        RECLAIMABLE
>>>>>> > >> Images              88        9         125.4GB     124.2GB (99%)
>>>>>> > >> Containers          40        4         7.927GB     7.871GB (99%)
>>>>>> > >> Local Volumes       47        0         3.165GB     3.165GB (100%)
>>>>>> > >> Build Cache         0         0         0B          0B
>>>>>> > >>
>>>>>> > >> There are about 90 images on that machine, with all but 1 less
>>>>>> than 48
>>>>>> > >> hours old.
>>>>>> > >> I think the docker test jobs need to try harder at cleaning up
>>>>>> their
>>>>>> > >> leftover images. (assuming they're already doing it?)
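>>>>>> > >>
>>>>>> > >> One shape that could take (a sketch; the label is an assumption, so
>>>>>> > >> jobs would first have to tag the images they build) is a post-build
>>>>>> > >> prune scoped to each job's own images:
>>>>>> > >>
>>>>>> > >>     // Sketch: each docker-based job prunes only what it created.
>>>>>> > >>     steps {
>>>>>> > >>       shell('docker image prune -af ' +
>>>>>> > >>             '--filter "label=beam-job=$JOB_NAME"')
>>>>>> > >>     }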
>>>>>> > >>
>>>>>> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com>
>>>>>> wrote:
>>>>>> > >>
>>>>>> > >>> The additional slots (@3 directories) take up even more space
>>>>>> now than
>>>>>> > >>> before.
>>>>>> > >>>
>>>>>> > >>> I'm testing out https://github.com/apache/beam/pull/12326
>>>>>> which could
>>>>>> > >>> help by cleaning up workspaces after a run (just started a seed
>>>>>> job).
>>>>>> > >>>
>>>>>> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <
>>>>>> tysonjh@google.com>
>>>>>> > >>> wrote:
>>>>>> > >>>
>>>>>> > >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>>>>>> > >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>>>>>> > >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>>>>>> > >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>>>>>> > >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>>>>>> > >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>>>>>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
>>>>>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>>>>>> > >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>>>>>> > >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
>>>>>> > >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>>>>>> > >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>>>>>> > >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>>>>>> > >>>> 6.2G    beam_PreCommit_Python_Commit
>>>>>> > >>>> 7.5G    beam_PreCommit_Python_Commit@2
>>>>>> > >>>> 7.5G    beam_PreCommit_Python_Cron
>>>>>> > >>>> 1012M   beam_PreCommit_PythonDocker_Commit
>>>>>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>>>>>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>>>>>> > >>>> 1002M   beam_PreCommit_PythonDocker_Cron
>>>>>> > >>>> 877M    beam_PreCommit_PythonFormatter_Commit
>>>>>> > >>>> 988M    beam_PreCommit_PythonFormatter_Cron
>>>>>> > >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
>>>>>> > >>>> 1.7G    beam_PreCommit_PythonLint_Commit
>>>>>> > >>>> 2.1G    beam_PreCommit_PythonLint_Cron
>>>>>> > >>>> 7.5G    beam_PreCommit_Python_Phrase
>>>>>> > >>>> 346M    beam_PreCommit_RAT_Commit
>>>>>> > >>>> 341M    beam_PreCommit_RAT_Cron
>>>>>> > >>>> 338M    beam_PreCommit_Spotless_Commit
>>>>>> > >>>> 339M    beam_PreCommit_Spotless_Cron
>>>>>> > >>>> 5.5G    beam_PreCommit_SQL_Commit
>>>>>> > >>>> 5.5G    beam_PreCommit_SQL_Cron
>>>>>> > >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>>>>>> > >>>> 750M    beam_PreCommit_Website_Commit
>>>>>> > >>>> 750M    beam_PreCommit_Website_Commit@2
>>>>>> > >>>> 750M    beam_PreCommit_Website_Cron
>>>>>> > >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>>>>>> > >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>>>>>> > >>>> 336M    beam_Prober_CommunityMetrics
>>>>>> > >>>> 693M    beam_python_mongoio_load_test
>>>>>> > >>>> 339M    beam_SeedJob
>>>>>> > >>>> 333M    beam_SeedJob_Standalone
>>>>>> > >>>> 334M    beam_sonarqube_report
>>>>>> > >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>>>>>> > >>>> 175G    total
>>>>>> > >>>>
>>>>>> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <
>>>>>> tysonjh@google.com>
>>>>>> > >>>> wrote:
>>>>>> > >>>>
>>>>>> > >>>>> Ya looks like something in the workspaces is taking up room:
>>>>>> > >>>>>
>>>>>> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>>>>>> > >>>>> 191G    .
>>>>>> > >>>>> 191G    total
>>>>>> > >>>>>
>>>>>> > >>>>>
>>>>>> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <
>>>>>> tysonjh@google.com>
>>>>>> > >>>>> wrote:
>>>>>> > >>>>>
>>>>>> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
>>>>>> > >>>>>>
>>>>>> > >>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>> > >>>>>> /dev/sda1       485G  482G  2.9G 100% /
>>>>>> > >>>>>>
>>>>>> > >>>>>> however after cleaning up tmp with the crontab command,
>>>>>> there is only
>>>>>> > >>>>>> 8G usage yet it still remains 100% full:
>>>>>> > >>>>>>
>>>>>> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>>>>> > >>>>>> 8.0G    /tmp
>>>>>> > >>>>>> 8.0G    total
>>>>>> > >>>>>>
>>>>>> > >>>>>> The workspaces are in the
>>>>>> /home/jenkins/jenkins-slave/workspace
>>>>>> > >>>>>> directory. When I run a du on that, it takes really long.
>>>>>> I'll let it keep
>>>>>> > >>>>>> running for a while to see if it ever returns a result but
>>>>>> so far this
>>>>>> > >>>>>> seems suspect.
>>>>>> > >>>>>>
>>>>>> > >>>>>>
>>>>>> > >>>>>>
>>>>>> > >>>>>>
>>>>>> > >>>>>>
>>>>>> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <
>>>>>> tysonjh@google.com>
>>>>>> > >>>>>> wrote:
>>>>>> > >>>>>>
>>>>>> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where
>>>>>> are the
>>>>>> > >>>>>>> workspaces, or what are the named?
>>>>>> > >>>>>>>
>>>>>> > >>>>>>>
>>>>>> > >>>>>>>
>>>>>> > >>>>>>>
>>>>>> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <
>>>>>> ehudm@google.com> wrote:
>>>>>> > >>>>>>>
>>>>>> > >>>>>>>> I'm curious to what you find. Was it /tmp or the
>>>>>> workspaces using
>>>>>> > >>>>>>>> up the space?
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <
>>>>>> tysonjh@google.com>
>>>>>> > >>>>>>>> wrote:
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't
>>>>>> work.
>>>>>> > >>>>>>>>> I'll clean up manually on the machine using the cron
>>>>>> command.
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
>>>>>> > >>>>>>>>> tysonjh@google.com> wrote:
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>>> Something isn't working with the current setup because
>>>>>> node 15
>>>>>> > >>>>>>>>>> appears to be out of space and is currently 'offline'
>>>>>> according to Jenkins.
>>>>>> > >>>>>>>>>> Can someone run the cleanup job? The machine is full,
>>>>>> > >>>>>>>>>>
>>>>>> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>>>> > >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>> > >>>>>>>>>> udev             52G     0   52G   0% /dev
>>>>>> > >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
>>>>>> > >>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>>>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>>>> > >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>>>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>>>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>>>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>>> > >>>>>>>>>>
>>>>>> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . |
>>>>>> sort -rhk
>>>>>> > >>>>>>>>>> 1,1 | head -n 20
>>>>>> > >>>>>>>>>> 20G     2020-07-24 17:52        .
>>>>>> > >>>>>>>>>> 580M    2020-07-22 17:31
>>>>>> ./junit1031982597110125586
>>>>>> > >>>>>>>>>> 517M    2020-07-22 17:31
>>>>>> > >>>>>>>>>>
>>>>>> ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>>>> > >>>>>>>>>> 517M    2020-07-22 17:31
>>>>>> > >>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
>>>>>> > >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>>>> > >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>>>> > >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>>>> > >>>>>>>>>> 242M    2020-07-21 20:25
>>>>>> ./beam-pipeline-tempmByU6T
>>>>>> > >>>>>>>>>> 242M    2020-07-21 20:21
>>>>>> ./beam-pipeline-tempV85xeK
>>>>>> > >>>>>>>>>> 242M    2020-07-21 20:15
>>>>>> ./beam-pipeline-temp7dJROJ
>>>>>> > >>>>>>>>>> 236M    2020-07-21 20:25
>>>>>> > >>>>>>>>>>  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>>>> > >>>>>>>>>> 236M    2020-07-21 20:21
>>>>>> > >>>>>>>>>>  ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>>>> > >>>>>>>>>> 236M    2020-07-21 20:15
>>>>>> > >>>>>>>>>>  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>>>> > >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:17
>>>>>> ./beam-artifact1374651823280819755
>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:16
>>>>>> ./beam-artifact5050755582921936972
>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:16
>>>>>> ./beam-artifact1834064452502646289
>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:15
>>>>>> ./beam-artifact682561790267074916
>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:15
>>>>>> ./beam-artifact4691304965824489394
>>>>>> > >>>>>>>>>> 105M    2020-07-23 00:14
>>>>>> ./beam-artifact4050383819822604421
>>>>>> > >>>>>>>>>>
>>>>>> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>>>>>> > >>>>>>>>>> robertwb@google.com> wrote:
>>>>>> > >>>>>>>>>>
>>>>>> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>>>>>> > >>>>>>>>>>> tysonjh@google.com> wrote:
>>>>>> > >>>>>>>>>>>
>>>>>> > >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the
>>>>>> Apache
>>>>>> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside
>>>>>> the workspace [1]:
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some
>>>>>> basic
>>>>>> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on
>>>>>> the build nodes.
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>    1. Use a ./tmp dir in your job's workspace. That way
>>>>>> it gets
>>>>>> > >>>>>>>>>>>>    cleaned up when job workspaces expire.
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>> Tests should be (able to be) written to use the standard
>>>>>> > >>>>>>>>>>> temporary file mechanisms, and the environment set up
>>>>>> on Jenkins such that
>>>>>> > >>>>>>>>>>> that falls into the respective workspaces. Ideally this
>>>>>> should be as simple
>>>>>> > >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable
>>>>>> (and making sure it
>>>>>> > >>>>>>>>>>> exists/is writable).
>>>>>> > >>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start
>>>>>> or
>>>>>> > >>>>>>>>>>>>    finish.
>>>>>> > >>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10
>>>>>> previous builds.
>>>>>> > >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
>>>>>> > >>>>>>>>>>>>    artifacts.
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>> [1]:
>>>>>> > >>>>>>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
>>>>>> > >>>>>>>>>>>> kenn@apache.org> wrote:
>>>>>> > >>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>> Those file listings look like the result of using
>>>>>> standard
>>>>>> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
>>>>>> > >>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>>>>>> > >>>>>>>>>>>>> tysonjh@google.com> wrote:
>>>>>> > >>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>>>> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look
>>>>>> into two examples:
>>>>>> > >>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time .
>>>>>> | sort
>>>>>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>> > >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:48
>>>>>> ./beam-pipeline-temp3ybuY4
>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:46
>>>>>> ./beam-pipeline-tempuxjiPT
>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:44
>>>>>> ./beam-pipeline-tempVpg1ME
>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:42
>>>>>> ./beam-pipeline-tempJ4EpyB
>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:39
>>>>>> ./beam-pipeline-tempepea7Q
>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:35
>>>>>> ./beam-pipeline-temp79qot2
>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:48
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:46
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:44
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:42
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:39
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:35
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>> > >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10
>>>>>> ./pip-install-q9l227ef
>>>>>> > >>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time
>>>>>> . | sort
>>>>>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>> > >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:14
>>>>>> ./beam-pipeline-tempUTXqlM
>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:11
>>>>>> ./beam-pipeline-tempx3Yno3
>>>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:05
>>>>>> ./beam-pipeline-tempyCrMYq
>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:14
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:11
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:05
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05
>>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>> > >>>>>>>>>>>>>> 992K    2020-07-12 12:00
>>>>>> ./junit642086915811430564
>>>>>> > >>>>>>>>>>>>>> 988K    2020-07-12 12:00
>>>>>> ./junit642086915811430564/beam
>>>>>> > >>>>>>>>>>>>>> 984K    2020-07-12 12:00
>>>>>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
>>>>>> > >>>>>>>>>>>>>> 980K    2020-07-12 12:00
>>>>>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>>>>> > >>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <
>>>>>> ehudm@google.com>
>>>>>> > >>>>>>>>>>>>>> wrote:
>>>>>> > >>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>> > >>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>>>>>> > >>>>>>>>>>>>>>> kenn@apache.org> wrote:
>>>>>> > >>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing
>>>>>> something,
>>>>>> > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would
>>>>>> expect TMPDIR to point
>>>>>> > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be
>>>>>> wiped by Jenkins, and I
>>>>>> > >>>>>>>>>>>>>>>> would expect code to always create temp files via
>>>>>> APIs that respect this.
>>>>>> > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the
>>>>>> ability to set this up? Do
>>>>>> > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably
>>>>>> find by setting TMPDIR to
>>>>>> > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without
>>>>>> write permission to /tmp,
>>>>>> > >>>>>>>>>>>>>>>> etc)
>>>>>> > >>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>> Kenn
>>>>>> > >>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>>>>>> > >>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>> > >>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue
>>>>>> previously (
>>>>>> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865)
>>>>>> for
>>>>>> > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful
>>>>>> jobs. Alternatively, we
>>>>>> > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src
>>>>>> directories.
>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from
>>>>>> internal cron
>>>>>> > >>>>>>>>>>>>>>>>> scripts to the inventory job (
>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51
>>>>>> ).
>>>>>> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of
>>>>>> the source tree, adjust
>>>>>> > >>>>>>>>>>>>>>>>> frequencies and clean up code with PRs. I do not
>>>>>> know how internal cron
>>>>>> > >>>>>>>>>>>>>>>>> scripts are created, maintained, and how would
>>>>>> they be recreated for new
>>>>>> > >>>>>>>>>>>>>>>>> worker instances.
>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>> > >>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>> Hey,
>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing
>>>>>> /tmp
>>>>>> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by
>>>>>> Tyson:
>>>>>> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally
>>>>>> not
>>>>>> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort
>>>>>> solution for some strange
>>>>>> > >>>>>>>>>>>>>>>>>> cases.
>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every
>>>>>> worker with
>>>>>> > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed
>>>>>> once a week and deletes all
>>>>>> > >>>>>>>>>>>>>>>>>> the files (and only files) that were not
>>>>>> accessed for at least three days.
>>>>>> > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for
>>>>>> the running jobs on the
>>>>>> > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still
>>>>>> in use), and also to be
>>>>>> > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the
>>>>>> machine. The cleanup will always
>>>>>> > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are
>>>>>> blocking the machine.
>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left"
>>>>>> errors
>>>>>> > >>>>>>>>>>>>>>>>>> may be a consequence of growing workspace
>>>>>> directory rather than /tmp. I
>>>>>> > >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g.
>>>>>> currently, on
>>>>>> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory
>>>>>> size is 158 GB while /tmp is
>>>>>> > >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk
>>>>>> size to hold workspaces for
>>>>>> > >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will
>>>>>> execute each job) or clear
>>>>>> > >>>>>>>>>>>>>>>>>> also the workspaces in some way.
>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>> Regards,
>>>>>> > >>>>>>>>>>>>>>>>>> Damian
>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian
>>>>>> Michels <
>>>>>> > >>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't
>>>>>> lead to
>>>>>> > >>>>>>>>>>>>>>>>>>> test failures
>>>>>> > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe
>>>>>> there is
>>>>>> > >>>>>>>>>>>>>>>>>>> the notion of
>>>>>> > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are
>>>>>> running?
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> -Max
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in
>>>>>> Jenkins:
>>>>>> > >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm
>>>>>> seeing some
>>>>>> > >>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests
>>>>>> currently, perhaps we should
>>>>>> > >>>>>>>>>>>>>>>>>>> schedule this job with cron?
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <
>>>>>> > >>>>>>>>>>>>>>>>>>> heejong@google.com> wrote:
>>>>>> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors
>>>>>> on
>>>>>> > >>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
>>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>> > >>>>>>>>>>>>>>>>>>> )
>>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>>> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>> > >>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned
>>>>>> by jenkins
>>>>>> > >>>>>>>>>>>>>>>>>>> older than 3 days.
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
>>>>>> > >>>>>>>>>>>>>>>>>>> jenkins-12, which has not
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13
>>>>>> days. Not
>>>>>> > >>>>>>>>>>>>>>>>>>> scheduling:
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-12/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-1/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-2/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-3/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-4/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-5/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-6/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-7/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-8/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-9/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-10/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-11/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-13/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-14/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-15/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> <
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-16/builds&sa=D
>>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay
>>>>>> <
>>>>>> > >>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is
>>>>>> doing a one
>>>>>> > >>>>>>>>>>>>>>>>>>> time cleanup. I agree
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to
>>>>>> automate this
>>>>>> > >>>>>>>>>>>>>>>>>>> task or address the root
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał
>>>>>> Walenia <
>>>>>> > >>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>> wrote:
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins
>>>>>> workers
>>>>>> > >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on
>>>>>> device".
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in
>>>>>> these cases
>>>>>> > >>>>>>>>>>>>>>>>>>> (someone with access
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers).
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are
>>>>>> becoming more
>>>>>> > >>>>>>>>>>>>>>>>>>> and more frequent
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how can
>>>>>> this be
>>>>>> > >>>>>>>>>>>>>>>>>>> remedied. Can a cleanup
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Regards
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> --
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> |
>>>>>> Software
>>>>>> > >>>>>>>>>>>>>>>>>>> Engineer
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>>>>>> > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>> >
>>>>>>
>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Damian Gadomski <da...@polidea.com>.
Sorry, mistake while copying, [1] should be:
[1]
https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L63


On Tue, Jul 28, 2020 at 7:21 PM Damian Gadomski <da...@polidea.com>
wrote:

> That's interesting. I didn't check that myself but all the Jenkins jobs
> are configured to wipe the workspace just before the actual build happens
> [1]
> <https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6>.
> Git SCM plugin is used for that and it enables the option called "Wipe out
> repository and force clone". Docs state that it "deletes the contents of
> the workspace before build and before checkout" [2]
> <https://plugins.jenkins.io/git/>. Therefore I assume that removing
> the workspace just after the build won't change anything.
>
> The ./.gradle/caches/modules-2/files-2.1 dir is indeed present on the
> worker machines, but under /home/jenkins rather than in the workspaces.
>
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
> 11G .
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
> caches/modules-2/files-2.1
> 2.3G caches/modules-2/files-2.1
>
> I can't find that directory structure inside workspaces.
>
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
> sudo find -name "files-2.1"
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>
> [1]
> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6
> [2] https://plugins.jenkins.io/git/
>
> On Tue, Jul 28, 2020 at 5:47 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> Just checking - will this wipe out dependency cache? That will slow
>> things down and significantly increase flakiness. If I recall correctly,
>> the default Jenkins layout was:
>>
>>     /home/jenkins/jenkins-slave/workspace/$jobname
>>     /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>>     /home/jenkins/jenkins-slave/workspace/$jobname/.git
>>
>> Where you can see that it did a `git clone` right into the root workspace
>> directory, adjacent to .m2. This was not hygienic. One important thing was
>> that `git clean` would wipe the maven cache with every build. So in
>> https://github.com/apache/beam/pull/3976 we changed it to
>>
>>     /home/jenkins/jenkins-slave/workspace/$jobname
>>     /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>>     /home/jenkins/jenkins-slave/workspace/$jobname/src/.git
>>
>> Now the .m2 directory survives and we do not constantly see flakes
>> re-downloading deps that are immutable. This does, of course, use disk
>> space.
>>
>> That was in the maven days. Gradle is the same except that $HOME/.m2 is
>> replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
>> the same way so we will be wiping out the dependencies? If so, can you
>> address this issue? Everything in that directory should be immutable and
>> just a cache to avoid pointless re-download.
>>
>> Kenn
>>
>> On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <
>> damian.gadomski@polidea.com> wrote:
>>
>>> Agree with Udi, workspaces seem to be the third culprit, not yet
>>> addressed in any way (until PR#12326
>>> <https://github.com/apache/beam/pull/12326> is merged). I feel that
>>> it'll solve the issue of filling up the disks for a long time ;)
>>>
>>> I'm also OK with moving /tmp cleanup to option B, and will happily
>>> investigate a proper TMPDIR config.
>>>
>>>
>>>
>>> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> What about the workspaces, which can take up 175GB in some cases (see
>>>> above)?
>>>> I'm working on getting them cleaned up automatically:
>>>> https://github.com/apache/beam/pull/12326
>>>>
>>>> My opinion is that we would get more mileage out of fixing the jobs
>>>> that leave behind files in /tmp and images/containers in Docker.
>>>> This would also help keep development machines clean.
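
A minimal sketch of that per-job hygiene, assuming a POSIX shell build step
(JOB_TMP is an illustrative name; WORKSPACE is the standard Jenkins variable):

    # create a build-local temp dir inside the workspace
    JOB_TMP="$(mktemp -d "${WORKSPACE}/tmp.XXXXXX")"
    # standard temp-file APIs will now stay inside the workspace
    export TMPDIR="${JOB_TMP}"
    # remove it when the job ends, even on failure
    trap 'rm -rf "${JOB_TMP}"' EXIT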
>>>>
>>>>
>>>> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <ty...@google.com>
>>>> wrote:
>>>>
>>>>> Here is a summary of how I understand things,
>>>>>
>>>>>   - /tmp and /var/lib/docker are the culprits for filling up disks
>>>>>   - the inventory Jenkins job runs every 12 hours and runs a docker prune
>>>>> to clean up images older than 24hr
>>>>>   - a weekly crontab on each machine cleans up /tmp files older than
>>>>> three days
>>>>>
>>>>> This doesn't seem to be working since we're still running out of disk
>>>>> periodically and requiring manual intervention. Knobs and options we have
>>>>> available:
>>>>>
>>>>>   1. increase frequency of deleting files
>>>>>   2. decrease the number of days required to delete a file (e.g. older
>>>>> than 2 days)
>>>>>
>>>>> The execution methods we have available are:
>>>>>
>>>>>   A. cron
>>>>>     - pro: runs even if a job gets stuck in Jenkins due to full disk
>>>>>     - con: config baked into VM which is tough to update, not
>>>>> discoverable or documented well
>>>>>   B. inventory job
>>>>>     - pro: easy to update, runs every 12h already
>>>>>     - con: could get stuck if Jenkins agent runs out of disk or is
>>>>> otherwise stuck, tied to all other inventory job frequency
>>>>>   C. configure startup scripts for the VMs that set up the cron job
>>>>> anytime the VM is restarted
>>>>>     - pro: similar to A. and easy to update
>>>>>     - con: similar to A.
>>>>>
>>>>> Between the three I prefer B. because it is consistent with other
>>>>> inventory jobs. If it ends up that stuck jobs prohibit scheduling of the
>>>>> inventory job often we could further investigate C to avoid having to
>>>>> rebuild the VM images repeatedly.
>>>>>
>>>>> Any objections or comments? If not, we'll go forward with B. and
>>>>> reduce the date check from 3 days to 2 days.
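
For reference, the inventory-job step for option B could be as simple as the
following sketch (the exact flags are an assumption, not the job's current
contents):

    # delete plain files under /tmp not accessed for 2+ days;
    # directories are left alone so running jobs are not disturbed
    sudo find /tmp -type f -atime +2 -delete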
>>>>>
>>>>>
>>>>> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
>>>>> > Tests may not be doing docker cleanup. Inventory job runs a docker
>>>>> prune
>>>>> > every 12 hours for images older than 24 hrs [1]. Randomly looking at
>>>>> one of
>>>>> > the recent runs [2], it cleaned up a long list of containers
>>>>> consuming
>>>>> > 30+GB space. That should be just 12 hours worth of containers.
>>>>> >
>>>>> > [1]
>>>>> >
>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
>>>>> > [2]
>>>>> >
>>>>> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
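
The prune that the inventory job runs amounts to something like the following
(a sketch from the description above, not an exact quote of
job_Inventory.groovy):

    # remove stopped containers, unused images, networks and build cache
    # that are older than 24 hours
    docker system prune --all --force --filter "until=24h"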
>>>>> >
>>>>> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <ty...@google.com>
>>>>> wrote:
>>>>> >
>>>>> > > Yes, these are on the same volume in the /var/lib/docker
>>>>> directory. I'm
>>>>> > > unsure if they clean up leftover images.
>>>>> > >
>>>>> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com>
>>>>> wrote:
>>>>> > >
>>>>> > >> I forgot Docker images:
>>>>> > >>
>>>>> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>>>>> > >> TYPE                TOTAL               ACTIVE              SIZE
>>>>> > >>        RECLAIMABLE
>>>>> > >> Images              88                  9
>>>>>  125.4GB
>>>>> > >>       124.2GB (99%)
>>>>> > >> Containers          40                  4
>>>>>  7.927GB
>>>>> > >>       7.871GB (99%)
>>>>> > >> Local Volumes       47                  0
>>>>>  3.165GB
>>>>> > >>       3.165GB (100%)
>>>>> > >> Build Cache         0                   0                   0B
>>>>> > >>        0B
>>>>> > >>
>>>>> > >> There are about 90 images on that machine, with all but 1 less
>>>>> than 48
>>>>> > >> hours old.
>>>>> > >> I think the docker test jobs need to try harder at cleaning up
>>>>> their
>>>>> > >> leftover images. (assuming they're already doing it?)
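
One way a docker-based test job could guarantee that is to label what it
builds and prune by label afterwards (a sketch; the beam-test-job label and
image name are illustrative, not an existing convention):

    # build the image with a job-specific label
    docker build -t "local/test-image:${BUILD_ID}" \
        --label "beam-test-job=${JOB_NAME}" .
    # after the tests, remove everything carrying that label
    docker image prune --all --force --filter "label=beam-test-job=${JOB_NAME}"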
>>>>> > >>
>>>>> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com>
>>>>> wrote:
>>>>> > >>
>>>>> > >>> The additional slots (@3 directories) take up even more space
>>>>> now than
>>>>> > >>> before.
>>>>> > >>>
>>>>> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which
>>>>> could
>>>>> > >>> help by cleaning up workspaces after a run (just started a seed
>>>>> job).
>>>>> > >>>
>>>>> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <
>>>>> tysonjh@google.com>
>>>>> > >>> wrote:
>>>>> > >>>
>>>>> > >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>>>>> > >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>>>>> > >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>>>>> > >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>>>>> > >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>>>>> > >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>>>>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
>>>>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>>>>> > >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>>>>> > >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
>>>>> > >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>>>>> > >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>>>>> > >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>>>>> > >>>> 6.2G    beam_PreCommit_Python_Commit
>>>>> > >>>> 7.5G    beam_PreCommit_Python_Commit@2
>>>>> > >>>> 7.5G    beam_PreCommit_Python_Cron
>>>>> > >>>> 1012M   beam_PreCommit_PythonDocker_Commit
>>>>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>>>>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>>>>> > >>>> 1002M   beam_PreCommit_PythonDocker_Cron
>>>>> > >>>> 877M    beam_PreCommit_PythonFormatter_Commit
>>>>> > >>>> 988M    beam_PreCommit_PythonFormatter_Cron
>>>>> > >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
>>>>> > >>>> 1.7G    beam_PreCommit_PythonLint_Commit
>>>>> > >>>> 2.1G    beam_PreCommit_PythonLint_Cron
>>>>> > >>>> 7.5G    beam_PreCommit_Python_Phrase
>>>>> > >>>> 346M    beam_PreCommit_RAT_Commit
>>>>> > >>>> 341M    beam_PreCommit_RAT_Cron
>>>>> > >>>> 338M    beam_PreCommit_Spotless_Commit
>>>>> > >>>> 339M    beam_PreCommit_Spotless_Cron
>>>>> > >>>> 5.5G    beam_PreCommit_SQL_Commit
>>>>> > >>>> 5.5G    beam_PreCommit_SQL_Cron
>>>>> > >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>>>>> > >>>> 750M    beam_PreCommit_Website_Commit
>>>>> > >>>> 750M    beam_PreCommit_Website_Commit@2
>>>>> > >>>> 750M    beam_PreCommit_Website_Cron
>>>>> > >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>>>>> > >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>>>>> > >>>> 336M    beam_Prober_CommunityMetrics
>>>>> > >>>> 693M    beam_python_mongoio_load_test
>>>>> > >>>> 339M    beam_SeedJob
>>>>> > >>>> 333M    beam_SeedJob_Standalone
>>>>> > >>>> 334M    beam_sonarqube_report
>>>>> > >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>>>>> > >>>> 175G    total
>>>>> > >>>>
>>>>> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <
>>>>> tysonjh@google.com>
>>>>> > >>>> wrote:
>>>>> > >>>>
>>>>> > >>>>> Ya looks like something in the workspaces is taking up room:
>>>>> > >>>>>
>>>>> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>>>>> > >>>>> 191G    .
>>>>> > >>>>> 191G    total
>>>>> > >>>>>
>>>>> > >>>>>
>>>>> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <
>>>>> tysonjh@google.com>
>>>>> > >>>>> wrote:
>>>>> > >>>>>
>>>>> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
>>>>> > >>>>>>
>>>>> > >>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>> > >>>>>> /dev/sda1       485G  482G  2.9G 100% /
>>>>> > >>>>>>
>>>>> > >>>>>> however, after cleaning up /tmp with the crontab command, /tmp
>>>>> > >>>>>> holds only 8G of files, yet the partition remains 100% full:
>>>>> > >>>>>>
>>>>> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>>>> > >>>>>> 8.0G    /tmp
>>>>> > >>>>>> 8.0G    total
>>>>> > >>>>>>
>>>>> > >>>>>> The workspaces are in the
>>>>> /home/jenkins/jenkins-slave/workspace
>>>>> > >>>>>> directory. When I run a du on that, it takes really long.
>>>>> I'll let it keep
>>>>> > >>>>>> running for a while to see if it ever returns a result but so
>>>>> far this
>>>>> > >>>>>> seems suspect.
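
When du over a single tree is too slow, ranking the top-level directories of
the whole partition narrows things down quickly (a sketch; -x keeps du from
crossing into other mounted filesystems):

    sudo du -xsh /* 2>/dev/null | sort -rh | head -n 10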
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <
>>>>> tysonjh@google.com>
>>>>> > >>>>>> wrote:
>>>>> > >>>>>>
>>>>> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are
>>>>> > >>>>>>> the workspaces, or what are they named?
>>>>> > >>>>>>>
>>>>> > >>>>>>>
>>>>> > >>>>>>>
>>>>> > >>>>>>>
>>>>> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com>
>>>>> wrote:
>>>>> > >>>>>>>
>>>>> > >>>>>>>> I'm curious what you find. Was it /tmp or the workspaces
>>>>> > >>>>>>>> using up the space?
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <
>>>>> tysonjh@google.com>
>>>>> > >>>>>>>> wrote:
>>>>> > >>>>>>>>
>>>>> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't
>>>>> work.
>>>>> > >>>>>>>>> I'll clean up manually on the machine using the cron
>>>>> command.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
>>>>> > >>>>>>>>> tysonjh@google.com> wrote:
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>>> Something isn't working with the current setup because
>>>>> > >>>>>>>>>> node 15 appears to be out of space and is currently
>>>>> > >>>>>>>>>> 'offline' according to Jenkins.
>>>>> > >>>>>>>>>> Can someone run the cleanup job? The machine is full,
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>>> > >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>> > >>>>>>>>>> udev             52G     0   52G   0% /dev
>>>>> > >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
>>>>> > >>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>>> > >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . |
>>>>> sort -rhk
>>>>> > >>>>>>>>>> 1,1 | head -n 20
>>>>> > >>>>>>>>>> 20G     2020-07-24 17:52        .
>>>>> > >>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>>>> > >>>>>>>>>> 517M    2020-07-22 17:31
>>>>> > >>>>>>>>>>
>>>>> ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>>> > >>>>>>>>>> 517M    2020-07-22 17:31
>>>>> > >>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
>>>>> > >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>>> > >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>>> > >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>>> > >>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>>>> > >>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>>>> > >>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>>>> > >>>>>>>>>> 236M    2020-07-21 20:25
>>>>> > >>>>>>>>>>  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>>> > >>>>>>>>>> 236M    2020-07-21 20:21
>>>>> > >>>>>>>>>>  ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>>> > >>>>>>>>>> 236M    2020-07-21 20:15
>>>>> > >>>>>>>>>>  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>>> > >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>>> > >>>>>>>>>> 105M    2020-07-23 00:17
>>>>> ./beam-artifact1374651823280819755
>>>>> > >>>>>>>>>> 105M    2020-07-23 00:16
>>>>> ./beam-artifact5050755582921936972
>>>>> > >>>>>>>>>> 105M    2020-07-23 00:16
>>>>> ./beam-artifact1834064452502646289
>>>>> > >>>>>>>>>> 105M    2020-07-23 00:15
>>>>> ./beam-artifact682561790267074916
>>>>> > >>>>>>>>>> 105M    2020-07-23 00:15
>>>>> ./beam-artifact4691304965824489394
>>>>> > >>>>>>>>>> 105M    2020-07-23 00:14
>>>>> ./beam-artifact4050383819822604421
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>>>>> > >>>>>>>>>> robertwb@google.com> wrote:
>>>>> > >>>>>>>>>>
>>>>> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>>>>> > >>>>>>>>>>> tysonjh@google.com> wrote:
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the
>>>>> Apache
>>>>> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the
>>>>> workspace [1]:
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some
>>>>> basic
>>>>> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on
>>>>> the build nodes.
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way
>>>>> it gets
>>>>> > >>>>>>>>>>>>    cleaned up when job workspaces expire.
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>> Tests should be (able to be) written to use the standard
>>>>> > >>>>>>>>>>> temporary file mechanisms, and the environment set up on
>>>>> Jenkins such that
>>>>> > >>>>>>>>>>> that falls into the respective workspaces. Ideally this
>>>>> should be as simple
>>>>> > >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable
>>>>> (and making sure it
>>>>> > >>>>>>>>>>> exists/is writable).
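
In Jenkins terms that could be a one-line addition to the job environment,
along these lines (a sketch, assuming the standard WORKSPACE variable):

    # point standard temp-file APIs at a workspace-local directory;
    # the existing workspace wipe then cleans it up with everything else
    export TMPDIR="${WORKSPACE}/tmp"
    mkdir -p "${TMPDIR}"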
>>>>> > >>>>>>>>>>>
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or
>>>>> > >>>>>>>>>>>>    finish.
>>>>> > >>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous
>>>>> builds.
>>>>> > >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
>>>>> > >>>>>>>>>>>>    artifacts.
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> [1]:
>>>>> > >>>>>>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
>>>>> > >>>>>>>>>>>> kenn@apache.org> wrote:
>>>>> > >>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>> Those file listings look like the result of using
>>>>> standard
>>>>> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
>>>>> > >>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>>>>> > >>>>>>>>>>>>> tysonjh@google.com> wrote:
>>>>> > >>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>>> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look
>>>>> into two examples:
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time .
>>>>> | sort
>>>>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>> > >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:48
>>>>> ./beam-pipeline-temp3ybuY4
>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:46
>>>>> ./beam-pipeline-tempuxjiPT
>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:44
>>>>> ./beam-pipeline-tempVpg1ME
>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:42
>>>>> ./beam-pipeline-tempJ4EpyB
>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:39
>>>>> ./beam-pipeline-tempepea7Q
>>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:35
>>>>> ./beam-pipeline-temp79qot2
>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:48
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:46
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:44
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:42
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:39
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:35
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>> > >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time .
>>>>> | sort
>>>>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>> > >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:14
>>>>> ./beam-pipeline-tempUTXqlM
>>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:11
>>>>> ./beam-pipeline-tempx3Yno3
>>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:05
>>>>> ./beam-pipeline-tempyCrMYq
>>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:14
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:11
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:05
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05
>>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>> > >>>>>>>>>>>>>> 992K    2020-07-12 12:00
>>>>> ./junit642086915811430564
>>>>> > >>>>>>>>>>>>>> 988K    2020-07-12 12:00
>>>>> ./junit642086915811430564/beam
>>>>> > >>>>>>>>>>>>>> 984K    2020-07-12 12:00
>>>>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
>>>>> > >>>>>>>>>>>>>> 980K    2020-07-12 12:00
>>>>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <
>>>>> ehudm@google.com>
>>>>> > >>>>>>>>>>>>>> wrote:
>>>>> > >>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>> > >>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>>>>> > >>>>>>>>>>>>>>> kenn@apache.org> wrote:
>>>>> > >>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing
>>>>> something,
>>>>> > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would
>>>>> expect TMPDIR to point
>>>>> > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be
>>>>> wiped by Jenkins, and I
>>>>> > >>>>>>>>>>>>>>>> would expect code to always create temp files via
>>>>> APIs that respect this.
>>>>> > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the
>>>>> ability to set this up? Do
>>>>> > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably
>>>>> find by setting TMPDIR to
>>>>> > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without
>>>>> write permission to /tmp,
>>>>> > >>>>>>>>>>>>>>>> etc)
>>>>> > >>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>> Kenn
>>>>> > >>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>>>>> > >>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>> > >>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue
>>>>> previously (
>>>>> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865)
>>>>> for
>>>>> > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful
>>>>> jobs. Alternatively, we
>>>>> > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src
>>>>> directories.
>>>>> > >>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal
>>>>> cron
>>>>> > >>>>>>>>>>>>>>>>> scripts to the inventory job (
>>>>> > >>>>>>>>>>>>>>>>>
>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51
>>>>> ).
>>>>> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of
>>>>> > >>>>>>>>>>>>>>>>> the source tree, adjust frequencies and clean up
>>>>> > >>>>>>>>>>>>>>>>> code with PRs. I do not know how internal cron
>>>>> > >>>>>>>>>>>>>>>>> scripts are created, maintained, and how they would
>>>>> > >>>>>>>>>>>>>>>>> be recreated for new worker instances.
>>>>> > >>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>> > >>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>> > >>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>> > >>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> Hey,
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing
>>>>> /tmp
>>>>> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by
>>>>> Tyson:
>>>>> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
>>>>> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort
>>>>> solution for some strange
>>>>> > >>>>>>>>>>>>>>>>>> cases.
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every
>>>>> worker with
>>>>> > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once
>>>>> a week and deletes all
>>>>> > >>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed
>>>>> for at least three days.
>>>>> > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the
>>>>> running jobs on the
>>>>> > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in
>>>>> use), and also to be
>>>>> > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the
>>>>> machine. The cleanup will always
>>>>> > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are
>>>>> blocking the machine.
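
The policy described corresponds to a weekly crontab entry roughly like this
(a sketch of the described behaviour, not a copy of the deployed script):

    # Sundays 03:00: delete files (never directories) under /tmp
    # whose last access time is more than 3 days old
    0 3 * * 0  find /tmp -xdev -type f -atime +3 -delete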
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left"
>>>>> > >>>>>>>>>>>>>>>>>> errors may be a consequence of the growing
>>>>> > >>>>>>>>>>>>>>>>>> workspace directory rather than /tmp. I didn't do
>>>>> > >>>>>>>>>>>>>>>>>> any detailed analysis, but e.g. currently, on
>>>>> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size
>>>>> > >>>>>>>>>>>>>>>>>> is 158 GB while /tmp is only 16 GB. We should
>>>>> > >>>>>>>>>>>>>>>>>> either guarantee a disk size that can hold
>>>>> > >>>>>>>>>>>>>>>>>> workspaces for all jobs (because eventually, every
>>>>> > >>>>>>>>>>>>>>>>>> worker will execute each job) or also clear the
>>>>> > >>>>>>>>>>>>>>>>>> workspaces in some way.
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> Regards,
>>>>> > >>>>>>>>>>>>>>>>>> Damian
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian
>>>>> Michels <
>>>>> > >>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't
>>>>> lead to
>>>>> > >>>>>>>>>>>>>>>>>>> test failures
>>>>> > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe
>>>>> there is
>>>>> > >>>>>>>>>>>>>>>>>>> the notion of
>>>>> > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are
>>>>> running?
>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>>> -Max
>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in
>>>>> Jenkins:
>>>>> > >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm
>>>>> > >>>>>>>>>>>>>>>>>>> > seeing some out-of-disk errors in precommit
>>>>> > >>>>>>>>>>>>>>>>>>> > tests; perhaps we should schedule this job with
>>>>> > >>>>>>>>>>>>>>>>>>> > cron?
>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>> > >>>>>>>>>>>>>>>>>>> >
>>>>> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <
>>>>> > >>>>>>>>>>>>>>>>>>> heejong@google.com> wrote:
>>>>> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
>>>>> > >>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>> > >>>>>>>>>>>>>>>>>>> )
>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>> > >>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>> > >>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by
>>>>> jenkins
>>>>> > >>>>>>>>>>>>>>>>>>> older than 3 days.
>>>>> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
>>>>> > >>>>>>>>>>>>>>>>>>> >>> jenkins-12, which has not scheduled recent
>>>>> > >>>>>>>>>>>>>>>>>>> >>> builds for the past 13 days. Not scheduling:
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>>>> > >>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is
>>>>> doing a one
>>>>> > >>>>>>>>>>>>>>>>>>> time cleanup. I agree
>>>>> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate
>>>>> this
>>>>> > >>>>>>>>>>>>>>>>>>> task or address the root
>>>>> > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał
>>>>> Walenia <
>>>>> > >>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>> wrote:
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins
>>>>> workers
>>>>> > >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on
>>>>> device".
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these
>>>>> cases
>>>>> > >>>>>>>>>>>>>>>>>>> (someone with access
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers).
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are
>>>>> becoming more
>>>>> > >>>>>>>>>>>>>>>>>>> and more frequent
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how this
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> can be remedied. Can a cleanup
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Regards
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> --
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> |
>>>>> Software
>>>>> > >>>>>>>>>>>>>>>>>>> Engineer
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>>>>> > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>>> > >>>>>>>>>>>>>>>>>>>
>>>>> > >>>>>>>>>>>>>>>>>>
>>>>> >
>>>>>
>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Damian Gadomski <da...@polidea.com>.
That's interesting. I didn't check that myself but all the Jenkins jobs are
configured to wipe the workspace just before the actual build happens [1]
<https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6>.
Git SCM plugin is used for that and it enables the option called "Wipe out
repository and force clone". Docs state that it "deletes the contents of
the workspace before build and before checkout" [2]
<https://plugins.jenkins.io/git/>. Therefore I assume that removing the
workspace just after the build won't change anything.

The ./.gradle/caches/modules-2/files-2.1 dir is indeed present on the
worker machines, but it lives in the /home/jenkins dir rather than in the
workspaces.

damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
11G .
damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
caches/modules-2/files-2.1
2.3G caches/modules-2/files-2.1

I can't find that directory structure inside workspaces.

damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
sudo find -name "files-2.1"
damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$

[1]
https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6
[2] https://plugins.jenkins.io/git/

On Tue, Jul 28, 2020 at 5:47 PM Kenneth Knowles <ke...@apache.org> wrote:

> Just checking - will this wipe out dependency cache? That will slow things
> down and significantly increase flakiness. If I recall correctly, the
> default Jenkins layout was:
>
>     /home/jenkins/jenkins-slave/workspace/$jobname
>     /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>     /home/jenkins/jenkins-slave/workspace/$jobname/.git
>
> Where you can see that it did a `git clone` right into the root workspace
> directory, adjacent to .m2. This was not hygienic. One important thing was
> that `git clean` would wipe the maven cache with every build. So in
> https://github.com/apache/beam/pull/3976 we changed it to
>
>     /home/jenkins/jenkins-slave/workspace/$jobname
>     /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>     /home/jenkins/jenkins-slave/workspace/$jobname/src/.git
>
> Now the .m2 directory survives and we do not constantly see flakes
> re-downloading deps that are immutable. This does, of course, use disk
> space.
>
> That was in the maven days. Gradle is the same except that $HOME/.m2 is
> replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
> the same way so we will be wiping out the dependencies? If so, can you
> address this issue? Everything in that directory should be immutable and
> just a cache to avoid pointless re-download.
>
> Kenn
>
> On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <
> damian.gadomski@polidea.com> wrote:
>
>> Agree with Udi, workspaces seem to be the third culprit, not yet
>> addressed in any way (until PR#12326
>> <https://github.com/apache/beam/pull/12326> is merged). I feel that
>> it'll solve the issue of filling up the disks for a long time ;)
>>
>> I'm also OK with moving /tmp cleanup to option B, and will happily
>> investigate a proper TMPDIR config.
>>
>>
>>
>> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri <eh...@google.com> wrote:
>>
>>> What about the workspaces, which can take up 175GB in some cases (see
>>> above)?
>>> I'm working on getting them cleaned up automatically:
>>> https://github.com/apache/beam/pull/12326
>>>
>>> My opinion is that we would get more mileage out of fixing the jobs that
>>> leave behind files in /tmp and images/containers in Docker.
>>> This would also help keep development machines clean.
>>>
>>>
>>> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>>
>>>> Here is a summary of how I understand things,
>>>>
>>>>   - /tmp and /var/lib/docker are the culprits for filling up disks
>>>>   - the inventory Jenkins job runs every 12 hours and runs a docker prune
>>>> to clean up images older than 24hr
>>>>   - a weekly crontab on each machine cleans up /tmp files older than
>>>> three days
>>>>
>>>> This doesn't seem to be working since we're still running out of disk
>>>> periodically and requiring manual intervention. Knobs and options we have
>>>> available:
>>>>
>>>>   1. increase frequency of deleting files
>>>>   2. decrease the number of days required to delete a file (e.g. older
>>>> than 2 days)
>>>>
>>>> The execution methods we have available are:
>>>>
>>>>   A. cron
>>>>     - pro: runs even if a job gets stuck in Jenkins due to full disk
>>>>     - con: config baked into VM which is tough to update, not
>>>> discoverable or documented well
>>>>   B. inventory job
>>>>     - pro: easy to update, runs every 12h already
>>>>     - con: could get stuck if Jenkins agent runs out of disk or is
>>>> otherwise stuck, tied to all other inventory job frequency
>>>>   C. configure startup scripts for the VMs that set up the cron job
>>>> anytime the VM is restarted
>>>>     - pro: similar to A. and easy to update
>>>>     - con: similar to A.
>>>>
>>>> Between the three I prefer B. because it is consistent with other
>>>> inventory jobs. If it ends up that stuck jobs prohibit scheduling of the
>>>> inventory job often we could further investigate C to avoid having to
>>>> rebuild the VM images repeatedly.
>>>>
>>>> Any objections or comments? If not, we'll go forward with B. and reduce
>>>> the date check from 3 days to 2 days.
>>>>
>>>>
>>>> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
>>>> > Tests may not be doing docker cleanup. Inventory job runs a docker
>>>> prune
>>>> > every 12 hours for images older than 24 hrs [1]. Randomly looking at
>>>> one of
>>>> > the recent runs [2], it cleaned up a long list of containers consuming
>>>> > 30+GB space. That should be just 12 hours worth of containers.
>>>> >
>>>> > [1]
>>>> >
>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
>>>> > [2]
>>>> >
>>>> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
>>>> >
>>>> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <ty...@google.com>
>>>> wrote:
>>>> >
>>>> > > Yes, these are on the same volume in the /var/lib/docker directory.
>>>> I'm
>>>> > > unsure if they clean up leftover images.
>>>> > >
>>>> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com>
>>>> wrote:
>>>> > >
>>>> > >> I forgot Docker images:
>>>> > >>
>>>> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>>>> > >> TYPE                TOTAL               ACTIVE              SIZE
>>>> > >>        RECLAIMABLE
>>>> > >> Images              88                  9                   125.4GB
>>>> > >>       124.2GB (99%)
>>>> > >> Containers          40                  4                   7.927GB
>>>> > >>       7.871GB (99%)
>>>> > >> Local Volumes       47                  0                   3.165GB
>>>> > >>       3.165GB (100%)
>>>> > >> Build Cache         0                   0                   0B
>>>> > >>        0B
>>>> > >>
>>>> > >> There are about 90 images on that machine, with all but 1 less
>>>> than 48
>>>> > >> hours old.
>>>> > >> I think the docker test jobs need to try harder at cleaning up
>>>> their
>>>> > >> leftover images. (assuming they're already doing it?)
>>>> > >>
>>>> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com>
>>>> wrote:
>>>> > >>
>>>> > >>> The additional slots (@3 directories) take up even more space now
>>>> than
>>>> > >>> before.
>>>> > >>>
>>>> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which
>>>> could
>>>> > >>> help by cleaning up workspaces after a run (just started a seed
>>>> job).
>>>> > >>>
>>>> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <
>>>> tysonjh@google.com>
>>>> > >>> wrote:
>>>> > >>>
>>>> > >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>>>> > >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>>>> > >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>>>> > >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>>>> > >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>>>> > >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>>>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
>>>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>>>> > >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>>>> > >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
>>>> > >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>>>> > >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>>>> > >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>>>> > >>>> 6.2G    beam_PreCommit_Python_Commit
>>>> > >>>> 7.5G    beam_PreCommit_Python_Commit@2
>>>> > >>>> 7.5G    beam_PreCommit_Python_Cron
>>>> > >>>> 1012M   beam_PreCommit_PythonDocker_Commit
>>>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>>>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>>>> > >>>> 1002M   beam_PreCommit_PythonDocker_Cron
>>>> > >>>> 877M    beam_PreCommit_PythonFormatter_Commit
>>>> > >>>> 988M    beam_PreCommit_PythonFormatter_Cron
>>>> > >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
>>>> > >>>> 1.7G    beam_PreCommit_PythonLint_Commit
>>>> > >>>> 2.1G    beam_PreCommit_PythonLint_Cron
>>>> > >>>> 7.5G    beam_PreCommit_Python_Phrase
>>>> > >>>> 346M    beam_PreCommit_RAT_Commit
>>>> > >>>> 341M    beam_PreCommit_RAT_Cron
>>>> > >>>> 338M    beam_PreCommit_Spotless_Commit
>>>> > >>>> 339M    beam_PreCommit_Spotless_Cron
>>>> > >>>> 5.5G    beam_PreCommit_SQL_Commit
>>>> > >>>> 5.5G    beam_PreCommit_SQL_Cron
>>>> > >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>>>> > >>>> 750M    beam_PreCommit_Website_Commit
>>>> > >>>> 750M    beam_PreCommit_Website_Commit@2
>>>> > >>>> 750M    beam_PreCommit_Website_Cron
>>>> > >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>>>> > >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>>>> > >>>> 336M    beam_Prober_CommunityMetrics
>>>> > >>>> 693M    beam_python_mongoio_load_test
>>>> > >>>> 339M    beam_SeedJob
>>>> > >>>> 333M    beam_SeedJob_Standalone
>>>> > >>>> 334M    beam_sonarqube_report
>>>> > >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>>>> > >>>> 175G    total
>>>> > >>>>
>>>> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <
>>>> tysonjh@google.com>
>>>> > >>>> wrote:
>>>> > >>>>
>>>> > >>>>> Ya looks like something in the workspaces is taking up room:
>>>> > >>>>>
>>>> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>>>> > >>>>> 191G    .
>>>> > >>>>> 191G    total
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <
>>>> tysonjh@google.com>
>>>> > >>>>> wrote:
>>>> > >>>>>
>>>> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
>>>> > >>>>>>
>>>> > >>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> > >>>>>> /dev/sda1       485G  482G  2.9G 100% /
>>>> > >>>>>>
>>>> > >>>>>> however, after cleaning up /tmp with the crontab command, /tmp
>>>> > >>>>>> holds only 8G of files, yet the partition remains 100% full:
>>>> > >>>>>>
>>>> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>>> > >>>>>> 8.0G    /tmp
>>>> > >>>>>> 8.0G    total
>>>> > >>>>>>
>>>> > >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>>>> > >>>>>> directory. When I run a du on that, it takes really long. I'll
>>>> let it keep
>>>> > >>>>>> running for a while to see if it ever returns a result but so
>>>> far this
>>>> > >>>>>> seems suspect.
>>>> > >>>>>>
>>>> > >>>>>>
>>>> > >>>>>>
>>>> > >>>>>>
>>>> > >>>>>>
>>>> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <
>>>> tysonjh@google.com>
>>>> > >>>>>> wrote:
>>>> > >>>>>>
>>>> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are
>>>> > >>>>>>> the workspaces, or what are they named?
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com>
>>>> wrote:
>>>> > >>>>>>>
>>>> > >>>>>>>> I'm curious what you find. Was it /tmp or the workspaces
>>>> > >>>>>>>> using up the space?
>>>> > >>>>>>>>
>>>> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <
>>>> tysonjh@google.com>
>>>> > >>>>>>>> wrote:
>>>> > >>>>>>>>
>>>> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't
>>>> work.
>>>> > >>>>>>>>> I'll clean up manually on the machine using the cron
>>>> command.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
>>>> > >>>>>>>>> tysonjh@google.com> wrote:
>>>> > >>>>>>>>>
>>>> > >>>>>>>>>> Something isn't working with the current setup because
>>>> > >>>>>>>>>> node 15 appears to be out of space and is currently
>>>> > >>>>>>>>>> 'offline' according to Jenkins.
>>>> > >>>>>>>>>> Can someone run the cleanup job? The machine is full,
>>>> > >>>>>>>>>>
>>>> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>> > >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> > >>>>>>>>>> udev             52G     0   52G   0% /dev
>>>> > >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
>>>> > >>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>> > >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>> > >>>>>>>>>>
>>>> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . |
>>>> sort -rhk
>>>> > >>>>>>>>>> 1,1 | head -n 20
>>>> > >>>>>>>>>> 20G     2020-07-24 17:52        .
>>>> > >>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>>> > >>>>>>>>>> 517M    2020-07-22 17:31
>>>> > >>>>>>>>>>
>>>> ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>> > >>>>>>>>>> 517M    2020-07-22 17:31
>>>> > >>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
>>>> > >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>> > >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>> > >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>> > >>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>>> > >>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>>> > >>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>>> > >>>>>>>>>> 236M    2020-07-21 20:25
>>>> > >>>>>>>>>>  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>> > >>>>>>>>>> 236M    2020-07-21 20:21
>>>> > >>>>>>>>>>  ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>> > >>>>>>>>>> 236M    2020-07-21 20:15
>>>> > >>>>>>>>>>  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>> > >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>> > >>>>>>>>>> 105M    2020-07-23 00:17
>>>> ./beam-artifact1374651823280819755
>>>> > >>>>>>>>>> 105M    2020-07-23 00:16
>>>> ./beam-artifact5050755582921936972
>>>> > >>>>>>>>>> 105M    2020-07-23 00:16
>>>> ./beam-artifact1834064452502646289
>>>> > >>>>>>>>>> 105M    2020-07-23 00:15
>>>> ./beam-artifact682561790267074916
>>>> > >>>>>>>>>> 105M    2020-07-23 00:15
>>>> ./beam-artifact4691304965824489394
>>>> > >>>>>>>>>> 105M    2020-07-23 00:14
>>>> ./beam-artifact4050383819822604421
>>>> > >>>>>>>>>>
>>>> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>>>> > >>>>>>>>>> robertwb@google.com> wrote:
>>>> > >>>>>>>>>>
>>>> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>>>> > >>>>>>>>>>> tysonjh@google.com> wrote:
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the
>>>> Apache
>>>> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the
>>>> workspace [1]:
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some
>>>> basic
>>>> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on
>>>> the build nodes.
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way
>>>> it gets
>>>> > >>>>>>>>>>>>    cleaned up when job workspaces expire.
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>> Tests should be (able to be) written to use the standard
>>>> > >>>>>>>>>>> temporary file mechanisms, and the environment set up on
>>>> Jenkins such that
>>>> > >>>>>>>>>>> that falls into the respective workspaces. Ideally this
>>>> should be as simple
>>>> > >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable
>>>> (and making sure it
>>>> > >>>>>>>>>>> exists/is writable).
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or
>>>> > >>>>>>>>>>>>    finish.
>>>> > >>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous
>>>> builds.
>>>> > >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
>>>> > >>>>>>>>>>>>    artifacts.
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> [1]:
>>>> > >>>>>>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
>>>> > >>>>>>>>>>>> kenn@apache.org> wrote:
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>>> Those file listings look like the result of using
>>>> standard
>>>> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
>>>> > >>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>>>> > >>>>>>>>>>>>> tysonjh@google.com> wrote:
>>>> > >>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look
>>>> into two examples:
>>>> > >>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . |
>>>> sort
>>>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>> > >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:48
>>>> ./beam-pipeline-temp3ybuY4
>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:46
>>>> ./beam-pipeline-tempuxjiPT
>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:44
>>>> ./beam-pipeline-tempVpg1ME
>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:42
>>>> ./beam-pipeline-tempJ4EpyB
>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:39
>>>> ./beam-pipeline-tempepea7Q
>>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:35
>>>> ./beam-pipeline-temp79qot2
>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:48
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:46
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:44
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:42
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:39
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:35
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>> > >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>> > >>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time .
>>>> | sort
>>>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>> > >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:14
>>>> ./beam-pipeline-tempUTXqlM
>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:11
>>>> ./beam-pipeline-tempx3Yno3
>>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:05
>>>> ./beam-pipeline-tempyCrMYq
>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:14
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:11
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:05
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05
>>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>> > >>>>>>>>>>>>>> 992K    2020-07-12 12:00
>>>> ./junit642086915811430564
>>>> > >>>>>>>>>>>>>> 988K    2020-07-12 12:00
>>>> ./junit642086915811430564/beam
>>>> > >>>>>>>>>>>>>> 984K    2020-07-12 12:00
>>>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
>>>> > >>>>>>>>>>>>>> 980K    2020-07-12 12:00
>>>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>>> > >>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <
>>>> ehudm@google.com>
>>>> > >>>>>>>>>>>>>> wrote:
>>>> > >>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>> > >>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>>>> > >>>>>>>>>>>>>>> kenn@apache.org> wrote:
>>>> > >>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing
>>>> something,
>>>> > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would
>>>> expect TMPDIR to point
>>>> > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be
>>>> wiped by Jenkins, and I
>>>> > >>>>>>>>>>>>>>>> would expect code to always create temp files via
>>>> APIs that respect this.
>>>> > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the
>>>> ability to set this up? Do
>>>> > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably
>>>> find by setting TMPDIR to
>>>> > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without
>>>> write permission to /tmp,
>>>> > >>>>>>>>>>>>>>>> etc)
>>>> > >>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>> Kenn
>>>> > >>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>>>> > >>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>> > >>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue
>>>> previously (
>>>> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865)
>>>> for
>>>> > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful
>>>> jobs. Alternatively, we
>>>> > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src
>>>> directories.
>>>> > >>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal
>>>> cron
>>>> > >>>>>>>>>>>>>>>>> scripts to the inventory job (
>>>> > >>>>>>>>>>>>>>>>>
>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51
>>>> ).
>>>> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of
>>>> the source tree, adjust
>>>> > >>>>>>>>>>>>>>>>> frequencies and clean up codes with PRs. I do not
>>>> know how internal cron
>>>> > >>>>>>>>>>>>>>>>> scripts are created, maintained, and how would they
>>>> be recreated for new
>>>> > >>>>>>>>>>>>>>>>> worker instances.
>>>> > >>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>> > >>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>> > >>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>> > >>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>> Hey,
>>>> > >>>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing
>>>> /tmp
>>>> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by
>>>> Tyson:
>>>> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
>>>> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort
>>>> solution for some strange
>>>> > >>>>>>>>>>>>>>>>>> cases.
>>>> > >>>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every
>>>> worker with
>>>> > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once
>>>> a week and deletes all
>>>> > >>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed
>>>> for at least three days.
>>>> > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the
>>>> running jobs on the
>>>> > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in
>>>> use), and also to be
>>>> > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the
>>>> machine. The cleanup will always
>>>> > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are
>>>> blocking the machine.
>>>> > >>>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left"
>>>> errors
>>>> > >>>>>>>>>>>>>>>>>> may be a consequence of growing workspace
>>>> directory rather than /tmp. I
>>>> > >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g.
>>>> currently, on
>>>> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size
>>>> is 158 GB while /tmp is
>>>> > >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk
>>>> size to hold workspaces for
>>>> > >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will
>>>> execute each job) or clear
>>>> > >>>>>>>>>>>>>>>>>> also the workspaces in some way.
>>>> > >>>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>> Regards,
>>>> > >>>>>>>>>>>>>>>>>> Damian
>>>> > >>>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian
>>>> Michels <
>>>> > >>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
>>>> > >>>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't
>>>> lead to
>>>> > >>>>>>>>>>>>>>>>>>> test failures
>>>> > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe
>>>> there is
>>>> > >>>>>>>>>>>>>>>>>>> the notion of
>>>> > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are
>>>> running?
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>>> -Max
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in
>>>> Jenkins:
>>>> > >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>>> > >>>>>>>>>>>>>>>>>>> >
>>>> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm
>>>> seeing some
>>>> > >>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests
>>>> currently, perhaps we should
>>>> > >>>>>>>>>>>>>>>>>>> schedule this job with cron?
>>>> > >>>>>>>>>>>>>>>>>>> >
>>>> > >>>>>>>>>>>>>>>>>>> >
>>>> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <
>>>> > >>>>>>>>>>>>>>>>>>> heejong@google.com> wrote:
>>>> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
>>>> > >>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>> > >>>>>>>>>>>>>>>>>>> )
>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>> > >>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>> > >>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by
>>>> jenkins
>>>> > >>>>>>>>>>>>>>>>>>> older than 3 days.
>>>> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
>>>> > >>>>>>>>>>>>>>>>>>> jenkins-12, which has not
>>>> > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days.
>>>> Not
>>>> > >>>>>>>>>>>>>>>>>>> scheduling:
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>>> > >>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>> > >>>>>>>>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is
>>>> doing a one
>>>> > >>>>>>>>>>>>>>>>>>> time cleanup. I agree
>>>> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate
>>>> this
>>>> > >>>>>>>>>>>>>>>>>>> task or address the root
>>>> > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał
>>>> Walenia <
>>>> > >>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>>> > >>>>>>>>>>>>>>>>>>> >>>> wrote:
>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins
>>>> workers
>>>> > >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on
>>>> device".
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these
>>>> cases
>>>> > >>>>>>>>>>>>>>>>>>> (someone with access
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers).
>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are
>>>> becoming more
>>>> > >>>>>>>>>>>>>>>>>>> and more frequent
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how can
>>>> this be
>>>> > >>>>>>>>>>>>>>>>>>> remedied. Can a cleanup
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Regards
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> --
>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> |
>>>> Software
>>>> > >>>>>>>>>>>>>>>>>>> Engineer
>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>>> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>>>> > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>>> > >>>>>>>>>>>>>>>>>>> >>
>>>> > >>>>>>>>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>>>>>>>
>>>> >
>>>>
>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Kenneth Knowles <ke...@apache.org>.
Just checking - will this wipe out the dependency cache? That will slow
things down and significantly increase flakiness. If I recall correctly, the
default Jenkins layout was:

    /home/jenkins/jenkins-slave/workspace/$jobname
    /home/jenkins/jenkins-slave/workspace/$jobname/.m2
    /home/jenkins/jenkins-slave/workspace/$jobname/.git

There you can see that it did a `git clone` right into the root of the
workspace directory, adjacent to .m2. This was not hygienic: in particular,
`git clean` would wipe the Maven cache on every build. So in
https://github.com/apache/beam/pull/3976 we changed it to

    /home/jenkins/jenkins-slave/workspace/$jobname
    /home/jenkins/jenkins-slave/workspace/$jobname/.m2
    /home/jenkins/jenkins-slave/workspace/$jobname/src/.git

Now the .m2 directory survives across builds and we no longer see constant
flakes from re-downloading dependencies that are immutable. This does, of
course, use disk space.

That was in the Maven days. Gradle is the same, except that $HOME/.m2 is
replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
the same way, such that we will be wiping out the dependencies? If so, can
you address this issue? Everything in that directory should be immutable:
it is just a cache to avoid pointless re-downloads.
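
If the cleanup does end up wiping these directories, a cache-preserving
cleanup could look roughly like the sketch below. This is only an
illustration, assuming the worker layout described in this thread (the
workspace root path and the .m2/.gradle cache names come from the messages
above and are not verified against the actual cleanup script):

    # For each job workspace, delete everything except the dependency
    # caches, which hold immutable artifacts and are safe to reuse
    # across builds.
    for ws in /home/jenkins/jenkins-slave/workspace/*/; do
      find "$ws" -mindepth 1 -maxdepth 1 \
        ! -name '.m2' ! -name '.gradle' \
        -exec rm -rf {} +
    done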

Kenn

On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <da...@polidea.com>
wrote:

> Agree with Udi, workspaces seem to be the third culprit, not yet addressed
> in any way (until PR#12326 <https://github.com/apache/beam/pull/12326> is
> merged). I feel that it'll solve the issue of filling up the disks for a
> long time ;)
>
> I'm also OK with moving the /tmp cleanup to option B, and will happily
> investigate a proper TMPDIR config.
>
>
>
> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri <eh...@google.com> wrote:
>
>> What about the workspaces, which can take up 175GB in some cases (see
>> above)?
>> I'm working on getting them cleaned up automatically:
>> https://github.com/apache/beam/pull/12326
>>
>> My opinion is that we would get more mileage out of fixing the jobs that
>> leave behind files in /tmp and images/containers in Docker.
>> This would also help keep development machines clean.
>>
>>
>> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <ty...@google.com>
>> wrote:
>>
>>> Here is a summary of how I understand things:
>>>
>>>   - /tmp and /var/lib/docker are the culprits for filling up disks
>>>   - inventory Jenkins job runs every 12 hours and runs a docker prune to
>>> clean up images older than 24hr
>>>   - crontab on each machine cleans up /tmp files older than three days
>>> weekly
>>>
>>> This doesn't seem to be working since we're still running out of disk
>>> periodically and requiring manual intervention. Knobs and options we have
>>> available:
>>>
>>>   1. increase frequency of deleting files
>>>   2. decrease the number of days required to delete a file (e.g. older
>>> than 2 days)
>>>
>>> The execution methods we have available are:
>>>
>>>   A. cron
>>>     - pro: runs even if a job gets stuck in Jenkins due to full disk
>>>     - con: config baked into VM which is tough to update, not
>>> discoverable or documented well
>>>   B. inventory job
>>>     - pro: easy to update, runs every 12h already
>>>     - con: could get stuck if Jenkins agent runs out of disk or is
>>> otherwise stuck, tied to all other inventory job frequency
>>>   C. configure startup scripts for the VMs that set up the cron job
>>> anytime the VM is restarted
>>>     - pro: similar to A. and easy to update
>>>     - con: similar to A.
>>>
>>> Between the three I prefer B. because it is consistent with other
>>> inventory jobs. If it ends up that stuck jobs prohibit scheduling of the
>>> inventory job often we could further investigate C to avoid having to
>>> rebuild the VM images repeatedly.
>>>
>>> Any objections or comments? If not, we'll go forward with B. and reduce
>>> the date check from 3 days to 2 days.
>>>
>>>
>>> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
>>> > Tests may not be doing docker cleanup. Inventory job runs a docker
>>> prune
>>> > every 12 hours for images older than 24 hrs [1]. Randomly looking at
>>> one of
>>> > the recent runs [2], it cleaned up a long list of containers consuming
>>> > 30+GB space. That should be just 12 hours worth of containers.
>>> >
>>> > [1]
>>> >
>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
>>> > [2]
>>> >
>>> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
>>> >
>>> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>> >
>>> > > Yes, these are on the same volume in the /var/lib/docker directory.
>>> I'm
>>> > > unsure if they clean up leftover images.
>>> > >
>>> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:
>>> > >
>>> > >> I forgot Docker images:
>>> > >>
>>> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>>> > >> TYPE            TOTAL   ACTIVE   SIZE      RECLAIMABLE
>>> > >> Images          88      9        125.4GB   124.2GB (99%)
>>> > >> Containers      40      4        7.927GB   7.871GB (99%)
>>> > >> Local Volumes   47      0        3.165GB   3.165GB (100%)
>>> > >> Build Cache     0       0        0B        0B
>>> > >>
>>> > >> There are about 90 images on that machine, with all but 1 less than
>>> 48
>>> > >> hours old.
>>> > >> I think the docker test jobs need to try harder at cleaning up their
>>> > >> leftover images. (assuming they're already doing it?)
>>> > >>
>>> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com>
>>> wrote:
>>> > >>
>>> > >>> The additional slots (@3 directories) take up even more space now
>>> than
>>> > >>> before.
>>> > >>>
>>> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which
>>> could
>>> > >>> help by cleaning up workspaces after a run (just started a seed
>>> job).
>>> > >>>
>>> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <
>>> tysonjh@google.com>
>>> > >>> wrote:
>>> > >>>
>>> > >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>>> > >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>>> > >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>>> > >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>>> > >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>>> > >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
>>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>>> > >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>>> > >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
>>> > >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>>> > >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>>> > >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>>> > >>>> 6.2G    beam_PreCommit_Python_Commit
>>> > >>>> 7.5G    beam_PreCommit_Python_Commit@2
>>> > >>>> 7.5G    beam_PreCommit_Python_Cron
>>> > >>>> 1012M   beam_PreCommit_PythonDocker_Commit
>>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>>> > >>>> 1002M   beam_PreCommit_PythonDocker_Cron
>>> > >>>> 877M    beam_PreCommit_PythonFormatter_Commit
>>> > >>>> 988M    beam_PreCommit_PythonFormatter_Cron
>>> > >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
>>> > >>>> 1.7G    beam_PreCommit_PythonLint_Commit
>>> > >>>> 2.1G    beam_PreCommit_PythonLint_Cron
>>> > >>>> 7.5G    beam_PreCommit_Python_Phrase
>>> > >>>> 346M    beam_PreCommit_RAT_Commit
>>> > >>>> 341M    beam_PreCommit_RAT_Cron
>>> > >>>> 338M    beam_PreCommit_Spotless_Commit
>>> > >>>> 339M    beam_PreCommit_Spotless_Cron
>>> > >>>> 5.5G    beam_PreCommit_SQL_Commit
>>> > >>>> 5.5G    beam_PreCommit_SQL_Cron
>>> > >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>>> > >>>> 750M    beam_PreCommit_Website_Commit
>>> > >>>> 750M    beam_PreCommit_Website_Commit@2
>>> > >>>> 750M    beam_PreCommit_Website_Cron
>>> > >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>>> > >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>>> > >>>> 336M    beam_Prober_CommunityMetrics
>>> > >>>> 693M    beam_python_mongoio_load_test
>>> > >>>> 339M    beam_SeedJob
>>> > >>>> 333M    beam_SeedJob_Standalone
>>> > >>>> 334M    beam_sonarqube_report
>>> > >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>>> > >>>> 175G    total
>>> > >>>>
>>> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <
>>> tysonjh@google.com>
>>> > >>>> wrote:
>>> > >>>>
>>> > >>>>> Ya looks like something in the workspaces is taking up room:
>>> > >>>>>
>>> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>>> > >>>>> 191G    .
>>> > >>>>> 191G    total
>>> > >>>>>
>>> > >>>>>
>>> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <
>>> tysonjh@google.com>
>>> > >>>>> wrote:
>>> > >>>>>
>>> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
>>> > >>>>>>
>>> > >>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>> > >>>>>> /dev/sda1       485G  482G  2.9G 100% /
>>> > >>>>>>
>>> > >>>>>> however after cleaning up tmp with the crontab command, there
>>> is only
>>> > >>>>>> 8G usage yet it still remains 100% full:
>>> > >>>>>>
>>> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>> > >>>>>> 8.0G    /tmp
>>> > >>>>>> 8.0G    total
>>> > >>>>>>
>>> > >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>>> > >>>>>> directory. When I run a du on that, it takes really long. I'll
>>> let it keep
>>> > >>>>>> running for a while to see if it ever returns a result but so
>>> far this
>>> > >>>>>> seems suspect.
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <
>>> tysonjh@google.com>
>>> > >>>>>> wrote:
>>> > >>>>>>
>>> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are
>>> the
>>> > >>>>>>> workspaces, or what are they named?
>>> > >>>>>>>
>>> > >>>>>>>
>>> > >>>>>>>
>>> > >>>>>>>
>>> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com>
>>> wrote:
>>> > >>>>>>>
>>> > >>>>>>>> I'm curious to what you find. Was it /tmp or the workspaces
>>> using
>>> > >>>>>>>> up the space?
>>> > >>>>>>>>
>>> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <
>>> tysonjh@google.com>
>>> > >>>>>>>> wrote:
>>> > >>>>>>>>
>>> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't
>>> work.
>>> > >>>>>>>>> I'll clean up manually on the machine using the cron command.
>>> > >>>>>>>>>
>>> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
>>> > >>>>>>>>> tysonjh@google.com> wrote:
>>> > >>>>>>>>>
>>> > >>>>>>>>>> Something isn't working with the current set up because
>>> node 15
>>> > >>>>>>>>>> appears to be out of space and is currently 'offline'
>>> according to Jenkins.
>>> > >>>>>>>>>> Can someone run the cleanup job? The machine is full,
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>> > >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>> > >>>>>>>>>> udev             52G     0   52G   0% /dev
>>> > >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
>>> > >>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>> > >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort
>>> -rhk
>>> > >>>>>>>>>> 1,1 | head -n 20
>>> > >>>>>>>>>> 20G     2020-07-24 17:52        .
>>> > >>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>> > >>>>>>>>>> 517M    2020-07-22 17:31
>>> > >>>>>>>>>>
>>> ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>> > >>>>>>>>>> 517M    2020-07-22 17:31
>>> > >>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
>>> > >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>> > >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>> > >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>> > >>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>> > >>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>> > >>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>> > >>>>>>>>>> 236M    2020-07-21 20:25
>>> > >>>>>>>>>>  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>> > >>>>>>>>>> 236M    2020-07-21 20:21
>>> > >>>>>>>>>>  ./beam-pipeline-tempV85xeK/tmppbQHB3
>>> > >>>>>>>>>> 236M    2020-07-21 20:15
>>> > >>>>>>>>>>  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>> > >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>> > >>>>>>>>>> 105M    2020-07-23 00:17
>>> ./beam-artifact1374651823280819755
>>> > >>>>>>>>>> 105M    2020-07-23 00:16
>>> ./beam-artifact5050755582921936972
>>> > >>>>>>>>>> 105M    2020-07-23 00:16
>>> ./beam-artifact1834064452502646289
>>> > >>>>>>>>>> 105M    2020-07-23 00:15
>>> ./beam-artifact682561790267074916
>>> > >>>>>>>>>> 105M    2020-07-23 00:15
>>> ./beam-artifact4691304965824489394
>>> > >>>>>>>>>> 105M    2020-07-23 00:14
>>> ./beam-artifact4050383819822604421
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>>> > >>>>>>>>>> robertwb@google.com> wrote:
>>> > >>>>>>>>>>
>>> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>>> > >>>>>>>>>>> tysonjh@google.com> wrote:
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache
>>> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the
>>> workspace [1]:
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some
>>> basic
>>> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on the
>>> build nodes.
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it
>>> gets
>>> > >>>>>>>>>>>>    cleaned up when job workspaces expire.
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>> Tests should be (able to be) written to use the standard
>>> > >>>>>>>>>>> temporary file mechanisms, and the environment set up on
>>> Jenkins such that
>>> > >>>>>>>>>>> that falls into the respective workspaces. Ideally this
>>> should be as simple
>>> > >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable
>>> (and making sure it
>>> > >>>>>>>>>>> exists/is writable).
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or
>>> > >>>>>>>>>>>>    finish.
>>> > >>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous
>>> builds.
>>> > >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
>>> > >>>>>>>>>>>>    artifacts.
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> [1]:
>>> > >>>>>>>>>>>>
>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
>>> > >>>>>>>>>>>> kenn@apache.org> wrote:
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>> Those file listings look like the result of using
>>> standard
>>> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>>> > >>>>>>>>>>>>> tysonjh@google.com> wrote:
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look
>>> into two examples:
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . |
>>> sort
>>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>> > >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:48
>>> ./beam-pipeline-temp3ybuY4
>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:46
>>> ./beam-pipeline-tempuxjiPT
>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:44
>>> ./beam-pipeline-tempVpg1ME
>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:42
>>> ./beam-pipeline-tempJ4EpyB
>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:39
>>> ./beam-pipeline-tempepea7Q
>>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:35
>>> ./beam-pipeline-temp79qot2
>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:48
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:46
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:44
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:42
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:39
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:35
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>> > >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . |
>>> sort
>>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>> > >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:14
>>> ./beam-pipeline-tempUTXqlM
>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:11
>>> ./beam-pipeline-tempx3Yno3
>>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:05
>>> ./beam-pipeline-tempyCrMYq
>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:14
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:11
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:05
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05
>>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>> > >>>>>>>>>>>>>> 992K    2020-07-12 12:00
>>> ./junit642086915811430564
>>> > >>>>>>>>>>>>>> 988K    2020-07-12 12:00
>>> ./junit642086915811430564/beam
>>> > >>>>>>>>>>>>>> 984K    2020-07-12 12:00
>>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
>>> > >>>>>>>>>>>>>> 980K    2020-07-12 12:00
>>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <
>>> ehudm@google.com>
>>> > >>>>>>>>>>>>>> wrote:
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>> > >>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>>> > >>>>>>>>>>>>>>> kenn@apache.org> wrote:
>>> > >>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing
>>> something,
>>> > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect
>>> TMPDIR to point
>>> > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped
>>> by Jenkins, and I
>>> > >>>>>>>>>>>>>>>> would expect code to always create temp files via
>>> APIs that respect this.
>>> > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the
>>> ability to set this up? Do
>>> > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably find
>>> by setting TMPDIR to
>>> > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without
>>> write permission to /tmp,
>>> > >>>>>>>>>>>>>>>> etc)
>>> > >>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>> Kenn
>>> > >>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>>> > >>>>>>>>>>>>>>>> altay@google.com> wrote:
>>> > >>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue
>>> previously (
>>> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for
>>> > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful
>>> jobs. Alternatively, we
>>> > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src
>>> directories.
>>> > >>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal
>>> cron
>>> > >>>>>>>>>>>>>>>>> scripts to the inventory job (
>>> > >>>>>>>>>>>>>>>>>
>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51
>>> ).
>>> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of
>>> the source tree, adjust
>>> > >>>>>>>>>>>>>>>>> frequencies and clean up codes with PRs. I do not
>>> know how internal cron
>>> > >>>>>>>>>>>>>>>>> scripts are created, maintained, and how would they
>>> be recreated for new
>>> > >>>>>>>>>>>>>>>>> worker instances.
>>> > >>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>> > >>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>> > >>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>> > >>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> Hey,
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing
>>> /tmp
>>> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
>>> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
>>> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort
>>> solution for some strange
>>> > >>>>>>>>>>>>>>>>>> cases.
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker
>>> with
>>> > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once a
>>> week and deletes all
>>> > >>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed
>>> for at least three days.
>>> > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the
>>> running jobs on the
>>> > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in
>>> use), and also to be
>>> > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine.
>>> The cleanup will always
>>> > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are
>>> blocking the machine.
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left"
>>> errors
>>> > >>>>>>>>>>>>>>>>>> may be a consequence of growing workspace directory
>>> rather than /tmp. I
>>> > >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g. currently,
>>> on
>>> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size
>>> is 158 GB while /tmp is
>>> > >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk
>>> size to hold workspaces for
>>> > >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will
>>> execute each job) or clear
>>> > >>>>>>>>>>>>>>>>>> also the workspaces in some way.
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> Regards,
>>> > >>>>>>>>>>>>>>>>>> Damian
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels
>>> <
>>> > >>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't
>>> lead to
>>> > >>>>>>>>>>>>>>>>>>> test failures
>>> > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe
>>> there is
>>> > >>>>>>>>>>>>>>>>>>> the notion of
>>> > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are
>>> running?
>>> > >>>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>>> -Max
>>> > >>>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in
>>> Jenkins:
>>> > >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>> > >>>>>>>>>>>>>>>>>>> >
>>> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm
>>> seeing some
>>> > >>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests
>>> currently, perhaps we should
>>> > >>>>>>>>>>>>>>>>>>> schedule this job with cron?
>>> > >>>>>>>>>>>>>>>>>>> >
>>> > >>>>>>>>>>>>>>>>>>> >
>>> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <
>>> > >>>>>>>>>>>>>>>>>>> heejong@google.com> wrote:
>>> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
>>> > >>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
>>> > >>>>>>>>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>> > >>>>>>>>>>>>>>>>>>> )
>>> > >>>>>>>>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>> > >>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>> > >>>>>>>>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by
>>> jenkins
>>> > >>>>>>>>>>>>>>>>>>> older than 3 days.
>>> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
>>> > >>>>>>>>>>>>>>>>>>> jenkins-12, which has not
>>> > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days.
>>> Not
>>> > >>>>>>>>>>>>>>>>>>> scheduling:
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>>
>>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>> > >>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing
>>> a one
>>> > >>>>>>>>>>>>>>>>>>> time cleanup. I agree
>>> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate
>>> this
>>> > >>>>>>>>>>>>>>>>>>> task or address the root
>>> > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia
>>> <
>>> > >>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>> > >>>>>>>>>>>>>>>>>>> >>>> wrote:
>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>>> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins
>>> workers
>>> > >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
>>> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on
>>> device".
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these
>>> cases
>>> > >>>>>>>>>>>>>>>>>>> (someone with access
>>> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers).
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming
>>> more
>>> > >>>>>>>>>>>>>>>>>>> and more frequent
>>> > >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how can
>>> this be
>>> > >>>>>>>>>>>>>>>>>>> remedied. Can a cleanup
>>> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Regards
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> --
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> |
>>> Software
>>> > >>>>>>>>>>>>>>>>>>> Engineer
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>>> > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>>
>>> >
>>>
>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Damian Gadomski <da...@polidea.com>.
Agree with Udi, workspaces seem to be the third culprit, not yet addressed
in any way (until PR#12326 <https://github.com/apache/beam/pull/12326> is
merged). I feel that it'll solve the issue of filling up the disks for a
long time ;)

I'm also OK with moving the /tmp cleanup to option B, and will happily
investigate a proper TMPDIR config.
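
For concreteness, what I have in mind is roughly the following in each
job's environment. Just a sketch for now, assuming Jenkins exposes the
usual WORKSPACE variable; where exactly to wire this into the job DSL is
what needs investigating:

    # Point temp files at a per-job directory inside the workspace, so
    # they get removed together with the workspace instead of piling up
    # in the machine-wide /tmp.
    mkdir -p "$WORKSPACE/tmp"
    export TMPDIR="$WORKSPACE/tmp"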



On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri <eh...@google.com> wrote:

> What about the workspaces, which can take up 175GB in some cases (see
> above)?
> I'm working on getting them cleaned up automatically:
> https://github.com/apache/beam/pull/12326
>
> My opinion is that we would get more mileage out of fixing the jobs that
> leave behind files in /tmp and images/containers in Docker.
> This would also help keep development machines clean.
>
>
> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <ty...@google.com> wrote:
>
>> Here is a summary of how I understand things:
>>
>>   - /tmp and /var/lib/docker are the culprits for filling up disks
>>   - inventory Jenkins job runs every 12 hours and runs a docker prune to
>> clean up images older than 24hr
>>   - crontab on each machine cleans up /tmp files older than three days
>> weekly
>>
>> This doesn't seem to be working since we're still running out of disk
>> periodically and requiring manual intervention. Knobs and options we have
>> available:
>>
>>   1. increase frequency of deleting files
>>   2. decrease the number of days required to delete a file (e.g. older
>> than 2 days)
>>
>> The execution methods we have available are:
>>
>>   A. cron
>>     - pro: runs even if a job gets stuck in Jenkins due to full disk
>>     - con: config baked into VM which is tough to update, not
>> discoverable or documented well
>>   B. inventory job
>>     - pro: easy to update, runs every 12h already
>>     - con: could get stuck if Jenkins agent runs out of disk or is
>> otherwise stuck, tied to all other inventory job frequency
>>   C. configure startup scripts for the VMs that set up the cron job
>> anytime the VM is restarted
>>     - pro: similar to A. and easy to update
>>     - con: similar to A.
>>
>> Between the three I prefer B. because it is consistent with other
>> inventory jobs. If it ends up that stuck jobs prohibit scheduling of the
>> inventory job often we could further investigate C to avoid having to
>> rebuild the VM images repeatedly.
>>
>> Any objections or comments? If not, we'll go forward with B. and reduce
>> the date check from 3 days to 2 days.
>>
>>
>> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
>> > Tests may not be doing docker cleanup. Inventory job runs a docker prune
>> > every 12 hours for images older than 24 hrs [1]. Randomly looking at
>> one of
>> > the recent runs [2], it cleaned up a long list of containers consuming
>> > 30+GB space. That should be just 12 hours worth of containers.
>> >
>> > [1]
>> >
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
>> > [2]
>> >
>> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
>> >
>> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <ty...@google.com>
>> wrote:
>> >
>> > > Yes, these are on the same volume in the /var/lib/docker directory.
>> I'm
>> > > unsure if they clean up leftover images.
>> > >
>> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:
>> > >
>> > >> I forgot Docker images:
>> > >>
>> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>> > >> TYPE            TOTAL   ACTIVE   SIZE      RECLAIMABLE
>> > >> Images          88      9        125.4GB   124.2GB (99%)
>> > >> Containers      40      4        7.927GB   7.871GB (99%)
>> > >> Local Volumes   47      0        3.165GB   3.165GB (100%)
>> > >> Build Cache     0       0        0B        0B
>> > >>
>> > >> There are about 90 images on that machine, with all but 1 less than
>> 48
>> > >> hours old.
>> > >> I think the docker test jobs need to try harder at cleaning up their
>> > >> leftover images. (assuming they're already doing it?)
>> > >>
>> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:
>> > >>
>> > >>> The additional slots (@3 directories) take up even more space now
>> than
>> > >>> before.
>> > >>>
>> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which
>> could
>> > >>> help by cleaning up workspaces after a run (just started a seed
>> job).
>> > >>>
>> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <tysonjh@google.com
>> >
>> > >>> wrote:
>> > >>>
>> > >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>> > >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>> > >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>> > >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>> > >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>> > >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>> > >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>> > >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
>> > >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>> > >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>> > >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>> > >>>> 6.2G    beam_PreCommit_Python_Commit
>> > >>>> 7.5G    beam_PreCommit_Python_Commit@2
>> > >>>> 7.5G    beam_PreCommit_Python_Cron
>> > >>>> 1012M   beam_PreCommit_PythonDocker_Commit
>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>> > >>>> 1002M   beam_PreCommit_PythonDocker_Cron
>> > >>>> 877M    beam_PreCommit_PythonFormatter_Commit
>> > >>>> 988M    beam_PreCommit_PythonFormatter_Cron
>> > >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
>> > >>>> 1.7G    beam_PreCommit_PythonLint_Commit
>> > >>>> 2.1G    beam_PreCommit_PythonLint_Cron
>> > >>>> 7.5G    beam_PreCommit_Python_Phrase
>> > >>>> 346M    beam_PreCommit_RAT_Commit
>> > >>>> 341M    beam_PreCommit_RAT_Cron
>> > >>>> 338M    beam_PreCommit_Spotless_Commit
>> > >>>> 339M    beam_PreCommit_Spotless_Cron
>> > >>>> 5.5G    beam_PreCommit_SQL_Commit
>> > >>>> 5.5G    beam_PreCommit_SQL_Cron
>> > >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>> > >>>> 750M    beam_PreCommit_Website_Commit
>> > >>>> 750M    beam_PreCommit_Website_Commit@2
>> > >>>> 750M    beam_PreCommit_Website_Cron
>> > >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>> > >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>> > >>>> 336M    beam_Prober_CommunityMetrics
>> > >>>> 693M    beam_python_mongoio_load_test
>> > >>>> 339M    beam_SeedJob
>> > >>>> 333M    beam_SeedJob_Standalone
>> > >>>> 334M    beam_sonarqube_report
>> > >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>> > >>>> 175G    total
>> > >>>>
>> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <
>> tysonjh@google.com>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> Ya looks like something in the workspaces is taking up room:
>> > >>>>>
>> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>> > >>>>> 191G    .
>> > >>>>> 191G    total
>> > >>>>>
>> > >>>>>
>> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <
>> tysonjh@google.com>
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
>> > >>>>>>
>> > >>>>>> Filesystem      Size  Used Avail Use% Mounted on
>> > >>>>>> /dev/sda1       485G  482G  2.9G 100% /
>> > >>>>>>
>> > >>>>>> However, after cleaning up /tmp with the crontab command, only
>> > >>>>>> 8G is in use, yet the disk still shows 100% full:
>> > >>>>>>
>> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>> > >>>>>> 8.0G    /tmp
>> > >>>>>> 8.0G    total
>> > >>>>>>
>> > >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>> > >>>>>> directory. When I run a du on that, it takes really long. I'll
>> let it keep
>> > >>>>>> running for a while to see if it ever returns a result but so
>> far this
>> > >>>>>> seems suspect.
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <
>> tysonjh@google.com>
>> > >>>>>> wrote:
>> > >>>>>>
>> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are
>> the
>> > >>>>>>> workspaces, or what are they named?
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com>
>> wrote:
>> > >>>>>>>
>> > >>>>>>>> I'm curious to what you find. Was it /tmp or the workspaces
>> using
>> > >>>>>>>> up the space?
>> > >>>>>>>>
>> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <
>> tysonjh@google.com>
>> > >>>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't
>> work.
>> > >>>>>>>>> I'll clean up manually on the machine using the cron command.
>> > >>>>>>>>>
>> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
>> > >>>>>>>>> tysonjh@google.com> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> Something isn't working with the current set up because node
>> 15
>> > >>>>>>>>>> appears to be out of space and is currently 'offline'
>> according to Jenkins.
>> > >>>>>>>>>> Can someone run the cleanup job? The machine is full,
>> > >>>>>>>>>>
>> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>> > >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>> > >>>>>>>>>> udev             52G     0   52G   0% /dev
>> > >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
>> > >>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>> > >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>> > >>>>>>>>>>
>> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort
>> -rhk
>> > >>>>>>>>>> 1,1 | head -n 20
>> > >>>>>>>>>> 20G     2020-07-24 17:52        .
>> > >>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>> > >>>>>>>>>> 517M    2020-07-22 17:31
>> > >>>>>>>>>>
>> ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>> > >>>>>>>>>> 517M    2020-07-22 17:31
>> > >>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
>> > >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>> > >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>> > >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>> > >>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>> > >>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>> > >>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>> > >>>>>>>>>> 236M    2020-07-21 20:25
>> > >>>>>>>>>>  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>> > >>>>>>>>>> 236M    2020-07-21 20:21
>> > >>>>>>>>>>  ./beam-pipeline-tempV85xeK/tmppbQHB3
>> > >>>>>>>>>> 236M    2020-07-21 20:15
>> > >>>>>>>>>>  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>> > >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>> > >>>>>>>>>> 105M    2020-07-23 00:17
>> ./beam-artifact1374651823280819755
>> > >>>>>>>>>> 105M    2020-07-23 00:16
>> ./beam-artifact5050755582921936972
>> > >>>>>>>>>> 105M    2020-07-23 00:16
>> ./beam-artifact1834064452502646289
>> > >>>>>>>>>> 105M    2020-07-23 00:15
>> ./beam-artifact682561790267074916
>> > >>>>>>>>>> 105M    2020-07-23 00:15
>> ./beam-artifact4691304965824489394
>> > >>>>>>>>>> 105M    2020-07-23 00:14
>> ./beam-artifact4050383819822604421
>> > >>>>>>>>>>
>> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>> > >>>>>>>>>> robertwb@google.com> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>> > >>>>>>>>>>> tysonjh@google.com> wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache
>> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the
>> workspace [1]:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some basic
>> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on the
>> build nodes.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it
>> gets
>> > >>>>>>>>>>>>    cleaned up when job workspaces expire.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>> Tests should be (able to be) written to use the standard
>> > >>>>>>>>>>> temporary file mechanisms, and the environment set up on
>> Jenkins such that
>> > >>>>>>>>>>> that falls into the respective workspaces. Ideally this
>> should be as simple
>> > >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable
>> (and making sure it
>> > >>>>>>>>>>> exists/is writable).
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or
>> > >>>>>>>>>>>>    finish.
>> > >>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous
>> builds.
>> > >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
>> > >>>>>>>>>>>>    artifacts.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> [1]:
>> > >>>>>>>>>>>>
>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
>> > >>>>>>>>>>>> kenn@apache.org> wrote:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>> Those file listings look like the result of using standard
>> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>> > >>>>>>>>>>>>> tysonjh@google.com> wrote:
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into
>> two examples:
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . |
>> sort
>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>> > >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:48
>> ./beam-pipeline-temp3ybuY4
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:46
>> ./beam-pipeline-tempuxjiPT
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:44
>> ./beam-pipeline-tempVpg1ME
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:42
>> ./beam-pipeline-tempJ4EpyB
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:39
>> ./beam-pipeline-tempepea7Q
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:35
>> ./beam-pipeline-temp79qot2
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:48
>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:46
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:44
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:42
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:39
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:35
>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48
>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35
>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
>> > >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . |
>> sort
>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>> > >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:14
>> ./beam-pipeline-tempUTXqlM
>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:11
>> ./beam-pipeline-tempx3Yno3
>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:05
>> ./beam-pipeline-tempyCrMYq
>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:14
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:11
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:05
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>> > >>>>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>> > >>>>>>>>>>>>>> 988K    2020-07-12 12:00
>> ./junit642086915811430564/beam
>> > >>>>>>>>>>>>>> 984K    2020-07-12 12:00
>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
>> > >>>>>>>>>>>>>> 980K    2020-07-12 12:00
>> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <
>> ehudm@google.com>
>> > >>>>>>>>>>>>>> wrote:
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>> > >>>>>>>>>>>>>>> kenn@apache.org> wrote:
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing
>> something,
>> > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect
>> TMPDIR to point
>> > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped
>> by Jenkins, and I
>> > >>>>>>>>>>>>>>>> would expect code to always create temp files via APIs
>> that respect this.
>> > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability
>> to set this up? Do
>> > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably find
>> by setting TMPDIR to
>> > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write
>> permission to /tmp,
>> > >>>>>>>>>>>>>>>> etc)
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> Kenn
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>> > >>>>>>>>>>>>>>>> altay@google.com> wrote:
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously
>> (
>> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for
>> > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful
>> jobs. Alternatively, we
>> > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src
>> directories.
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal
>> cron
>> > >>>>>>>>>>>>>>>>> scripts to the inventory job (
>> > >>>>>>>>>>>>>>>>>
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51
>> ).
>> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the
>> source tree, adjust
>> > >>>>>>>>>>>>>>>>> frequencies and clean up codes with PRs. I do not
>> know how internal cron
>> > >>>>>>>>>>>>>>>>> scripts are created, maintained, and how would they
>> be recreated for new
>> > >>>>>>>>>>>>>>>>> worker instances.
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>> > >>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> Hey,
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
>> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
>> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
>> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort
>> solution for some strange
>> > >>>>>>>>>>>>>>>>>> cases.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker
>> with
>> > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once a
>> week and deletes all
>> > >>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed
>> for at least three days.
>> > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the
>> running jobs on the
>> > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in
>> use), and also to be
>> > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine.
>> The cleanup will always
>> > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are
>> blocking the machine.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left"
>> errors
>> > >>>>>>>>>>>>>>>>>> may be a consequence of growing workspace directory
>> rather than /tmp. I
>> > >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g. currently,
>> on
>> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size
>> is 158 GB while /tmp is
>> > >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk size
>> to hold workspaces for
>> > >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will
>> execute each job) or clear
>> > >>>>>>>>>>>>>>>>>> also the workspaces in some way.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> Regards,
>> > >>>>>>>>>>>>>>>>>> Damian
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>> > >>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't
>> lead to
>> > >>>>>>>>>>>>>>>>>>> test failures
>> > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there
>> is
>> > >>>>>>>>>>>>>>>>>>> the notion of
>> > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are
>> running?
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> -Max
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in
>> Jenkins:
>> > >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>> > >>>>>>>>>>>>>>>>>>> >
>> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing
>> some
>> > >>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests
>> currently, perhaps we should
>> > >>>>>>>>>>>>>>>>>>> schedule this job with cron?
>> > >>>>>>>>>>>>>>>>>>> >
>> > >>>>>>>>>>>>>>>>>>> >
>> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <
>> > >>>>>>>>>>>>>>>>>>> heejong@google.com> wrote:
>> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
>> > >>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
>> > >>>>>>>>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>>>>>>>>
>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>> > >>>>>>>>>>>>>>>>>>> )
>> > >>>>>>>>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>> > >>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>> > >>>>>>>>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by
>> jenkins
>> > >>>>>>>>>>>>>>>>>>> older than 3 days.
>> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>> > >>>>>>>>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
>> > >>>>>>>>>>>>>>>>>>> jenkins-12, which has not
>> > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days.
>> Not
>> > >>>>>>>>>>>>>>>>>>> scheduling:
>> > >>>>>>>>>>>>>>>>>>> >>>
>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>> > >>>>>>>>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>> > >>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>> > >>>>>>>>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing
>> a one
>> > >>>>>>>>>>>>>>>>>>> time cleanup. I agree
>> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate
>> this
>> > >>>>>>>>>>>>>>>>>>> task or address the root
>> > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>> > >>>>>>>>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>> > >>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>> > >>>>>>>>>>>>>>>>>>> >>>> wrote:
>> > >>>>>>>>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins
>> workers
>> > >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
>> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these
>> cases
>> > >>>>>>>>>>>>>>>>>>> (someone with access
>> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers)?
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming
>> more
>> > >>>>>>>>>>>>>>>>>>> and more frequent
>> > >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how this can
>> be
>> > >>>>>>>>>>>>>>>>>>> >>>>> remedied. Can a cleanup
>> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> Regards
>> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> --
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software
>> > >>>>>>>>>>>>>>>>>>> Engineer
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>> > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>
>> >
>>
>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Udi Meiri <eh...@google.com>.
What about the workspaces, which can take up 175GB in some cases (see
above)?
I'm working on getting them cleaned up automatically:
https://github.com/apache/beam/pull/12326
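
Roughly, the idea there is a post-build step per job along the lines of
this sketch (my shorthand for the approach, not the actual diff; the "src"
subdirectory is an assumption):

  # After the build, drop the per-job source checkout so the workspace
  # shrinks back down. ${WORKSPACE} is set by Jenkins for each run; the
  # :? guard aborts instead of expanding to an empty path.
  rm -rf "${WORKSPACE:?}/src"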

My opinion is that we would get more mileage out of fixing the jobs that
leave behind files in /tmp and images/containers in Docker.
This would also help keep development machines clean.
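
For the Docker side, a periodic prune along these lines would reclaim most
of that space (a sketch; the 24h window matches what the inventory job
reportedly uses, but the exact flags there may differ):

  # Remove stopped containers, unused networks, and images not used by a
  # container created within the last 24 hours.
  docker system prune --all --force --filter "until=24h"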


On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <ty...@google.com> wrote:

> Here is a summary of how I understand things:
>
>   - /tmp and /var/lib/docker are the culprits for filling up disks
>   - inventory Jenkins job runs every 12 hours and runs a docker prune to
> clean up images older than 24hr
>   - crontab on each machine cleans up /tmp files older than three days
> weekly
>
> This doesn't seem to be working: we're still running out of disk
> periodically, and it requires manual intervention. Knobs and options we have
> available:
>
>   1. increase frequency of deleting files
>   2. decrease the number of days required to delete a file (e.g. older
> than 2 days)
>
> The execution methods we have available are:
>
>   A. cron
>     - pro: runs even if a job gets stuck in Jenkins due to full disk
>     - con: config baked into VM which is tough to update, not discoverable
> or documented well
>   B. inventory job
>     - pro: easy to update, runs every 12h already
>     - con: could get stuck if Jenkins agent runs out of disk or is
> otherwise stuck, tied to all other inventory job frequency
>   C. configure startup scripts for the VMs that set up the cron job
> anytime the VM is restarted
>     - pro: similar to A. and easy to update
>     - con: similar to A.
>
> Between the three, I prefer B because it is consistent with the other
> inventory jobs. If stuck jobs end up preventing the inventory job from
> being scheduled too often, we can investigate C further to avoid having
> to rebuild the VM images repeatedly.
>
> Any objections or comments? If not, we'll go forward with B and reduce
> the date check from 3 days to 2 days.
>
>
> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
> > Tests may not be doing docker cleanup. Inventory job runs a docker prune
> > every 12 hours for images older than 24 hrs [1]. Randomly looking at one
> of
> > the recent runs [2], it cleaned up a long list of containers consuming
> > 30+GB space. That should be just 12 hours worth of containers.
> >
> > [1]
> >
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
> > [2]
> >
> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
> >
> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <ty...@google.com>
> wrote:
> >
> > > Yes, these are on the same volume in the /var/lib/docker directory. I'm
> > > unsure if they clean up leftover images.
> > >
> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:
> > >
> > >> I forgot Docker images:
> > >>
> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
> > >> TYPE                TOTAL               ACTIVE              SIZE
> > >>        RECLAIMABLE
> > >> Images              88                  9                   125.4GB
> > >>       124.2GB (99%)
> > >> Containers          40                  4                   7.927GB
> > >>       7.871GB (99%)
> > >> Local Volumes       47                  0                   3.165GB
> > >>       3.165GB (100%)
> > >> Build Cache         0                   0                   0B
> > >>        0B
> > >>
> > >> There are about 90 images on that machine, with all but 1 less than 48
> > >> hours old.
> > >> I think the docker test jobs need to try harder at cleaning up their
> > >> leftover images. (assuming they're already doing it?)
> > >>
> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:
> > >>
> > >>> The additional slots (@3 directories) take up even more space now
> than
> > >>> before.
> > >>>
> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which
> could
> > >>> help by cleaning up workspaces after a run (just started a seed job).
> > >>>
> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <ty...@google.com>
> > >>> wrote:
> > >>>
> > >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
> > >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
> > >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
> > >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
> > >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
> > >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
> > >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
> > >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
> > >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
> > >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
> > >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
> > >>>> 6.2G    beam_PreCommit_Python_Commit
> > >>>> 7.5G    beam_PreCommit_Python_Commit@2
> > >>>> 7.5G    beam_PreCommit_Python_Cron
> > >>>> 1012M   beam_PreCommit_PythonDocker_Commit
> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
> > >>>> 1002M   beam_PreCommit_PythonDocker_Cron
> > >>>> 877M    beam_PreCommit_PythonFormatter_Commit
> > >>>> 988M    beam_PreCommit_PythonFormatter_Cron
> > >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
> > >>>> 1.7G    beam_PreCommit_PythonLint_Commit
> > >>>> 2.1G    beam_PreCommit_PythonLint_Cron
> > >>>> 7.5G    beam_PreCommit_Python_Phrase
> > >>>> 346M    beam_PreCommit_RAT_Commit
> > >>>> 341M    beam_PreCommit_RAT_Cron
> > >>>> 338M    beam_PreCommit_Spotless_Commit
> > >>>> 339M    beam_PreCommit_Spotless_Cron
> > >>>> 5.5G    beam_PreCommit_SQL_Commit
> > >>>> 5.5G    beam_PreCommit_SQL_Cron
> > >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
> > >>>> 750M    beam_PreCommit_Website_Commit
> > >>>> 750M    beam_PreCommit_Website_Commit@2
> > >>>> 750M    beam_PreCommit_Website_Cron
> > >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
> > >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
> > >>>> 336M    beam_Prober_CommunityMetrics
> > >>>> 693M    beam_python_mongoio_load_test
> > >>>> 339M    beam_SeedJob
> > >>>> 333M    beam_SeedJob_Standalone
> > >>>> 334M    beam_sonarqube_report
> > >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
> > >>>> 175G    total
> > >>>>
> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <tysonjh@google.com
> >
> > >>>> wrote:
> > >>>>
> > >>>>> Ya looks like something in the workspaces is taking up room:
> > >>>>>
> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
> > >>>>> 191G    .
> > >>>>> 191G    total
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <
> tysonjh@google.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
> > >>>>>>
> > >>>>>> Filesystem      Size  Used Avail Use% Mounted on
> > >>>>>> /dev/sda1       485G  482G  2.9G 100% /
> > >>>>>>
> > >>>>>> However, after cleaning up /tmp with the crontab command, only
> > >>>>>> 8G is in use, yet the disk still shows 100% full:
> > >>>>>>
> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
> > >>>>>> 8.0G    /tmp
> > >>>>>> 8.0G    total
> > >>>>>>
> > >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
> > >>>>>> directory. When I run a du on that, it takes really long. I'll
> let it keep
> > >>>>>> running for a while to see if it ever returns a result but so far
> this
> > >>>>>> seems suspect.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <
> tysonjh@google.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are the
> > >>>>>>> workspaces, or what are they named?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com>
> wrote:
> > >>>>>>>
> > >>>>>>>> I'm curious to what you find. Was it /tmp or the workspaces
> using
> > >>>>>>>> up the space?
> > >>>>>>>>
> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <
> tysonjh@google.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't work.
> > >>>>>>>>> I'll clean up manually on the machine using the cron command.
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
> > >>>>>>>>> tysonjh@google.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Something isn't working with the current set up because node
> 15
> > >>>>>>>>>> appears to be out of space and is currently 'offline'
> according to Jenkins.
> > >>>>>>>>>> Can someone run the cleanup job? The machine is full,
> > >>>>>>>>>>
> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
> > >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
> > >>>>>>>>>> udev             52G     0   52G   0% /dev
> > >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
> > >>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
> > >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
> > >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
> > >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
> > >>>>>>>>>>
> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort
> -rhk
> > >>>>>>>>>> 1,1 | head -n 20
> > >>>>>>>>>> 20G     2020-07-24 17:52        .
> > >>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
> > >>>>>>>>>> 517M    2020-07-22 17:31
> > >>>>>>>>>>
> ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
> > >>>>>>>>>> 517M    2020-07-22 17:31
> > >>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
> > >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
> > >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
> > >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
> > >>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
> > >>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
> > >>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
> > >>>>>>>>>> 236M    2020-07-21 20:25
> > >>>>>>>>>>  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
> > >>>>>>>>>> 236M    2020-07-21 20:21
> > >>>>>>>>>>  ./beam-pipeline-tempV85xeK/tmppbQHB3
> > >>>>>>>>>> 236M    2020-07-21 20:15
> > >>>>>>>>>>  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
> > >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
> > >>>>>>>>>> 105M    2020-07-23 00:17
> ./beam-artifact1374651823280819755
> > >>>>>>>>>> 105M    2020-07-23 00:16
> ./beam-artifact5050755582921936972
> > >>>>>>>>>> 105M    2020-07-23 00:16
> ./beam-artifact1834064452502646289
> > >>>>>>>>>> 105M    2020-07-23 00:15
> ./beam-artifact682561790267074916
> > >>>>>>>>>> 105M    2020-07-23 00:15
> ./beam-artifact4691304965824489394
> > >>>>>>>>>> 105M    2020-07-23 00:14
> ./beam-artifact4050383819822604421
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
> > >>>>>>>>>> robertwb@google.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
> > >>>>>>>>>>> tysonjh@google.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache
> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the
> workspace [1]:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some basic
> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on the
> build nodes.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it
> gets
> > >>>>>>>>>>>>    cleaned up when job workspaces expire.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>> Tests should be (able to be) written to use the standard
> > >>>>>>>>>>> temporary file mechanisms, and the environment set up on
> Jenkins such that
> > >>>>>>>>>>> that falls into the respective workspaces. Ideally this
> should be as simple
> > >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable (and
> making sure it
> > >>>>>>>>>>> exists/is writable).
> > >>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or
> > >>>>>>>>>>>>    finish.
> > >>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous
> builds.
> > >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
> > >>>>>>>>>>>>    artifacts.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> [1]:
> > >>>>>>>>>>>>
> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
> > >>>>>>>>>>>> kenn@apache.org> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Those file listings look like the result of using standard
> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
> > >>>>>>>>>>>>> tysonjh@google.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into
> two examples:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . |
> sort
> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
> > >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:48
> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:46
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:44
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:42
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:39
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:35
> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48
> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35
> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
> > >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . |
> sort
> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
> > >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
> > >>>>>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
> > >>>>>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
> > >>>>>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
> > >>>>>>>>>>>>>> 236M    2020-07-19 12:14
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
> > >>>>>>>>>>>>>> 236M    2020-07-19 12:11
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
> > >>>>>>>>>>>>>> 236M    2020-07-19 12:05
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05
> > >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
> > >>>>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
> > >>>>>>>>>>>>>> 988K    2020-07-12 12:00
> ./junit642086915811430564/beam
> > >>>>>>>>>>>>>> 984K    2020-07-12 12:00
> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
> > >>>>>>>>>>>>>> 980K    2020-07-12 12:00
> > >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <
> ehudm@google.com>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
> > >>>>>>>>>>>>>>> kenn@apache.org> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing
> something,
> > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect
> TMPDIR to point
> > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped
> by Jenkins, and I
> > >>>>>>>>>>>>>>>> would expect code to always create temp files via APIs
> that respect this.
> > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability
> to set this up? Do
> > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably find
> by setting TMPDIR to
> > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write
> permission to /tmp,
> > >>>>>>>>>>>>>>>> etc)
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Kenn
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
> > >>>>>>>>>>>>>>>> altay@google.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for
> > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful jobs.
> Alternatively, we
> > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src
> directories.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
> > >>>>>>>>>>>>>>>>> scripts to the inventory job (
> > >>>>>>>>>>>>>>>>>
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51
> ).
> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the
> source tree, adjust
> > >>>>>>>>>>>>>>>>> frequencies and clean up codes with PRs. I do not know
> how internal cron
> > >>>>>>>>>>>>>>>>> scripts are created, maintained, and how would they be
> recreated for new
> > >>>>>>>>>>>>>>>>> worker instances.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
> > >>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hey,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort
> solution for some strange
> > >>>>>>>>>>>>>>>>>> cases.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker
> with
> > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once a
> week and deletes all
> > >>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed for
> at least three days.
> > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the
> running jobs on the
> > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in
> use), and also to be
> > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine.
> The cleanup will always
> > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are
> blocking the machine.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left" errors
> > >>>>>>>>>>>>>>>>>> may be a consequence of growing workspace directory
> rather than /tmp. I
> > >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g. currently, on
> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size is
> 158 GB while /tmp is
> > >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk size
> to hold workspaces for
> > >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will
> execute each job) or clear
> > >>>>>>>>>>>>>>>>>> also the workspaces in some way.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>>> Damian
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
> > >>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead
> to
> > >>>>>>>>>>>>>>>>>>> test failures
> > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there
> is
> > >>>>>>>>>>>>>>>>>>> the notion of
> > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are running?
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> -Max
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in
> Jenkins:
> > >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
> > >>>>>>>>>>>>>>>>>>> >
> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing
> some
> > >>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests
> currently, perhaps we should
> > >>>>>>>>>>>>>>>>>>> schedule this job with cron?
> > >>>>>>>>>>>>>>>>>>> >
> > >>>>>>>>>>>>>>>>>>> >
> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <
> > >>>>>>>>>>>>>>>>>>> heejong@google.com> wrote:
> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
> > >>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>>
> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
> > >>>>>>>>>>>>>>>>>>> )
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
> > >>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by
> jenkins
> > >>>>>>>>>>>>>>>>>>> older than 3 days.
> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
> > >>>>>>>>>>>>>>>>>>> jenkins-12, which has not
> > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not
> > >>>>>>>>>>>>>>>>>>> scheduling:
> > >>>>>>>>>>>>>>>>>>> >>>
> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
> > >>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a
> one
> > >>>>>>>>>>>>>>>>>>> time cleanup. I agree
> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this
> > >>>>>>>>>>>>>>>>>>> task or address the root
> > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
> > >>>>>>>>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
> > >>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
> > >>>>>>>>>>>>>>>>>>> >>>> wrote:
> > >>>>>>>>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers
> > >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these
> cases
> > >>>>>>>>>>>>>>>>>>> (someone with access
> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers)?
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming
> more
> > >>>>>>>>>>>>>>>>>>> and more frequent
> > >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how this can be
> > >>>>>>>>>>>>>>>>>>> remedied. Can a cleanup
> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Regards
> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> --
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software
> > >>>>>>>>>>>>>>>>>>> Engineer
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002
> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
> > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> >
>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
Here is a summary of how I understand things:

  - /tmp and /var/lib/docker are the culprits for filling up disks
  - inventory Jenkins job runs every 12 hours and runs a docker prune to clean up images older than 24hr
  - crontab on each machine cleans up /tmp files older than three days weekly

This doesn't seem to be working, since we're still periodically running out of disk and requiring manual intervention. The knobs and options we have available are:

  1. increase frequency of deleting files
  2. decrease the number of days required to delete a file (e.g. older than 2 days)
 
The execution methods we have available are:

  A. cron
    - pro: runs even if a job gets stuck in Jenkins due to a full disk
    - con: config is baked into the VM, which is tough to update and is not discoverable or well documented
  B. inventory job
    - pro: easy to update, runs every 12h already
    - con: could get stuck if the Jenkins agent runs out of disk or is otherwise stuck; tied to the frequency of the other inventory jobs
  C. configure startup scripts for the VMs that set up the cron job anytime the VM is restarted
    - pro: similar to A, and easy to update
    - con: similar to A

Between the three I prefer B because it is consistent with the other inventory jobs. If it turns out that stuck jobs often prevent the inventory job from being scheduled, we could further investigate C to avoid having to rebuild the VM images repeatedly.

Any objections or comments? If not, we'll go forward with B and reduce the age check from 3 days to 2 days.
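
For concreteness, the tightened cleanup amounts to something like the sketch below (an illustration assuming GNU find; the exact command baked into the existing cron script may differ):

    # Delete regular files (never directories) under /tmp whose last access
    # time is two or more days old. "-atime +1" matches files last accessed
    # more than one full 24-hour period ago.
    sudo find /tmp -xdev -type f -atime +1 -delete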


On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote: 
> Tests may not be doing docker cleanup. Inventory job runs a docker prune
> every 12 hours for images older than 24 hrs [1]. Randomly looking at one of
> the recent runs [2], it cleaned up a long list of containers consuming
> 30+GB space. That should be just 12 hours worth of containers.
> 
> [1]
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
> [2]
> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
> 
> On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <ty...@google.com> wrote:
> 
> > Yes, these are on the same volume in the /var/lib/docker directory. I'm
> > unsure if they clean up leftover images.
> >
> > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:
> >
> >> I forgot Docker images:
> >>
> >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
> >> TYPE            TOTAL    ACTIVE   SIZE      RECLAIMABLE
> >> Images          88       9        125.4GB   124.2GB (99%)
> >> Containers      40       4        7.927GB   7.871GB (99%)
> >> Local Volumes   47       0        3.165GB   3.165GB (100%)
> >> Build Cache     0        0        0B        0B
> >>
> >> There are about 90 images on that machine, with all but 1 less than 48
> >> hours old.
> >> I think the docker test jobs need to try harder at cleaning up their
> >> leftover images. (assuming they're already doing it?)
> >>
> >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:
> >>
> >>> The additional slots (@3 directories) take up even more space now than
> >>> before.
> >>>
> >>> I'm testing out https://github.com/apache/beam/pull/12326 which could
> >>> help by cleaning up workspaces after a run (just started a seed job).
> >>>
> >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <ty...@google.com>
> >>> wrote:
> >>>
> >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
> >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
> >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
> >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
> >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
> >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
> >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
> >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
> >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
> >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
> >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
> >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
> >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
> >>>> 6.2G    beam_PreCommit_Python_Commit
> >>>> 7.5G    beam_PreCommit_Python_Commit@2
> >>>> 7.5G    beam_PreCommit_Python_Cron
> >>>> 1012M   beam_PreCommit_PythonDocker_Commit
> >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
> >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
> >>>> 1002M   beam_PreCommit_PythonDocker_Cron
> >>>> 877M    beam_PreCommit_PythonFormatter_Commit
> >>>> 988M    beam_PreCommit_PythonFormatter_Cron
> >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
> >>>> 1.7G    beam_PreCommit_PythonLint_Commit
> >>>> 2.1G    beam_PreCommit_PythonLint_Cron
> >>>> 7.5G    beam_PreCommit_Python_Phrase
> >>>> 346M    beam_PreCommit_RAT_Commit
> >>>> 341M    beam_PreCommit_RAT_Cron
> >>>> 338M    beam_PreCommit_Spotless_Commit
> >>>> 339M    beam_PreCommit_Spotless_Cron
> >>>> 5.5G    beam_PreCommit_SQL_Commit
> >>>> 5.5G    beam_PreCommit_SQL_Cron
> >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
> >>>> 750M    beam_PreCommit_Website_Commit
> >>>> 750M    beam_PreCommit_Website_Commit@2
> >>>> 750M    beam_PreCommit_Website_Cron
> >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
> >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
> >>>> 336M    beam_Prober_CommunityMetrics
> >>>> 693M    beam_python_mongoio_load_test
> >>>> 339M    beam_SeedJob
> >>>> 333M    beam_SeedJob_Standalone
> >>>> 334M    beam_sonarqube_report
> >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
> >>>> 175G    total
> >>>>
> >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <ty...@google.com>
> >>>> wrote:
> >>>>
> >>>>> Ya looks like something in the workspaces is taking up room:
> >>>>>
> >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
> >>>>> 191G    .
> >>>>> 191G    total
> >>>>>
> >>>>>
> >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <ty...@google.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Node 8 is also full. The partition that /tmp is on is here:
> >>>>>>
> >>>>>> Filesystem      Size  Used Avail Use% Mounted on
> >>>>>> /dev/sda1       485G  482G  2.9G 100% /
> >>>>>>
> >>>>>> However, after cleaning up /tmp with the crontab command, there is only
> >>>>>> 8G of usage, yet it still remains 100% full:
> >>>>>>
> >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
> >>>>>> 8.0G    /tmp
> >>>>>> 8.0G    total
> >>>>>>
> >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
> >>>>>> directory. When I run a du on that, it takes really long. I'll let it keep
> >>>>>> running for a while to see if it ever returns a result but so far this
> >>>>>> seems suspect.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <ty...@google.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Everything I've been looking at is in the /tmp dir. Where are the
> >>>>>>> workspaces, or what are they named?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:
> >>>>>>>
> >>>>>>>> I'm curious to what you find. Was it /tmp or the workspaces using
> >>>>>>>> up the space?
> >>>>>>>>
> >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <ty...@google.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't work.
> >>>>>>>>> I'll clean up manually on the machine using the cron command.
> >>>>>>>>>
> >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
> >>>>>>>>> tysonjh@google.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Something isn't working with the current setup because node 15
> >>>>>>>>>> appears to be out of space and is currently 'offline' according to Jenkins.
> >>>>>>>>>> Can someone run the cleanup job? The machine is full,
> >>>>>>>>>>
> >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
> >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
> >>>>>>>>>> udev             52G     0   52G   0% /dev
> >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
> >>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
> >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
> >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
> >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
> >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
> >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
> >>>>>>>>>>
> >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk
> >>>>>>>>>> 1,1 | head -n 20
> >>>>>>>>>> 20G     2020-07-24 17:52        .
> >>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
> >>>>>>>>>> 517M    2020-07-22 17:31
> >>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
> >>>>>>>>>> 517M    2020-07-22 17:31
> >>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
> >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
> >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
> >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
> >>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
> >>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
> >>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
> >>>>>>>>>> 236M    2020-07-21 20:25
> >>>>>>>>>>  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
> >>>>>>>>>> 236M    2020-07-21 20:21
> >>>>>>>>>>  ./beam-pipeline-tempV85xeK/tmppbQHB3
> >>>>>>>>>> 236M    2020-07-21 20:15
> >>>>>>>>>>  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
> >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
> >>>>>>>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
> >>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
> >>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
> >>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
> >>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
> >>>>>>>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
> >>>>>>>>>> robertwb@google.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
> >>>>>>>>>>> tysonjh@google.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache
> >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the workspace [1]:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Procedures Projects can take to clean up disk space
> >>>>>>>>>>>>
> >>>>>>>>>>>> Projects can help themselves and Infra by taking some basic
> >>>>>>>>>>>> steps to help clean up their jobs after themselves on the build nodes.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>    1. Use a ./tmp dir in your job's workspace. That way it gets
> >>>>>>>>>>>>    cleaned up when job workspaces expire.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>> Tests should be (able to be) written to use the standard
> >>>>>>>>>>> temporary file mechanisms, and the environment set up on Jenkins such that
> >>>>>>>>>>> that falls into the respective workspaces. Ideally this should be as simple
> >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable (and making sure it
> >>>>>>>>>>> exists/is writable).
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or
> >>>>>>>>>>>>    finish.
> >>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous builds.
> >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
> >>>>>>>>>>>>    artifacts.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]:
> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
> >>>>>>>>>>>> kenn@apache.org> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Those file listings look like the result of using standard
> >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
> >>>>>>>>>>>>> tysonjh@google.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
> >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into two examples:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort
> >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
> >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
> >>>>>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
> >>>>>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
> >>>>>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
> >>>>>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
> >>>>>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
> >>>>>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
> >>>>>>>>>>>>>> 236M    2020-07-17 18:48
> >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
> >>>>>>>>>>>>>> 236M    2020-07-17 18:46
> >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
> >>>>>>>>>>>>>> 236M    2020-07-17 18:44
> >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
> >>>>>>>>>>>>>> 236M    2020-07-17 18:42
> >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
> >>>>>>>>>>>>>> 236M    2020-07-17 18:39
> >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
> >>>>>>>>>>>>>> 236M    2020-07-17 18:35
> >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
> >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48
> >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
> >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46
> >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
> >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44
> >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
> >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42
> >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
> >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39
> >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
> >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35
> >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
> >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort
> >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
> >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
> >>>>>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
> >>>>>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
> >>>>>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
> >>>>>>>>>>>>>> 236M    2020-07-19 12:14
> >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
> >>>>>>>>>>>>>> 236M    2020-07-19 12:11
> >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
> >>>>>>>>>>>>>> 236M    2020-07-19 12:05
> >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
> >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14
> >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
> >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11
> >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
> >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05
> >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
> >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14
> >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
> >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11
> >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
> >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05
> >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
> >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14
> >>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
> >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11
> >>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
> >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05
> >>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
> >>>>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
> >>>>>>>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
> >>>>>>>>>>>>>> 984K    2020-07-12 12:00
> >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
> >>>>>>>>>>>>>> 980K    2020-07-12 12:00
> >>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
> >>>>>>>>>>>>>>> kenn@apache.org> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing something,
> >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect TMPDIR to point
> >>>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped by Jenkins, and I
> >>>>>>>>>>>>>>>> would expect code to always create temp files via APIs that respect this.
> >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability to set this up? Do
> >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably find by setting TMPDIR to
> >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write permission to /tmp,
> >>>>>>>>>>>>>>>> etc.)?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Kenn
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
> >>>>>>>>>>>>>>>> altay@google.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
> >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
> >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for
> >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful jobs. Alternatively, we
> >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src directories.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
> >>>>>>>>>>>>>>>>> scripts to the inventory job (
> >>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
> >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
> >>>>>>>>>>>>>>>>> frequencies and clean up code with PRs. I do not know how internal cron
> >>>>>>>>>>>>>>>>> scripts are created, maintained, and how they would be recreated for new
> >>>>>>>>>>>>>>>>> worker instances.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
> >>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hey,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
> >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
> >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
> >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort solution for some strange
> >>>>>>>>>>>>>>>>>> cases.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker with
> >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once a week and deletes all
> >>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed for at least three days.
> >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the running jobs on the
> >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in use), and also to be
> >>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine. The cleanup will always
> >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are blocking the machine.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I also think that currently the "No space left" errors
> >>>>>>>>>>>>>>>>>> may be a consequence of the growing workspace directory rather than /tmp. I
> >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g. currently, on
> >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size is 158 GB while /tmp is
> >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk size to hold workspaces for
> >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will execute each job) or clear
> >>>>>>>>>>>>>>>>>> also the workspaces in some way.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>> Damian
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
> >>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to
> >>>>>>>>>>>>>>>>>>> test failures
> >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there is
> >>>>>>>>>>>>>>>>>>> the notion of
> >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are running?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> -Max
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
> >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
> >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
> >>>>>>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some
> >>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests currently, perhaps we should
> >>>>>>>>>>>>>>>>>>> schedule this job with cron?
> >>>>>>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <
> >>>>>>>>>>>>>>>>>>> heejong@google.com> wrote:
> >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
> >>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
> >>>>>>>>>>>>>>>>>>> >>
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
> >>>>>>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>>>>>> >>
> >>>>>>>>>>>>>>>>>>> >>
> >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
> >>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
> >>>>>>>>>>>>>>>>>>> >>
> >>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins
> >>>>>>>>>>>>>>>>>>> older than 3 days.
> >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
> >>>>>>>>>>>>>>>>>>> >>>
> >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
> >>>>>>>>>>>>>>>>>>> jenkins-12, which has not
> >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not
> >>>>>>>>>>>>>>>>>>> scheduling:
> >>>>>>>>>>>>>>>>>>> >>>
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
> >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
> >>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
> >>>>>>>>>>>>>>>>>>> >>>
> >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
> >>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
> >>>>>>>>>>>>>>>>>>> >>>
> >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one
> >>>>>>>>>>>>>>>>>>> time cleanup. I agree
> >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this
> >>>>>>>>>>>>>>>>>>> task or address the root
> >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
> >>>>>>>>>>>>>>>>>>> >>>>
> >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
> >>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
> >>>>>>>>>>>>>>>>>>> >>>> wrote:
> >>>>>>>>>>>>>>>>>>> >>>>
> >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
> >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers
> >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
> >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
> >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases
> >>>>>>>>>>>>>>>>>>> (someone with access
> >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers)?
> >>>>>>>>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more
> >>>>>>>>>>>>>>>>>>> and more frequent
> >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how this can be
> >>>>>>>>>>>>>>>>>>> remedied. Can a cleanup
> >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
> >>>>>>>>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>>>>>>>> >>>>> Regards
> >>>>>>>>>>>>>>>>>>> >>>>> Michal
> >>>>>>>>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>>>>>>>> >>>>> --
> >>>>>>>>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
> >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software
> >>>>>>>>>>>>>>>>>>> Engineer
> >>>>>>>>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002
> >>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
> >>>>>>>>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
> >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
> >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
> >>>>>>>>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>>>>>>>> >>>>
> >>>>>>>>>>>>>>>>>>> >>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> 

Re: No space left on device - beam-jenkins 1 and 7

Posted by Ahmet Altay <al...@google.com>.
Tests may not be doing docker cleanup. The inventory job runs a docker prune
every 12 hours for images older than 24 hrs [1]. Randomly looking at one of
the recent runs [2], it cleaned up a long list of containers consuming
30+GB of space. That should be just 12 hours' worth of containers.

[1]
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
[2]
https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
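
For reference, the pruning that the inventory job performs is roughly equivalent to the command below (a sketch; the actual step and flags live in job_Inventory.groovy [1]):

    # Remove stopped containers and unused images older than 24 hours.
    # "--all" extends the prune from dangling images to all unused images.
    docker system prune --all --force --filter "until=24h"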

On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <ty...@google.com> wrote:

> Yes, these are on the same volume in the /var/lib/docker directory. I'm
> unsure if they clean up leftover images.
>
> On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:
>
>> I forgot Docker images:
>>
>> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>> TYPE            TOTAL    ACTIVE   SIZE      RECLAIMABLE
>> Images          88       9        125.4GB   124.2GB (99%)
>> Containers      40       4        7.927GB   7.871GB (99%)
>> Local Volumes   47       0        3.165GB   3.165GB (100%)
>> Build Cache     0        0        0B        0B
>>
>> There are about 90 images on that machine, with all but 1 less than 48
>> hours old.
>> I think the docker test jobs need to try harder at cleaning up their
>> leftover images. (assuming they're already doing it?)
>>
>> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> The additional slots (@3 directories) take up even more space now than
>>> before.
>>>
>>> I'm testing out https://github.com/apache/beam/pull/12326 which could
>>> help by cleaning up workspaces after a run (just started a seed job).
>>>
>>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>>
>>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>>>> 2.9G    beam_PreCommit_Portable_Python_Commit
>>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>>>> 3.4G    beam_PreCommit_Portable_Python_Cron
>>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>>>> 6.2G    beam_PreCommit_Python_Commit
>>>> 7.5G    beam_PreCommit_Python_Commit@2
>>>> 7.5G    beam_PreCommit_Python_Cron
>>>> 1012M   beam_PreCommit_PythonDocker_Commit
>>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>>>> 1002M   beam_PreCommit_PythonDocker_Cron
>>>> 877M    beam_PreCommit_PythonFormatter_Commit
>>>> 988M    beam_PreCommit_PythonFormatter_Cron
>>>> 986M    beam_PreCommit_PythonFormatter_Phrase
>>>> 1.7G    beam_PreCommit_PythonLint_Commit
>>>> 2.1G    beam_PreCommit_PythonLint_Cron
>>>> 7.5G    beam_PreCommit_Python_Phrase
>>>> 346M    beam_PreCommit_RAT_Commit
>>>> 341M    beam_PreCommit_RAT_Cron
>>>> 338M    beam_PreCommit_Spotless_Commit
>>>> 339M    beam_PreCommit_Spotless_Cron
>>>> 5.5G    beam_PreCommit_SQL_Commit
>>>> 5.5G    beam_PreCommit_SQL_Cron
>>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>>>> 750M    beam_PreCommit_Website_Commit
>>>> 750M    beam_PreCommit_Website_Commit@2
>>>> 750M    beam_PreCommit_Website_Cron
>>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>>>> 336M    beam_Prober_CommunityMetrics
>>>> 693M    beam_python_mongoio_load_test
>>>> 339M    beam_SeedJob
>>>> 333M    beam_SeedJob_Standalone
>>>> 334M    beam_sonarqube_report
>>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>>>> 175G    total
>>>>
>>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <ty...@google.com>
>>>> wrote:
>>>>
>>>>> Ya looks like something in the workspaces is taking up room:
>>>>>
>>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>>>>> 191G    .
>>>>> 191G    total
>>>>>
>>>>>
>>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <ty...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Node 8 is also full. The partition that /tmp is on is here:
>>>>>>
>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>> /dev/sda1       485G  482G  2.9G 100% /
>>>>>>
>>>>>> However, after cleaning up /tmp with the crontab command, there is only
>>>>>> 8G of usage, yet it still remains 100% full:
>>>>>>
>>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>>>>> 8.0G    /tmp
>>>>>> 8.0G    total
>>>>>>
>>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>>>>>> directory. When I run a du on that, it takes really long. I'll let it keep
>>>>>> running for a while to see if it ever returns a result but so far this
>>>>>> seems suspect.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <ty...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Everything I've been looking at is in the /tmp dir. Where are the
>>>>>>> workspaces, or what are they named?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> I'm curious to what you find. Was it /tmp or the workspaces using
>>>>>>>> up the space?
>>>>>>>>
>>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <ty...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Bleck. I just realized that it is 'offline' so that won't work.
>>>>>>>>> I'll clean up manually on the machine using the cron command.
>>>>>>>>>
>>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
>>>>>>>>> tysonjh@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Something isn't working with the current setup because node 15
>>>>>>>>>> appears to be out of space and is currently 'offline' according to Jenkins.
>>>>>>>>>> Can someone run the cleanup job? The machine is full,
>>>>>>>>>>
>>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>> udev             52G     0   52G   0% /dev
>>>>>>>>>> tmpfs            11G  265M   10G   3% /run
>>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>>>>>>>
>>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>>>>> 1,1 | head -n 20
>>>>>>>>>> 20G     2020-07-24 17:52        .
>>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>>>>>>>>> 517M    2020-07-22 17:31
>>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>>>>>>>> 517M    2020-07-22 17:31
>>>>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
>>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>>>>>>>>> 236M    2020-07-21 20:25
>>>>>>>>>>  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>>>>>>>> 236M    2020-07-21 20:21
>>>>>>>>>>  ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>>>>>>>> 236M    2020-07-21 20:15
>>>>>>>>>>  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>>>>>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
>>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
>>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
>>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
>>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
>>>>>>>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>>>>>>>>>>> tysonjh@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache
>>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the workspace [1]:
>>>>>>>>>>>>
>>>>>>>>>>>> Procedures Projects can take to clean up disk space
>>>>>>>>>>>>
>>>>>>>>>>>> Projects can help themselves and Infra by taking some basic
>>>>>>>>>>>> steps to help clean up their jobs after themselves on the build nodes.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    1. Use a ./tmp dir in your job's workspace. That way it gets
>>>>>>>>>>>>    cleaned up when job workspaces expire.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Tests should be (able to be) written to use the standard
>>>>>>>>>>> temporary file mechanisms, and the environment set up on Jenkins such that
>>>>>>>>>>> that falls into the respective workspaces. Ideally this should be as simple
>>>>>>>>>>> as setting the TMPDIR (or similar) environment variable (and making sure it
>>>>>>>>>>> exists/is writable).
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or
>>>>>>>>>>>>    finish.
>>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous builds.
>>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
>>>>>>>>>>>>    artifacts.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [1]:
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
>>>>>>>>>>>> kenn@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Those file listings look like the result of using standard
>>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>>>>>>>>>>>>> tysonjh@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into two examples:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort
>>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>>>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>>>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>>>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>>>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>>>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>>>>>>>>>>>> 236M    2020-07-17 18:48
>>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>>>>>>>>>> 236M    2020-07-17 18:46
>>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>>>>>>>>>> 236M    2020-07-17 18:44
>>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>>>>>>>>>> 236M    2020-07-17 18:42
>>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>>>>>>>>>> 236M    2020-07-17 18:39
>>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>>>>>>>>>> 236M    2020-07-17 18:35
>>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:48
>>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:46
>>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:44
>>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:42
>>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:39
>>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:35
>>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort
>>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>>>>>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>>>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>>>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>>>>>>>>>>>> 236M    2020-07-19 12:14
>>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>>>>>>>>>> 236M    2020-07-19 12:11
>>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>>>>>>>>>> 236M    2020-07-19 12:05
>>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>>>>>>>>>> 3.7M    2020-07-19 12:14
>>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>>>>>>>>>> 3.7M    2020-07-19 12:11
>>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>>>>>>>>>> 3.7M    2020-07-19 12:05
>>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>>>>>>>>>> 2.0M    2020-07-19 12:14
>>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>>>>>>>>>> 2.0M    2020-07-19 12:11
>>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>>>>>>>>>> 2.0M    2020-07-19 12:05
>>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>>>>>>>>>> 1.2M    2020-07-19 12:14
>>>>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>>>>>>>>>> 1.2M    2020-07-19 12:11
>>>>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>>>>>>>>>> 1.2M    2020-07-19 12:05
>>>>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>>>>>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>>>>>>>>>>>> 984K    2020-07-12 12:00
>>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
>>>>>>>>>>>>>> 980K    2020-07-12 12:00
>>>>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>>>>>>>>>>>>>>> kenn@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm probably late to this discussion and missing something,
>>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect TMPDIR to point
>>>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped by Jenkins, and I
>>>>>>>>>>>>>>>> would expect code to always create temp files via APIs that respect this.
>>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability to set this up? Do
>>>>>>>>>>>>>>>> we have bugs in our code (that we could probably find by setting TMPDIR to
>>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write permission to /tmp,
>>>>>>>>>>>>>>>> etc.)?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Kenn
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for
>>>>>>>>>>>>>>>>> cleaning up workspace directory after successful jobs. Alternatively, we
>>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src directories.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
>>>>>>>>>>>>>>>>> scripts to the inventory job (
>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>>>>>>>>>>>>>> frequencies and clean up code with PRs. I do not know how internal cron
>>>>>>>>>>>>>>>>> scripts are created, maintained, and how they would be recreated for new
>>>>>>>>>>>>>>>>> worker instances.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
>>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
>>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
>>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort solution for some strange
>>>>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker with
>>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once a week and deletes all
>>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed for at least three days.
>>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the running jobs on the
>>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in use), and also to be
>>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine. The cleanup will always
>>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are blocking the machine.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I also think that currently the "No space left" errors
>>>>>>>>>>>>>>>>>> may be a consequence of the growing workspace directory rather than /tmp. I
>>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g. currently, on
>>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size is 158 GB while /tmp is
>>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk size to hold workspaces for
>>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will execute each job) or clear
>>>>>>>>>>>>>>>>>> also the workspaces in some way.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to
>>>>>>>>>>>>>>>>>>> test failures
>>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there is
>>>>>>>>>>>>>>>>>>> the notion of
>>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are running?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Max
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some
>>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests currently, perhaps we should
>>>>>>>>>>>>>>>>>>> schedule this job with cron?
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <
>>>>>>>>>>>>>>>>>>> heejong@google.com> wrote:
>>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
>>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins
>>>>>>>>>>>>>>>>>>> older than 3 days.
>>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
>>>>>>>>>>>>>>>>>>> jenkins-12, which has not
>>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not
>>>>>>>>>>>>>>>>>>> scheduling:
>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one
>>>>>>>>>>>>>>>>>>> time cleanup. I agree
>>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this
>>>>>>>>>>>>>>>>>>> task or address the root
>>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>>>>>>>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers
>>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
>>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases
>>>>>>>>>>>>>>>>>>> (someone with access
>>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers).
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more
>>>>>>>>>>>>>>>>>>> and more frequent
>>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how can this be
>>>>>>>>>>>>>>>>>>> remedied. Can a cleanup
>>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> Regards
>>>>>>>>>>>>>>>>>>> >>>>> Michal
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> --
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software
>>>>>>>>>>>>>>>>>>> Engineer
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
Yes, these are on the same volume, in the /var/lib/docker directory. I'm
unsure whether the test jobs clean up their leftover images.
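
A rough way to confirm where that volume's space is going, as a sketch
(assumes root access on the worker and Docker's default data root, nothing
Beam-specific):

  # Per-component breakdown of the Docker data root
  sudo sh -c 'du -sh /var/lib/docker/* | sort -rh | head'

  # What Docker itself considers reclaimable
  sudo docker system df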

On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:

> I forgot Docker images:
>
> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
> TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
> Images              88                  9                   125.4GB             124.2GB (99%)
> Containers          40                  4                   7.927GB             7.871GB (99%)
> Local Volumes       47                  0                   3.165GB             3.165GB (100%)
> Build Cache         0                   0                   0B                  0B
>
> There are about 90 images on that machine, with all but 1 less than 48
> hours old.
> I think the docker test jobs need to try harder at cleaning up their
> leftover images. (assuming they're already doing it?)
>
> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:
>
>> The additional slots (@3 directories) take up even more space now than
>> before.
>>
>> I'm testing out https://github.com/apache/beam/pull/12326 which could
>> help by cleaning up workspaces after a run (just started a seed job).
>>
>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <ty...@google.com>
>> wrote:
>>
>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>>> 2.9G    beam_PreCommit_Portable_Python_Commit
>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>>> 3.4G    beam_PreCommit_Portable_Python_Cron
>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>>> 6.2G    beam_PreCommit_Python_Commit
>>> 7.5G    beam_PreCommit_Python_Commit@2
>>> 7.5G    beam_PreCommit_Python_Cron
>>> 1012M   beam_PreCommit_PythonDocker_Commit
>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>>> 1002M   beam_PreCommit_PythonDocker_Cron
>>> 877M    beam_PreCommit_PythonFormatter_Commit
>>> 988M    beam_PreCommit_PythonFormatter_Cron
>>> 986M    beam_PreCommit_PythonFormatter_Phrase
>>> 1.7G    beam_PreCommit_PythonLint_Commit
>>> 2.1G    beam_PreCommit_PythonLint_Cron
>>> 7.5G    beam_PreCommit_Python_Phrase
>>> 346M    beam_PreCommit_RAT_Commit
>>> 341M    beam_PreCommit_RAT_Cron
>>> 338M    beam_PreCommit_Spotless_Commit
>>> 339M    beam_PreCommit_Spotless_Cron
>>> 5.5G    beam_PreCommit_SQL_Commit
>>> 5.5G    beam_PreCommit_SQL_Cron
>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>>> 750M    beam_PreCommit_Website_Commit
>>> 750M    beam_PreCommit_Website_Commit@2
>>> 750M    beam_PreCommit_Website_Cron
>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>>> 336M    beam_Prober_CommunityMetrics
>>> 693M    beam_python_mongoio_load_test
>>> 339M    beam_SeedJob
>>> 333M    beam_SeedJob_Standalone
>>> 334M    beam_sonarqube_report
>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>>> 175G    total
>>>
>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>>
>>>> Ya looks like something in the workspaces is taking up room:
>>>>
>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>>>> 191G    .
>>>> 191G    total
>>>>
>>>>
>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <ty...@google.com>
>>>> wrote:
>>>>
>>>>> Node 8 is also full. The partition that /tmp is on is here:
>>>>>
>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>> /dev/sda1       485G  482G  2.9G 100% /
>>>>>
>>>>> however after cleaning up tmp with the crontab command, there is only
>>>>> 8G usage yet it still remains 100% full:
>>>>>
>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>>>> 8.0G    /tmp
>>>>> 8.0G    total
>>>>>
>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>>>>> directory. When I run a du on that, it takes really long. I'll let it keep
>>>>> running for a while to see if it ever returns a result but so far this
>>>>> seems suspect.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <ty...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Everything I've been looking at is in the /tmp dir. Where are the
>>>>>> workspaces, or what are they named?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> I'm curious as to what you find. Was it /tmp or the workspaces using up
>>>>>>> the space?
>>>>>>>
>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <ty...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Bleck. I just realized that it is 'offline' so that won't work.
>>>>>>>> I'll clean up manually on the machine using the cron command.
>>>>>>>>
>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <ty...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Something isn't working with the current setup because node 15
>>>>>>>>> appears to be out of space and is currently 'offline' according to Jenkins.
>>>>>>>>> Can someone run the cleanup job? The machine is full,
>>>>>>>>>
>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>> udev             52G     0   52G   0% /dev
>>>>>>>>> tmpfs            11G  265M   10G   3% /run
>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>>>>>>
>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>>>> 1,1 | head -n 20
>>>>>>>>> 20G     2020-07-24 17:52        .
>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>>>>>>>> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>>>>>>> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410
>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>>>>>>>> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>>>>>>> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>>>>>>> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>>>>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
>>>>>>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>>>>>>>>>
>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>>>>>>>>>> tysonjh@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache infra
>>>>>>>>>>> wiki that also suggests using a tmpdir inside the workspace [1]:
>>>>>>>>>>>
>>>>>>>>>>> Procedures Projects can take to clean up disk space
>>>>>>>>>>>
>>>>>>>>>>> Projects can help themselves and Infra by taking some basic
>>>>>>>>>>> steps to help clean up their jobs after themselves on the build nodes.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it gets
>>>>>>>>>>>    cleaned up when job workspaces expire.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Tests should be (able to be) written to use the standard
>>>>>>>>>> temporary file mechanisms, and the environment set up on Jenkins such that
>>>>>>>>>> that falls into the respective workspaces. Ideally this should be as simple
>>>>>>>>>> as setting the TMPDIR (or similar) environment variable (and making sure it
>>>>>>>>>> exists/is writable).
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or finish.
>>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous builds.
>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
>>>>>>>>>>>    artifacts.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]:
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <ke...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Those file listings look like the result of using standard temp
>>>>>>>>>>>> file APIs but with TMPDIR set to /tmp.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>>>>>>>>>>>> tysonjh@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into two examples:
>>>>>>>>>>>>>
>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort
>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>>>>>>>>>>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>>>>>>>>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>>>>>>>>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>>>>>>>>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>>>>>>>>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>>>>>>>>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>>>>>>>>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>>>>>>>>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>>>>>>>>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>>>>>>>>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>>>>>>>>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>>>>>>>>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort
>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>>>>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>>>>>>>>>>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>>>>>>>>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>>>>>>>>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>>>>>>>>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>>>>>>>>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>>>>>>>>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>>>>>>>>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>>>>>>>>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>>>>>>>>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>>>>>>>>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>>>>>>>>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>>>>>>>>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>>>>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>>>>>>>>>>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>>>>>>>>>>>>> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>>>>>>>>>>>>>> kenn@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm probably late to this discussion and missing something,
>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect TMPDIR to point
>>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped by Jenkins, and I
>>>>>>>>>>>>>>> would expect code to always create temp files via APIs that respect this.
>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability to set this up? Do
>>>>>>>>>>>>>>> we have bugs in our code (that we could probably find by setting TMPDIR to
>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write permission to /tmp,
>>>>>>>>>>>>>>> etc)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Kenn
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for
>>>>>>>>>>>>>>>> cleaning up workspace directory after successful jobs. Alternatively, we
>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src directories.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
>>>>>>>>>>>>>>>> scripts to the inventory job (
>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>>>>>>>>>>>>> frequencies and clean up codes with PRs. I do not know how internal cron
>>>>>>>>>>>>>>>> scripts are created, maintained, and how would they be recreated for new
>>>>>>>>>>>>>>>> worker instances.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort solution for some strange
>>>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker with
>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once a week and deletes all
>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed for at least three days.
>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the running jobs on the
>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in use), and also to be
>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine. The cleanup will always
>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are blocking the machine.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I also think that currently the "No space left" errors
>>>>>>>>>>>>>>>>> may be a consequence of growing workspace directory rather than /tmp. I
>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g. currently, on
>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size is 158 GB while /tmp is
>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk size to hold workspaces for
>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will execute each job) or clear
>>>>>>>>>>>>>>>>> also the workspaces in some way.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to
>>>>>>>>>>>>>>>>>> test failures
>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there is
>>>>>>>>>>>>>>>>>> the notion of
>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are running?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Max
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some
>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests currently, perhaps we should
>>>>>>>>>>>>>>>>>> schedule this job with cron?
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
>>>>>>>>>>>>>>>>>> jenkins-7 (for example:
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins
>>>>>>>>>>>>>>>>>> older than 3 days.
>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
>>>>>>>>>>>>>>>>>> jenkins-12, which has not
>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not
>>>>>>>>>>>>>>>>>> scheduling:
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one
>>>>>>>>>>>>>>>>>> time cleanup. I agree
>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this
>>>>>>>>>>>>>>>>>> task or address the root
>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>>>>>>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers
>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases
>>>>>>>>>>>>>>>>>> (someone with access
>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers).
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more
>>>>>>>>>>>>>>>>>> and more frequent
>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how can this be
>>>>>>>>>>>>>>>>>> remedied. Can a cleanup
>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Regards
>>>>>>>>>>>>>>>>>> >>>>> Michal
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> --
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software
>>>>>>>>>>>>>>>>>> Engineer
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Udi Meiri <eh...@google.com>.
I forgot Docker images:

ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
Images              88                  9                   125.4GB             124.2GB (99%)
Containers          40                  4                   7.927GB             7.871GB (99%)
Local Volumes       47                  0                   3.165GB             3.165GB (100%)
Build Cache         0                   0                   0B                  0B

There are about 90 images on that machine, all but one of them less than 48
hours old.
I think the docker test jobs need to try harder at cleaning up their
leftover images (assuming they're already doing it at all?).
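
A minimal cleanup sketch for the workers, assuming anything older than 48
hours is safe to drop (the cutoff is an assumption on my part, not agreed
policy):

  # Stopped containers older than 48 hours
  docker container prune --force --filter "until=48h"

  # Unused images, not just dangling ones, older than 48 hours
  docker image prune --all --force --filter "until=48h"

  # Volumes no longer referenced by any container (volume prune has no age filter)
  docker volume prune --force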

On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:

> The additional slots (@3 directories) take up even more space now than
> before.
>
> I'm testing out https://github.com/apache/beam/pull/12326 which could
> help by cleaning up workspaces after a run (just started a seed job).
>
> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <ty...@google.com>
> wrote:
>
>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>> 2.9G    beam_PreCommit_Portable_Python_Commit
>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>> 3.4G    beam_PreCommit_Portable_Python_Cron
>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>> 6.2G    beam_PreCommit_Python_Commit
>> 7.5G    beam_PreCommit_Python_Commit@2
>> 7.5G    beam_PreCommit_Python_Cron
>> 1012M   beam_PreCommit_PythonDocker_Commit
>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>> 1002M   beam_PreCommit_PythonDocker_Cron
>> 877M    beam_PreCommit_PythonFormatter_Commit
>> 988M    beam_PreCommit_PythonFormatter_Cron
>> 986M    beam_PreCommit_PythonFormatter_Phrase
>> 1.7G    beam_PreCommit_PythonLint_Commit
>> 2.1G    beam_PreCommit_PythonLint_Cron
>> 7.5G    beam_PreCommit_Python_Phrase
>> 346M    beam_PreCommit_RAT_Commit
>> 341M    beam_PreCommit_RAT_Cron
>> 338M    beam_PreCommit_Spotless_Commit
>> 339M    beam_PreCommit_Spotless_Cron
>> 5.5G    beam_PreCommit_SQL_Commit
>> 5.5G    beam_PreCommit_SQL_Cron
>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>> 750M    beam_PreCommit_Website_Commit
>> 750M    beam_PreCommit_Website_Commit@2
>> 750M    beam_PreCommit_Website_Cron
>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>> 336M    beam_Prober_CommunityMetrics
>> 693M    beam_python_mongoio_load_test
>> 339M    beam_SeedJob
>> 333M    beam_SeedJob_Standalone
>> 334M    beam_sonarqube_report
>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>> 175G    total
>>
>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <ty...@google.com>
>> wrote:
>>
>>> Ya looks like something in the workspaces is taking up room:
>>>
>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>>> 191G    .
>>> 191G    total
>>>
>>>
>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>>
>>>> Node 8 is also full. The partition that /tmp is on is here:
>>>>
>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> /dev/sda1       485G  482G  2.9G 100% /
>>>>
>>>> however after cleaning up tmp with the crontab command, there is only
>>>> 8G usage yet it still remains 100% full:
>>>>
>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>>> 8.0G    /tmp
>>>> 8.0G    total
>>>>
>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>>>> directory. When I run a du on that, it takes really long. I'll let it keep
>>>> running for a while to see if it ever returns a result but so far this
>>>> seems suspect.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <ty...@google.com>
>>>> wrote:
>>>>
>>>>> Everything I've been looking at is in the /tmp dir. Where are the
>>>>> workspaces, or what are they named?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> I'm curious as to what you find. Was it /tmp or the workspaces using up
>>>>>> the space?
>>>>>>
>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <ty...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Bleck. I just realized that it is 'offline' so that won't work. I'll
>>>>>>> clean up manually on the machine using the cron command.
>>>>>>>
>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <ty...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Something isn't working with the current setup because node 15
>>>>>>>> appears to be out of space and is currently 'offline' according to Jenkins.
>>>>>>>> Can someone run the cleanup job? The machine is full,
>>>>>>>>
>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>> udev             52G     0   52G   0% /dev
>>>>>>>> tmpfs            11G  265M   10G   3% /run
>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>>>>>
>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>>> 1,1 | head -n 20
>>>>>>>> 20G     2020-07-24 17:52        .
>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>>>>>>> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>>>>>> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410
>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>>>>>>> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>>>>>> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>>>>>> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>>>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
>>>>>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>>>>>>>>
>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>>>>>>>>> tysonjh@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache infra
>>>>>>>>>> wiki that also suggests using a tmpdir inside the workspace [1]:
>>>>>>>>>>
>>>>>>>>>> Procedures Projects can take to clean up disk space
>>>>>>>>>>
>>>>>>>>>> Projects can help themselves and Infra by taking some basic steps
>>>>>>>>>> to help clean up their jobs after themselves on the build nodes.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it gets
>>>>>>>>>>    cleaned up when job workspaces expire.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Tests should be (able to be) written to use the standard temporary
>>>>>>>>> file mechanisms, and the environment set up on Jenkins such that that falls
>>>>>>>>> into the respective workspaces. Ideally this should be as simple as setting
>>>>>>>>> the TMPDIR (or similar) environment variable (and making sure it exists/is
>>>>>>>>> writable).
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or finish.
>>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous builds.
>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
>>>>>>>>>>    artifacts.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]:
>>>>>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <ke...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Those file listings look like the result of using standard temp
>>>>>>>>>>> file APIs but with TMPDIR set to /tmp.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>>>>>>>>>>> tysonjh@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into two examples:
>>>>>>>>>>>>
>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort
>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>>>>>>>>>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>>>>>>>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>>>>>>>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>>>>>>>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>>>>>>>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>>>>>>>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>>>>>>>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>>>>>>>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>>>>>>>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>>>>>>>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>>>>>>>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>>>>>>>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort
>>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>>>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>>>>>>>>>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>>>>>>>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>>>>>>>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>>>>>>>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>>>>>>>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>>>>>>>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>>>>>>>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>>>>>>>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>>>>>>>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>>>>>>>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>>>>>>>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>>>>>>>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>>>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>>>>>>>>>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>>>>>>>>>>>> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>>>>>>>>>>>>> kenn@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm probably late to this discussion and missing something,
>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect TMPDIR to point
>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped by Jenkins, and I
>>>>>>>>>>>>>> would expect code to always create temp files via APIs that respect this.
>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability to set this up? Do
>>>>>>>>>>>>>> we have bugs in our code (that we could probably find by setting TMPDIR to
>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write permission to /tmp,
>>>>>>>>>>>>>> etc)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kenn
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for
>>>>>>>>>>>>>>> cleaning up workspace directory after successful jobs. Alternatively, we
>>>>>>>>>>>>>>> can consider periodically cleaning up the /src directories.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
>>>>>>>>>>>>>>> scripts to the inventory job (
>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>>>>>>>>>>>> frequencies and clean up codes with PRs. I do not know how internal cron
>>>>>>>>>>>>>>> scripts are created, maintained, and how would they be recreated for new
>>>>>>>>>>>>>>> worker instances.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
>>>>>>>>>>>>>>>> triggered by cron and should be a last resort solution for some strange
>>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker with an
>>>>>>>>>>>>>>>> internal cron script. It's being executed once a week and deletes all the
>>>>>>>>>>>>>>>> files (and only files) that were not accessed for at least three days.
>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the running jobs on the
>>>>>>>>>>>>>>>> worker (not to delete the files that are still in use), and also to be
>>>>>>>>>>>>>>>> insensitive to the current workload on the machine. The cleanup will always
>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are blocking the machine.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I also think that currently the "No space left" errors may
>>>>>>>>>>>>>>>> be a consequence of growing workspace directory rather than /tmp. I didn't
>>>>>>>>>>>>>>>> do any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>>>>>>>>>>>>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>>>>>>>>>>>>>> either guarantee the disk size to hold workspaces for all jobs (because
>>>>>>>>>>>>>>>> eventually, every worker will execute each job) or clear also the
>>>>>>>>>>>>>>>> workspaces in some way.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>>>>>>>>>>>>>>>> mxm@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to
>>>>>>>>>>>>>>>>> test failures
>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there is the
>>>>>>>>>>>>>>>>> notion of
>>>>>>>>>>>>>>>>> running exclusively while no other tasks are running?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Max
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some
>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests currently, perhaps we should
>>>>>>>>>>>>>>>>> schedule this job with cron?
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
>>>>>>>>>>>>>>>>> jenkins-7 (for example:
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins
>>>>>>>>>>>>>>>>> older than 3 days.
>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
>>>>>>>>>>>>>>>>> jenkins-12, which has not
>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not
>>>>>>>>>>>>>>>>> scheduling:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one
>>>>>>>>>>>>>>>>> time cleanup. I agree
>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this task
>>>>>>>>>>>>>>>>> or address the root
>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>>>>>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers
>>>>>>>>>>>>>>>>> again. Nodes 1 and 7
>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases
>>>>>>>>>>>>>>>>> (someone with access
>>>>>>>>>>>>>>>>> >>>>> permissions to the workers).
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more
>>>>>>>>>>>>>>>>> and more frequent
>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how can this be
>>>>>>>>>>>>>>>>> remedied. Can a cleanup
>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> Regards
>>>>>>>>>>>>>>>>> >>>>> Michal
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> --
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software
>>>>>>>>>>>>>>>>> Engineer
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Udi Meiri <eh...@google.com>.
The additional slots (the @2 and @3 directories) take up even more space now
than before.

I'm testing out https://github.com/apache/beam/pull/12326, which could help
by cleaning up workspaces after a run (I just started a seed job).
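
If the PR doesn't pan out, a fallback could be a weekly cron entry on each
worker along the lines of the sketch below. The workspace path is the one
noted earlier in the thread, and the 3-day cutoff mirrors the existing /tmp
cleanup; both are assumptions rather than settled policy. A directory-level
mtime check is also a rough heuristic and could remove a workspace mid-build,
which Damian's atime-based approach avoids.

  # Sundays at 04:00: remove top-level workspace dirs not modified in 3+ days
  0 4 * * 0  find /home/jenkins/jenkins-slave/workspace -mindepth 1 -maxdepth 1 -type d -mtime +3 -exec rm -rf {} +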

On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <ty...@google.com> wrote:

> 664M    beam_PreCommit_JavaPortabilityApi_Commit
> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
> 611M    beam_PreCommit_JavaPortabilityApi_Cron
> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
> 2.9G    beam_PreCommit_Portable_Python_Commit
> 2.9G    beam_PreCommit_Portable_Python_Commit@2
> 1.7G    beam_PreCommit_Portable_Python_Commit@3
> 3.4G    beam_PreCommit_Portable_Python_Cron
> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
> 6.2G    beam_PreCommit_Python_Commit
> 7.5G    beam_PreCommit_Python_Commit@2
> 7.5G    beam_PreCommit_Python_Cron
> 1012M   beam_PreCommit_PythonDocker_Commit
> 1011M   beam_PreCommit_PythonDocker_Commit@2
> 1011M   beam_PreCommit_PythonDocker_Commit@3
> 1002M   beam_PreCommit_PythonDocker_Cron
> 877M    beam_PreCommit_PythonFormatter_Commit
> 988M    beam_PreCommit_PythonFormatter_Cron
> 986M    beam_PreCommit_PythonFormatter_Phrase
> 1.7G    beam_PreCommit_PythonLint_Commit
> 2.1G    beam_PreCommit_PythonLint_Cron
> 7.5G    beam_PreCommit_Python_Phrase
> 346M    beam_PreCommit_RAT_Commit
> 341M    beam_PreCommit_RAT_Cron
> 338M    beam_PreCommit_Spotless_Commit
> 339M    beam_PreCommit_Spotless_Cron
> 5.5G    beam_PreCommit_SQL_Commit
> 5.5G    beam_PreCommit_SQL_Cron
> 5.5G    beam_PreCommit_SQL_Java11_Commit
> 750M    beam_PreCommit_Website_Commit
> 750M    beam_PreCommit_Website_Commit@2
> 750M    beam_PreCommit_Website_Cron
> 764M    beam_PreCommit_Website_Stage_GCS_Commit
> 771M    beam_PreCommit_Website_Stage_GCS_Cron
> 336M    beam_Prober_CommunityMetrics
> 693M    beam_python_mongoio_load_test
> 339M    beam_SeedJob
> 333M    beam_SeedJob_Standalone
> 334M    beam_sonarqube_report
> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
> 175G    total
>
> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <ty...@google.com>
> wrote:
>
>> Ya looks like something in the workspaces is taking up room:
>>
>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>> 191G    .
>> 191G    total
>>
>>
>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <ty...@google.com>
>> wrote:
>>
>>> Node 8 is also full. The partition that /tmp is on is here:
>>>
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/sda1       485G  482G  2.9G 100% /
>>>
>>> however after cleaning up tmp with the crontab command, there is only 8G
>>> usage yet it still remains 100% full:
>>>
>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>> 8.0G    /tmp
>>> 8.0G    total
>>>
>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>>> directory. When I run a du on that, it takes really long. I'll let it keep
>>> running for a while to see if it ever returns a result but so far this
>>> seems suspect.
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>>
>>>> Everything I've been looking at is in the /tmp dir. Where are the
>>>> workspaces, or what are they named?
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> I'm curious as to what you find. Was it /tmp or the workspaces using up
>>>>> the space?
>>>>>
>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <ty...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Bleck. I just realized that it is 'offline' so that won't work. I'll
>>>>>> clean up manually on the machine using the cron command.
>>>>>>
>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <ty...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Something isn't working with the current setup because node 15
>>>>>>> appears to be out of space and is currently 'offline' according to Jenkins.
>>>>>>> Can someone run the cleanup job? The machine is full,
>>>>>>>
>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>> udev             52G     0   52G   0% /dev
>>>>>>> tmpfs            11G  265M   10G   3% /run
>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>>>>
>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1
>>>>>>> | head -n 20
>>>>>>> 20G     2020-07-24 17:52        .
>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>>>>>> 517M    2020-07-22 17:31
>>>>>>>  ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>>>>> 517M    2020-07-22 17:31
>>>>>>>  ./junit1031982597110125586/junit8739924829337821410
>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>>>>>> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>>>>> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>>>>> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
>>>>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>>>>>>>
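
Side note: the junit*/heap_dump.hprof files in that listing look like JVM
heap dumps left behind by crashing tests, and at ~517M each they add up
quickly. A sketch of a targeted purge, assuming anything older than a day
is abandoned:

  # sketch: heap dumps not touched in over a day are assumed abandoned
  sudo find /tmp -name 'heap_dump.hprof' -mtime +1 -delete
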
>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>>>>>>> robertwb@google.com> wrote:
>>>>>>>
>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <ty...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache infra
>>>>>>>>> wiki that also suggests using a tmpdir inside the workspace [1]:
>>>>>>>>>
>>>>>>>>> Procedures Projects can take to clean up disk space
>>>>>>>>>
>>>>>>>>> Projects can help themselves and Infra by taking some basic steps
>>>>>>>>> to help clean up their jobs after themselves on the build nodes.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it gets
>>>>>>>>>    cleaned up when job workspaces expire.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Tests should be (able to be) written to use the standard temporary
>>>>>>>> file mechanisms, and the environment set up on Jenkins such that that falls
>>>>>>>> into the respective workspaces. Ideally this should be as simple as setting
>>>>>>>> the TMPDIR (or similar) environment variable (and making sure it exists/is
>>>>>>>> writable).
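
A minimal sketch of that, assuming the usual Jenkins WORKSPACE environment
variable is available to the job:

  # keep temp files inside the workspace so workspace cleanup reclaims them
  mkdir -p "$WORKSPACE/tmp"
  export TMPDIR="$WORKSPACE/tmp"

Python's tempfile module and most POSIX tools honor TMPDIR; JVM-based tests
would additionally need something like -Djava.io.tmpdir="$WORKSPACE/tmp",
since the JVM does not read TMPDIR.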
>>>>>>>>
>>>>>>>>>
>>>>>>>>>    1. Configure your jobs to wipe workspaces on start or finish.
>>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous builds.
>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous artifacts.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]:
>>>>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>>>>>
>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <ke...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Those file listings look like the result of using standard temp
>>>>>>>>>> file APIs but with TMPDIR set to /tmp.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>>>>>>>>>> tysonjh@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into two examples:
>>>>>>>>>>>
>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>>>>>> 1,1 | head -n 20
>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>>>>>>>>> 236M    2020-07-17 18:48
>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>>>>>>> 236M    2020-07-17 18:46
>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>>>>>>> 236M    2020-07-17 18:44
>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>>>>>>> 236M    2020-07-17 18:42
>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>>>>>>> 236M    2020-07-17 18:39
>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>>>>>>> 236M    2020-07-17 18:35
>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>>>>>>> 3.7M    2020-07-17 18:48
>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>>>>>>> 3.7M    2020-07-17 18:46
>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>>>>>>> 3.7M    2020-07-17 18:44
>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>>>>>>> 3.7M    2020-07-17 18:42
>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>>>>>>> 3.7M    2020-07-17 18:39
>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>>>>>>> 3.7M    2020-07-17 18:35
>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort
>>>>>>>>>>> -rhk 1,1 | head -n 20
>>>>>>>>>>> 817M    2020-07-21 02:26        .
>>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>>>>>>>>> 236M    2020-07-19 12:14
>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>>>>>>> 236M    2020-07-19 12:11
>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>>>>>>> 236M    2020-07-19 12:05
>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>>>>>>> 3.7M    2020-07-19 12:14
>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>>>>>>> 3.7M    2020-07-19 12:11
>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>>>>>>> 3.7M    2020-07-19 12:05
>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>>>>>>> 2.0M    2020-07-19 12:14
>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>>>>>>> 2.0M    2020-07-19 12:11
>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>>>>>>> 2.0M    2020-07-19 12:05
>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>>>>>>> 1.2M    2020-07-19 12:14
>>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>>>>>>> 1.2M    2020-07-19 12:11
>>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>>>>>>> 1.2M    2020-07-19 12:05
>>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>>>>>>>>> 984K    2020-07-12 12:00
>>>>>>>>>>>  ./junit642086915811430564/beam/nodes
>>>>>>>>>>> 980K    2020-07-12 12:00
>>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>>>>>>>>>>>> kenn@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm probably late to this discussion and missing something,
>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect TMPDIR to point
>>>>>>>>>>>>> somewhere inside the job directory that will be wiped by Jenkins, and I
>>>>>>>>>>>>> would expect code to always create temp files via APIs that respect this.
>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability to set this up? Do
>>>>>>>>>>>>> we have bugs in our code (that we could probably find by setting TMPDIR to
>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write permission to /tmp,
>>>>>>>>>>>>> etc.)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kenn
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for
>>>>>>>>>>>>>> cleaning up workspace directory after successful jobs. Alternatively, we
>>>>>>>>>>>>>> can consider periodically cleaning up the /src directories.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
>>>>>>>>>>>>>> scripts to the inventory job (
>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>>>>>>>>>>> frequencies and clean up code with PRs. I do not know how internal cron
>>>>>>>>>>>>>> scripts are created, maintained, and how they would be recreated for new
>>>>>>>>>>>>>> worker instances.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
>>>>>>>>>>>>>>> triggered by cron and should be a last-resort solution for some strange
>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Along with that job, I've also updated every worker with an
>>>>>>>>>>>>>>> internal cron script. It's being executed once a week and deletes all the
>>>>>>>>>>>>>>> files (and only files) that were not accessed for at least three days.
>>>>>>>>>>>>>>> That's designed to be as safe as possible for the running jobs on the
>>>>>>>>>>>>>>> worker (not to delete the files that are still in use), and also to be
>>>>>>>>>>>>>>> insensitive to the current workload on the machine. The cleanup will always
>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are blocking the machine.
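
For reference, that access-time approach boils down to something like the
following weekly crontab entry (a sketch; the exact script on the workers
may differ):

  # Sundays at 06:00: delete only plain files under /tmp that have
  # not been accessed for at least three days
  0 6 * * 0  find /tmp -type f -atime +3 -delete

Keying off atime rather than mtime is what spares files that long-running
jobs are still reading.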
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also think that currently the "No space left" errors may
>>>>>>>>>>>>>>> be a consequence of the growing workspace directory rather than /tmp. I didn't
>>>>>>>>>>>>>>> do any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>>>>>>>>>>>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>>>>>>>>>>>>> either guarantee the disk size to hold workspaces for all jobs (because
>>>>>>>>>>>>>>> eventually, every worker will execute each job) or also clear the
>>>>>>>>>>>>>>> workspaces in some way.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>>>>>>>>>>>>>>> mxm@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to
>>>>>>>>>>>>>>>> test failures
>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there is the
>>>>>>>>>>>>>>>> notion of
>>>>>>>>>>>>>>>> running exclusively while no other tasks are running?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Max
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some
>>>>>>>>>>>>>>>> out-of-disk-related errors in precommit tests currently, perhaps we should
>>>>>>>>>>>>>>>> schedule this job with cron?
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on jenkins-7
>>>>>>>>>>>>>>>> (for example:
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by jenkins
>>>>>>>>>>>>>>>> older than 3 days.
>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
>>>>>>>>>>>>>>>> jenkins-12, which has not
>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not
>>>>>>>>>>>>>>>> scheduling:
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one
>>>>>>>>>>>>>>>> time cleanup. I agree
>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this task
>>>>>>>>>>>>>>>> or address the root
>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>>>>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers
>>>>>>>>>>>>>>>> again. Nodes 1 and 7
>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases
>>>>>>>>>>>>>>>> (someone with access
>>>>>>>>>>>>>>>> >>>>> permissions to the workers)?
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more and
>>>>>>>>>>>>>>>> more frequent
>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how this can be
>>>>>>>>>>>>>>>> remedied. Can a cleanup
>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> Regards
>>>>>>>>>>>>>>>> >>>>> Michal
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> --
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software
>>>>>>>>>>>>>>>> Engineer
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
664M    beam_PreCommit_JavaPortabilityApi_Commit
656M    beam_PreCommit_JavaPortabilityApi_Commit@2
611M    beam_PreCommit_JavaPortabilityApi_Cron
616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
2.9G    beam_PreCommit_Portable_Python_Commit
2.9G    beam_PreCommit_Portable_Python_Commit@2
1.7G    beam_PreCommit_Portable_Python_Commit@3
3.4G    beam_PreCommit_Portable_Python_Cron
1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
6.2G    beam_PreCommit_Python_Commit
7.5G    beam_PreCommit_Python_Commit@2
7.5G    beam_PreCommit_Python_Cron
1012M   beam_PreCommit_PythonDocker_Commit
1011M   beam_PreCommit_PythonDocker_Commit@2
1011M   beam_PreCommit_PythonDocker_Commit@3
1002M   beam_PreCommit_PythonDocker_Cron
877M    beam_PreCommit_PythonFormatter_Commit
988M    beam_PreCommit_PythonFormatter_Cron
986M    beam_PreCommit_PythonFormatter_Phrase
1.7G    beam_PreCommit_PythonLint_Commit
2.1G    beam_PreCommit_PythonLint_Cron
7.5G    beam_PreCommit_Python_Phrase
346M    beam_PreCommit_RAT_Commit
341M    beam_PreCommit_RAT_Cron
338M    beam_PreCommit_Spotless_Commit
339M    beam_PreCommit_Spotless_Cron
5.5G    beam_PreCommit_SQL_Commit
5.5G    beam_PreCommit_SQL_Cron
5.5G    beam_PreCommit_SQL_Java11_Commit
750M    beam_PreCommit_Website_Commit
750M    beam_PreCommit_Website_Commit@2
750M    beam_PreCommit_Website_Cron
764M    beam_PreCommit_Website_Stage_GCS_Commit
771M    beam_PreCommit_Website_Stage_GCS_Cron
336M    beam_Prober_CommunityMetrics
693M    beam_python_mongoio_load_test
339M    beam_SeedJob
333M    beam_SeedJob_Standalone
334M    beam_sonarqube_report
556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
175G    total
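
For anyone repeating this audit: the @2/@3 suffixes above are the extra
workspaces Jenkins creates for concurrent builds of the same job, so they
count against the same disk. A listing like this comes from something along
the lines of

  sudo du -sh /home/jenkins/jenkins-slave/workspace/* | sort -rhk 1,1

and a sketch of a workspace purge, to be run only while the node is offline
or idle so that no running build loses its directory:

  # sketch: assumes a workspace untouched for 7+ days is safe to drop
  sudo find /home/jenkins/jenkins-slave/workspace -mindepth 1 -maxdepth 1 \
    -type d -mtime +7 -exec rm -rf {} +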

On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <ty...@google.com> wrote:

> Ya looks like something in the workspaces is taking up room:
>
> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
> 191G    .
> 191G    total
>
>
> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <ty...@google.com>
> wrote:
>
>> Node 8 is also full. The partition that /tmp is on is here:
>>
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sda1       485G  482G  2.9G 100% /
>>
>> however, after cleaning up /tmp with the crontab command, there is only 8G
>> of usage, yet it still remains 100% full:
>>
>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>> 8.0G    /tmp
>> 8.0G    total
>>
>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>> directory. When I run a du on that, it takes really long. I'll let it keep
>> running for a while to see if it ever returns a result but so far this
>> seems suspect.
>>
>>
>>
>>
>>
>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <ty...@google.com>
>> wrote:
>>
>>> Everything I've been looking at is in the /tmp dir. Where are the
>>> workspaces, or what are they named?
>>>
>>>
>>>
>>>
>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> I'm curious what you find. Was it /tmp or the workspaces using up
>>>> the space?
>>>>
>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <ty...@google.com>
>>>> wrote:
>>>>
>>>>> Bleck. I just realized that it is 'offline' so that won't work. I'll
>>>>> clean up manually on the machine using the cron command.
>>>>>
>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <ty...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Something isn't working with the current setup because node 15
>>>>>> appears to be out of space and is currently 'offline' according to Jenkins.
>>>>>> Can someone run the cleanup job? The machine is full,
>>>>>>
>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>> udev             52G     0   52G   0% /dev
>>>>>> tmpfs            11G  265M   10G   3% /run
>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>>>
>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1
>>>>>> | head -n 20
>>>>>> 20G     2020-07-24 17:52        .
>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>>>>> 517M    2020-07-22 17:31
>>>>>>  ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>>>> 517M    2020-07-22 17:31
>>>>>>  ./junit1031982597110125586/junit8739924829337821410
>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>>>>> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>>>> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>>>> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
>>>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
>>>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
>>>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
>>>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
>>>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>>>>>>
>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <ro...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <ty...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache infra
>>>>>>>> wiki that also suggests using a tmpdir inside the workspace [1]:
>>>>>>>>
>>>>>>>> Procedures Projects can take to clean up disk space
>>>>>>>>
>>>>>>>> Projects can help themselves and Infra by taking some basic steps
>>>>>>>> to help clean up their jobs after themselves on the build nodes.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it gets
>>>>>>>>    cleaned up when job workspaces expire.
>>>>>>>>
>>>>>>>>
>>>>>>> Tests should be (able to be) written to use the standard temporary
>>>>>>> file mechanisms, and the environment set up on Jenkins such that that falls
>>>>>>> into the respective workspaces. Ideally this should be as simple as setting
>>>>>>> the TMPDIR (or similar) environment variable (and making sure it exists/is
>>>>>>> writable).
>>>>>>>
>>>>>>>>
>>>>>>>>    1. Configure your jobs to wipe workspaces on start or finish.
>>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous builds.
>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous artifacts.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]:
>>>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>>>>
>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <ke...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Those file listings look like the result of using standard temp
>>>>>>>>> file APIs but with TMPDIR set to /tmp.
>>>>>>>>>
>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <ty...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into two examples:
>>>>>>>>>>
>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>>>>> 1,1 | head -n 20
>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>>>>>>>> 236M    2020-07-17 18:48
>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>>>>>> 236M    2020-07-17 18:46
>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>>>>>> 236M    2020-07-17 18:44
>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>>>>>> 236M    2020-07-17 18:42
>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>>>>>> 236M    2020-07-17 18:39
>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>>>>>> 236M    2020-07-17 18:35
>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>>>>>> 3.7M    2020-07-17 18:48
>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>>>>>> 3.7M    2020-07-17 18:46
>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>>>>>> 3.7M    2020-07-17 18:44
>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>>>>>> 3.7M    2020-07-17 18:42
>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>>>>>> 3.7M    2020-07-17 18:39
>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>>>>>> 3.7M    2020-07-17 18:35
>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>>>>> 1,1 | head -n 20
>>>>>>>>>> 817M    2020-07-21 02:26        .
>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>>>>>>>> 236M    2020-07-19 12:14
>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>>>>>> 236M    2020-07-19 12:11
>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>>>>>> 236M    2020-07-19 12:05
>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>>>>>> 3.7M    2020-07-19 12:14
>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>>>>>> 3.7M    2020-07-19 12:11
>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>>>>>> 3.7M    2020-07-19 12:05
>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>>>>>> 2.0M    2020-07-19 12:14
>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>>>>>> 2.0M    2020-07-19 12:11
>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>>>>>> 2.0M    2020-07-19 12:05
>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>>>>>> 1.2M    2020-07-19 12:14
>>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>>>>>> 1.2M    2020-07-19 12:11
>>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>>>>>> 1.2M    2020-07-19 12:05
>>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>>>>>>>> 984K    2020-07-12 12:00
>>>>>>>>>>  ./junit642086915811430564/beam/nodes
>>>>>>>>>> 980K    2020-07-12 12:00
>>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <ke...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm probably late to this discussion and missing something, but
>>>>>>>>>>>> why are we writing to /tmp at all? I would expect TMPDIR to point somewhere
>>>>>>>>>>>> inside the job directory that will be wiped by Jenkins, and I would expect
>>>>>>>>>>>> code to always create temp files via APIs that respect this. Is Jenkins not
>>>>>>>>>>>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>>>>>>>>>>>> our code (that we could probably find by setting TMPDIR to somewhere
>>>>>>>>>>>> not-/tmp and running the tests without write permission to /tmp, etc.)?
>>>>>>>>>>>>
>>>>>>>>>>>> Kenn
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning
>>>>>>>>>>>>> up workspace directory after successful jobs. Alternatively, we can
>>>>>>>>>>>>> consider periodically cleaning up the /src directories.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
>>>>>>>>>>>>> scripts to the inventory job (
>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>>>>>>>>>> frequencies and clean up code with PRs. I do not know how internal cron
>>>>>>>>>>>>> scripts are created, maintained, and how they would be recreated for new
>>>>>>>>>>>>> worker instances.
>>>>>>>>>>>>>
>>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not triggered
>>>>>>>>>>>>>> by cron and should be a last-resort solution for some strange cases.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Along with that job, I've also updated every worker with an
>>>>>>>>>>>>>> internal cron script. It's being executed once a week and deletes all the
>>>>>>>>>>>>>> files (and only files) that were not accessed for at least three days.
>>>>>>>>>>>>>> That's designed to be as safe as possible for the running jobs on the
>>>>>>>>>>>>>> worker (not to delete the files that are still in use), and also to be
>>>>>>>>>>>>>> insensitive to the current workload on the machine. The cleanup will always
>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are blocking the machine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also think that currently the "No space left" errors may
>>>>>>>>>>>>>> be a consequence of growing workspace directory rather than /tmp. I didn't
>>>>>>>>>>>>>> do any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>>>>>>>>>>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>>>>>>>>>>>> either guarantee the disk size to hold workspaces for all jobs (because
>>>>>>>>>>>>>> eventually, every worker will execute each job) or also clear the
>>>>>>>>>>>>>> workspaces in some way.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>>>>>>>>>>>>>> mxm@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to test
>>>>>>>>>>>>>>> failures
>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there is the
>>>>>>>>>>>>>>> notion of
>>>>>>>>>>>>>>> running exclusively while no other tasks are running?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Max
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some
>>>>>>>>>>>>>>> out-of-disk-related errors in precommit tests currently, perhaps we should
>>>>>>>>>>>>>>> schedule this job with cron?
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on jenkins-7
>>>>>>>>>>>>>>> (for example:
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by jenkins
>>>>>>>>>>>>>>> older than 3 days.
>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except jenkins-12,
>>>>>>>>>>>>>>> which has not
>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not
>>>>>>>>>>>>>>> scheduling:
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one
>>>>>>>>>>>>>>> time cleanup. I agree
>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this task
>>>>>>>>>>>>>>> or address the root
>>>>>>>>>>>>>>> >>>> cause of the buildup.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>>>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers again.
>>>>>>>>>>>>>>> Nodes 1 and 7
>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases
>>>>>>>>>>>>>>> (someone with access
>>>>>>>>>>>>>>> >>>>> permissions to the workers)?
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more and
>>>>>>>>>>>>>>> more frequent
>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how this can be
>>>>>>>>>>>>>>> remedied. Can a cleanup
>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Regards
>>>>>>>>>>>>>>> >>>>> Michal
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> --
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Unique Tech
>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
Ya looks like something in the workspaces is taking up room:

@apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
191G    .
191G    total


On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <ty...@google.com> wrote:

> Node 8 is also full. The partition that /tmp is on is here:
>
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1       485G  482G  2.9G 100% /
>
> however, after cleaning up /tmp with the crontab command, there is only 8G
> of usage, yet it still remains 100% full:
>
> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
> 8.0G    /tmp
> 8.0G    total
>
> The workspaces are in the /home/jenkins/jenkins-slave/workspace directory.
> When I run a du on that, it takes really long. I'll let it keep running for
> a while to see if it ever returns a result but so far this seems suspect.
>
>
>
>
>
> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <ty...@google.com>
> wrote:
>
>> Everything I've been looking at is in the /tmp dir. Where are the
>> workspaces, or what are they named?
>>
>>
>>
>>
>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:
>>
>>> I'm curious what you find. Was it /tmp or the workspaces using up the
>>> space?
>>>
>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>>
>>>> Bleck. I just realized that it is 'offline' so that won't work. I'll
>>>> clean up manually on the machine using the cron command.
>>>>
>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <ty...@google.com>
>>>> wrote:
>>>>
>>>>> Something isn't working with the current setup because node 15
>>>>> appears to be out of space and is currently 'offline' according to Jenkins.
>>>>> Can someone run the cleanup job? The machine is full,
>>>>>
>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>> udev             52G     0   52G   0% /dev
>>>>> tmpfs            11G  265M   10G   3% /run
>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>>
>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>>>>> head -n 20
>>>>> 20G     2020-07-24 17:52        .
>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>>>> 517M    2020-07-22 17:31
>>>>>  ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>>> 517M    2020-07-22 17:31
>>>>>  ./junit1031982597110125586/junit8739924829337821410
>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>>>> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>>> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>>> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
>>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
>>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
>>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
>>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
>>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>>>>>
>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>>
>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <ty...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache infra
>>>>>>> wiki that also suggests using a tmpdir inside the workspace [1]:
>>>>>>>
>>>>>>> Procedures Projects can take to clean up disk space
>>>>>>>
>>>>>>> Projects can help themselves and Infra by taking some basic steps to
>>>>>>> help clean up their jobs after themselves on the build nodes.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it gets
>>>>>>>    cleaned up when job workspaces expire.
>>>>>>>
>>>>>>>
>>>>>> Tests should be (able to be) written to use the standard temporary
>>>>>> file mechanisms, and the environment set up on Jenkins such that that falls
>>>>>> into the respective workspaces. Ideally this should be as simple as setting
>>>>>> the TMPDIR (or similar) environment variable (and making sure it exists/is
>>>>>> writable).
>>>>>>
>>>>>>>
>>>>>>>    1. Configure your jobs to wipe workspaces on start or finish.
>>>>>>>    2. Configure your jobs to only keep 5 or 10 previous builds.
>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous artifacts.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [1]:
>>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>>>
>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <ke...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Those file listings look like the result of using standard temp
>>>>>>>> file APIs but with TMPDIR set to /tmp.
>>>>>>>>
>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <ty...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into two examples:
>>>>>>>>>
>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>>>> 1,1 | head -n 20
>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>>>>>>> 236M    2020-07-17 18:48
>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>>>>> 236M    2020-07-17 18:46
>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>>>>> 236M    2020-07-17 18:44
>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>>>>> 236M    2020-07-17 18:42
>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>>>>> 236M    2020-07-17 18:39
>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>>>>> 236M    2020-07-17 18:35
>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>>>>> 3.7M    2020-07-17 18:48
>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>>>>> 3.7M    2020-07-17 18:46
>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>>>>> 3.7M    2020-07-17 18:44
>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>>>>> 3.7M    2020-07-17 18:42
>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>>>>> 3.7M    2020-07-17 18:39
>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>>>>> 3.7M    2020-07-17 18:35
>>>>>>>>>  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>>>> 1,1 | head -n 20
>>>>>>>>> 817M    2020-07-21 02:26        .
>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>>>>>>> 236M    2020-07-19 12:14
>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>>>>> 236M    2020-07-19 12:11
>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>>>>> 236M    2020-07-19 12:05
>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>>>>> 3.7M    2020-07-19 12:14
>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>>>>> 3.7M    2020-07-19 12:11
>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>>>>> 3.7M    2020-07-19 12:05
>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>>>>> 2.0M    2020-07-19 12:14
>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>>>>> 2.0M    2020-07-19 12:11
>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>>>>> 2.0M    2020-07-19 12:05
>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>>>>> 1.2M    2020-07-19 12:14
>>>>>>>>>  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>>>>> 1.2M    2020-07-19 12:11
>>>>>>>>>  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>>>>> 1.2M    2020-07-19 12:05
>>>>>>>>>  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>>>>>>> 984K    2020-07-12 12:00
>>>>>>>>>  ./junit642086915811430564/beam/nodes
>>>>>>>>> 980K    2020-07-12 12:00
>>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <ke...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm probably late to this discussion and missing something, but
>>>>>>>>>>> why are we writing to /tmp at all? I would expect TMPDIR to point somewhere
>>>>>>>>>>> inside the job directory that will be wiped by Jenkins, and I would expect
>>>>>>>>>>> code to always create temp files via APIs that respect this. Is Jenkins not
>>>>>>>>>>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>>>>>>>>>>> our code (that we could probably find by setting TMPDIR to somewhere
>>>>>>>>>>> not-/tmp and running the tests without write permission to /tmp, etc.)?
>>>>>>>>>>>
>>>>>>>>>>> Kenn
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning
>>>>>>>>>>>> up workspace directory after successful jobs. Alternatively, we can
>>>>>>>>>>>> consider periodically cleaning up the /src directories.
>>>>>>>>>>>>
>>>>>>>>>>>> I would suggest moving the cron task from internal cron scripts
>>>>>>>>>>>> to the inventory job (
>>>>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>>>>>>>>> frequencies and clean up code with PRs. I do not know how internal cron
>>>>>>>>>>>> scripts are created, maintained, and how they would be recreated for new
>>>>>>>>>>>> worker instances.
>>>>>>>>>>>>
>>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not triggered
>>>>>>>>>>>>> by cron and should be a last-resort solution for some strange cases.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Along with that job, I've also updated every worker with an
>>>>>>>>>>>>> internal cron script. It's being executed once a week and deletes all the
>>>>>>>>>>>>> files (and only files) that were not accessed for at least three days.
>>>>>>>>>>>>> That's designed to be as safe as possible for the running jobs on the
>>>>>>>>>>>>> worker (not to delete the files that are still in use), and also to be
>>>>>>>>>>>>> insensitive to the current workload on the machine. The cleanup will always
>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are blocking the machine.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also think that currently the "No space left" errors may be
>>>>>>>>>>>>> a consequence of the growing workspace directory rather than /tmp. I didn't do
>>>>>>>>>>>>> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>>>>>>>>>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>>>>>>>>>>> either guarantee the disk size to hold workspaces for all jobs (because
>>>>>>>>>>>>> eventually, every worker will execute each job) or also clear the
>>>>>>>>>>>>> workspaces in some way.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>>>>>>>>>>>>> mxm@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to test
>>>>>>>>>>>>>> failures
>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there is the
>>>>>>>>>>>>>> notion of
>>>>>>>>>>>>>> running exclusively while no other tasks are running?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Max
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>>>>>>>> beam_Clean_tmp_directory
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some
>>>>>>>>>>>>>> out-of-disk-related errors in precommit tests currently, perhaps we should
>>>>>>>>>>>>>> schedule this job with cron?
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> >> Still seeing no space left on device errors on jenkins-7
>>>>>>>>>>>>>> (for example:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>>>>>>>>> )
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by jenkins
>>>>>>>>>>>>>> older than 3 days.
>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> Passing recent tests on all executors except jenkins-12,
>>>>>>>>>>>>>> >>> which has not scheduled recent builds for the past 13 days.
>>>>>>>>>>>>>> >>> Not scheduling:
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one time
>>>>>>>>>>>>>> cleanup. I agree
>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this task or
>>>>>>>>>>>>>> address the root
>>>>>>>>>>>>>> >>>> cause of the buildup.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>>>>>>>> michal.walenia@polidea.com>
>>>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers again.
>>>>>>>>>>>>>> Nodes 1 and 7
>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases
>>>>>>>>>>>>>> (someone with access
>>>>>>>>>>>>>> >>>>> permissions to the workers).
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more and
>>>>>>>>>>>>>> more frequent
>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how can this be
>>>>>>>>>>>>>> remedied. Can a cleanup
>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Regards
>>>>>>>>>>>>>> >>>>> Michal
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> --
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> M: +48 791 432 002
>>>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Unique Tech
>>>>>>>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
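
As a concrete reference for the weekly cleanup Damian describes above, the
cron entry could look roughly like the sketch below. The path and schedule
are illustrative assumptions (GNU find, a cron.d-style entry); the actual
script on the workers may differ.

  # Hypothetical /etc/cron.d/beam-tmp-cleanup entry.
  # Sundays at 03:00: delete regular files (and only files) under /tmp
  # whose last access is older than three days; -xdev keeps find on
  # this one filesystem.
  0 3 * * 0 root find /tmp -xdev -type f -atime +3 -delete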

Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
Node 8 is also full. The partition that /tmp is on is here:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       485G  482G  2.9G 100% /

However, after cleaning up /tmp with the crontab command, there is only 8G
of usage, yet the partition still shows 100% full:

@apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
8.0G    /tmp
8.0G    total

The workspaces are in the /home/jenkins/jenkins-slave/workspace directory.
When I run du on that, it takes a really long time. I'll let it keep running
for a while to see if it ever returns a result, but so far this seems suspect.
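
When df and du disagree this much, the missing space is usually either
elsewhere on the root volume or held by deleted-but-still-open files. Two
one-liners that can narrow it down, assuming standard coreutils and lsof
are available on the workers:

  # Per-directory usage on the root filesystem only (-x: one device).
  sudo du -xh --max-depth=1 / 2>/dev/null | sort -rh | head
  # Files deleted while still open keep their blocks allocated until
  # the owning process exits.
  sudo lsof +L1 | head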





On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <ty...@google.com> wrote:

> Everything I've been looking at is in the /tmp dir. Where are the
> workspaces, or what are they named?
>
>
>
>
> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:
>
>> I'm curious as to what you find. Was it /tmp or the workspaces using up the
>> space?
>>
>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <ty...@google.com>
>> wrote:
>>
>>> Bleck. I just realized that it is 'offline' so that won't work. I'll
>>> clean up manually on the machine using the cron command.
>>>
>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>>
>>>> Something isn't working with the current setup because node 15 appears
>>>> to be out of space and is currently 'offline' according to Jenkins. Can
>>>> someone run the cleanup job? The machine is full,
>>>>
>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> udev             52G     0   52G   0% /dev
>>>> tmpfs            11G  265M   10G   3% /run
>>>> /dev/sda1       485G  484G  880M 100% /
>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>
>>>> @apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>>>> head -n 20
>>>> 20G     2020-07-24 17:52        .
>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>>> 517M    2020-07-22 17:31
>>>>  ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>> 517M    2020-07-22 17:31
>>>>  ./junit1031982597110125586/junit8739924829337821410
>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>>> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>>>>
>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>>
>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <ty...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache infra wiki
>>>>>> that also suggests using a tmpdir inside the workspace [1]:
>>>>>>
>>>>>> Procedures Projects can take to clean up disk space
>>>>>>
>>>>>> Projects can help themselves and Infra by taking some basic steps to
>>>>>> help clean up their jobs after themselves on the build nodes.
>>>>>>
>>>>>>
>>>>>>
>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it gets
>>>>>>    cleaned up when job workspaces expire.
>>>>>>
>>>>>>
>>>>> Tests should be (able to be) written to use the standard temporary
>>>>> file mechanisms, and the environment set up on Jenkins such that they fall
>>>>> into the respective workspaces. Ideally this should be as simple as setting
>>>>> the TMPDIR (or similar) environment variable (and making sure it exists/is
>>>>> writable).
>>>>>
>>>>>>
>>>>>>    2. Configure your jobs to wipe workspaces on start or finish.
>>>>>>    3. Configure your jobs to only keep 5 or 10 previous builds.
>>>>>>    4. Configure your jobs to only keep 5 or 10 previous artifacts.
>>>>>>
>>>>>>
>>>>>>
>>>>>> [1]:
>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>>
>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <ke...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Those file listings look like the result of using standard temp file
>>>>>>> APIs but with TMPDIR set to /tmp.
>>>>>>>
>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <ty...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>>>>>> subdirectories inside of /tmp. Here is a quick look into two examples:
>>>>>>>>
>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>>> 1,1 | head -n 20
>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>>>>>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>>>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>>>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>>>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>>>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>>>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>>>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>>>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>>>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>>>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>>>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>>>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>>>>>
>>>>>>>>
>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>>> 1,1 | head -n 20
>>>>>>>> 817M    2020-07-21 02:26        .
>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>>>>>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>>>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>>>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>>>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>>>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>>>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>>>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>>>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>>>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>>>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>>>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>>>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>>>>>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>>>>>>>> 980K    2020-07-12 12:00
>>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>>
>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <ke...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I'm probably late to this discussion and missing something, but
>>>>>>>>>> why are we writing to /tmp at all? I would expect TMPDIR to point somewhere
>>>>>>>>>> inside the job directory that will be wiped by Jenkins, and I would expect
>>>>>>>>>> code to always create temp files via APIs that respect this. Is Jenkins not
>>>>>>>>>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>>>>>>>>>> our code (that we could probably find by setting TMPDIR to somewhere
>>>>>>>>>> not-/tmp and running the tests without write permission to /tmp, etc)
>>>>>>>>>>
>>>>>>>>>> Kenn
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning
>>>>>>>>>>> up workspace directory after successful jobs. Alternatively, we can
>>>>>>>>>>> consider periodically cleaning up the /src directories.
>>>>>>>>>>>
>>>>>>>>>>> I would suggest moving the cron task from internal cron scripts
>>>>>>>>>>> to the inventory job (
>>>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>>>>>>>> frequencies, and clean up code with PRs. I do not know how internal cron
>>>>>>>>>>> scripts are created and maintained, and how they would be recreated for new
>>>>>>>>>>> worker instances.
>>>>>>>>>>>
>>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
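
As a hedged sketch of the TMPDIR idea from Kenn and Robert above, a Jenkins
shell step could start like this (WORKSPACE is set by Jenkins; the wiring
into Beam's Groovy job definitions is not shown):

  # Point temp-file APIs at a workspace-local directory, so Jenkins'
  # workspace cleanup also reclaims temp files.
  export TMPDIR="$WORKSPACE/tmp"
  mkdir -p "$TMPDIR"
  # Tools that honor TMPDIR (e.g. Python's tempfile) now write here;
  # JVM code would additionally need -Djava.io.tmpdir="$TMPDIR".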

Re: No space left on device - beam-jenkins 1 and 7

Posted by Udi Meiri <eh...@google.com>.
In the "jenkins" user home directory.

On Fri, Jul 24, 2020, 11:19 Tyson Hamilton <ty...@google.com> wrote:

> Everything I've been looking at is in the /tmp dir. Where are the
> workspaces, or what are they named?
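
Given that answer, a quick way to see which workspaces are eating the disk,
assuming the agent root from earlier in the thread
(/home/jenkins/jenkins-slave):

  # Largest job workspaces first; this can take a while on a full disk.
  sudo du -sh /home/jenkins/jenkins-slave/workspace/* 2>/dev/null \
    | sort -rh | head -n 20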

Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
Everything I've been looking at is in the /tmp dir. Where are the
workspaces, or what are they named?




On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:

> I'm curious as to what you find. Was it /tmp or the workspaces using up the
> space?
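
If the cleanup does move into the inventory job as Ahmet suggested earlier
in the thread, the shell step itself could stay close to Alan's one-time
command, for example (a sketch only; the real job_Inventory.groovy wiring
is not shown):

  # Delete jenkins-owned regular files under /tmp older than three days.
  find /tmp -xdev -type f -user jenkins -mtime +3 -delete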
>
> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <ty...@google.com>
> wrote:
>
>> Bleck. I just realized that it is 'offline' so that won't work. I'll
>> clean up manually on the machine using the cron command.
>>
>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <ty...@google.com>
>> wrote:
>>
>>> Something isn't working with the current set up because node 15 appears
>>> to be out of space and is currently 'offline' according to Jenkins. Can
>>> someone run the cleanup job? The machine is full,
>>>
>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> udev             52G     0   52G   0% /dev
>>> tmpfs            11G  265M   10G   3% /run
>>> */dev/sda1       485G  484G  880M 100% /*
>>> tmpfs            52G     0   52G   0% /dev/shm
>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>> tmpfs            11G     0   11G   0% /run/user/1017
>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>
>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>>> head -n 20
>>> 20G     2020-07-24 17:52        .
>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>> 517M    2020-07-22 17:31
>>>  ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>> 517M    2020-07-22 17:31
>>>  ./junit1031982597110125586/junit8739924829337821410
>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
>>> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>>>
>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <ty...@google.com>
>>>> wrote:
>>>>
>>>>> Ah I see, thanks Kenn. I found some advice from the Apache infra wiki
>>>>> that also suggests using a tmpdir inside the workspace [1]:
>>>>>
>>>>> Procedures Projects can take to clean up disk space
>>>>>
>>>>> Projects can help themselves and Infra by taking some basic steps to
>>>>> help clean up their jobs after themselves on the build nodes.
>>>>>
>>>>>
>>>>>
>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it gets
>>>>>    cleaned up when job workspaces expire.
>>>>>
>>>>>
>>>> Tests should be (able to be) written to use the standard temporary file
>>>> mechanisms, and the environment set up on Jenkins such that that falls into
>>>> the respective workspaces. Ideally this should be as simple as setting
>>>> the TMPDIR (or similar) environment variable (and making sure it exists/is
>>>> writable).
>>>>
>>>>>
>>>>>    1. Configure your jobs to wipe workspaces on start or finish.
>>>>>    2. Configure your jobs to only keep 5 or 10 previous builds.
>>>>>    3. Configure your jobs to only keep 5 or 10 previous artifacts.
>>>>>
>>>>>
>>>>>
>>>>> [1]:
>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>
>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <ke...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Those file listings look like the result of using standard temp file
>>>>>> APIs but with TMPDIR set to /tmp.
>>>>>>
>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <ty...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Jobs are hermetic as far as I can tell and use unique subdirectories
>>>>>>> inside of /tmp. Here is a quick look into two examples:
>>>>>>>
>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1
>>>>>>> | head -n 20
>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>>>>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>>>>
>>>>>>>
>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk
>>>>>>> 1,1 | head -n 20
>>>>>>> 817M    2020-07-21 02:26        .
>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>>>>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>>>>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>>>>>>> 980K    2020-07-12 12:00
>>>>>>>  ./junit642086915811430564/beam/nodes/0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <ke...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm probably late to this discussion and missing something, but
>>>>>>>>> why are we writing to /tmp at all? I would expect TMPDIR to point somewhere
>>>>>>>>> inside the job directory that will be wiped by Jenkins, and I would expect
>>>>>>>>> code to always create temp files via APIs that respect this. Is Jenkins not
>>>>>>>>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>>>>>>>>> our code (that we could probably find by setting TMPDIR to somewhere
>>>>>>>>> not-/tmp and running the tests without write permission to /tmp, etc)
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>>>>>> <eh...@google.com> filed a relevant issue previously (
>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
>>>>>>>>>> workspace directory after successful jobs. Alternatively, we can consider
>>>>>>>>>> periodically cleaning up the /src directories.
>>>>>>>>>>
>>>>>>>>>> I would suggest moving the cron task from internal cron scripts
>>>>>>>>>> to the inventory job (
>>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>>>>>>> frequencies and clean up codes with PRs. I do not know how internal cron
>>>>>>>>>> scripts are created, maintained, and how would they be recreated for new
>>>>>>>>>> worker instances.
>>>>>>>>>>
>>>>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey,
>>>>>>>>>>>
>>>>>>>>>>> I've recently created a solution for the growing /tmp directory.
>>>>>>>>>>> Part of it is the job mentioned by Tyson:
>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not triggered by
>>>>>>>>>>> cron and should be a last resort solution for some strange cases.
>>>>>>>>>>>
>>>>>>>>>>> Along with that job, I've also updated every worker with an
>>>>>>>>>>> internal cron script. It's being executed once a week and deletes all the
>>>>>>>>>>> files (and only files) that were not accessed for at least three days.
>>>>>>>>>>> That's designed to be as safe as possible for the running jobs on the
>>>>>>>>>>> worker (not to delete the files that are still in use), and also to be
>>>>>>>>>>> insensitive to the current workload on the machine. The cleanup will always
>>>>>>>>>>> happen, even if some long-running/stuck jobs are blocking the machine.
>>>>>>>>>>>
>>>>>>>>>>> I also think that currently the "No space left" errors may be a
>>>>>>>>>>> consequence of growing workspace directory rather than /tmp. I didn't do
>>>>>>>>>>> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>>>>>>>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>>>>>>>>> either guarantee the disk size to hold workspaces for all jobs (because
>>>>>>>>>>> eventually, every worker will execute each job) or clear also the
>>>>>>>>>>> workspaces in some way.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Damian
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>>>>>>>>>>> mxm@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to test
>>>>>>>>>>>> failures
>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there is the
>>>>>>>>>>>> notion of
>>>>>>>>>>>> running exclusively while no other tasks are running?
>>>>>>>>>>>>
>>>>>>>>>>>> -Max
>>>>>>>>>>>>
>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>>>>>> beam_Clean_tmp_directory
>>>>>>>>>>>> >
>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some out of
>>>>>>>>>>>> disk related errors in precommit tests currently, perhaps we should
>>>>>>>>>>>> schedule this job with cron?
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> >> Still seeing no space left on device errors on jenkins-7
>>>>>>>>>>>> (for example:
>>>>>>>>>>>> >>
>>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>>>>>>> )
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>>>>>>>> amyrvold@google.com> wrote:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins older
>>>>>>>>>>>> than 3 days.
>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Passing recent tests on all executors except jenkins-12,
>>>>>>>>>>>> which has not
>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not
>>>>>>>>>>>> scheduling:
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one time
>>>>>>>>>>>> cleanup. I agree
>>>>>>>>>>>> >>>> that we need to have a solution to automate this task or
>>>>>>>>>>>> address the root
>>>>>>>>>>>> >>>> cause of the buildup.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>>>>>> michal.walenia@polidea.com>
>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers again.
>>>>>>>>>>>> Nodes 1 and 7
>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases (someone
>>>>>>>>>>>> with access
>>>>>>>>>>>> >>>>> permissions to the workers).
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more and
>>>>>>>>>>>> more frequent
>>>>>>>>>>>> >>>>> recently and I'd like to discuss how can this be
>>>>>>>>>>>> remedied. Can a cleanup
>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Regards
>>>>>>>>>>>> >>>>> Michal
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> --
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Unique Tech
>>>>>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>
>>>>>>>>>>>>
>>>>>>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Udi Meiri <eh...@google.com>.
I'm curious what you find. Was it /tmp or the workspaces using up the
space?

>>>>>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
Bleck. I just realized that it is 'offline' so that won't work. I'll clean
up manually on the machine using the cron command.


Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
Something isn't working with the current setup because node 15 appears to
be out of space and is currently 'offline' according to Jenkins. Can
someone run the cleanup job? The machine is full:

@apache-ci-beam-jenkins-15:/tmp$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             52G     0   52G   0% /dev
tmpfs            11G  265M   10G   3% /run
*/dev/sda1       485G  484G  880M 100% /*
tmpfs            52G     0   52G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            52G     0   52G   0% /sys/fs/cgroup
tmpfs            11G     0   11G   0% /run/user/1017
tmpfs            11G     0   11G   0% /run/user/1037

apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head
-n 20
20G     2020-07-24 17:52        .
580M    2020-07-22 17:31        ./junit1031982597110125586
517M    2020-07-22 17:31
 ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
517M    2020-07-22 17:31
 ./junit1031982597110125586/junit8739924829337821410
263M    2020-07-22 12:23        ./pip-install-2GUhO_
263M    2020-07-20 09:30        ./pip-install-sxgwqr
263M    2020-07-17 13:56        ./pip-install-bWSKIV
242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
111M    2020-07-23 00:57        ./pip-install-1JnyNE
105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
105M    2020-07-23 00:15        ./beam-artifact682561790267074916
105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
105M    2020-07-23 00:14        ./beam-artifact4050383819822604421

On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <ro...@google.com>
wrote:

> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <ty...@google.com>
> wrote:
>
>> Ah I see, thanks Kenn. I found some advice from the Apache infra wiki
>> that also suggests using a tmpdir inside the workspace [1]:
>>
>> Procedures Projects can take to clean up disk space
>>
>> Projects can help themselves and Infra by taking some basic steps to help
>> clean up their jobs after themselves on the build nodes.
>>
>>
>>
>>    1. Use a ./tmp dir in your jobs workspace. That way it gets cleaned
>>    up when job workspaces expire.
>>
>>
> Tests should be (able to be) written to use the standard temporary file
> mechanisms, and the environment set up on Jenkins such that that falls into
> the respective workspaces. Ideally this should be as simple as setting
> the TMPDIR (or similar) environment variable (and making sure it exists/is
> writable).
>
>>
>>    1. Configure your jobs to wipe workspaces on start or finish.
>>    2. Configure your jobs to only keep 5 or 10 previous builds.
>>    3. Configure your jobs to only keep 5 or 10 previous artifacts.
>>
>>
>>
>> [1]:
>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>
>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> Those file listings look like the result of using standard temp file
>>> APIs but with TMPDIR set to /tmp.
>>>
>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>>
>>>> Jobs are hermetic as far as I can tell and use unique subdirectories
>>>> inside of /tmp. Here is a quick look into two examples:
>>>>
>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>>>> head -n 20
>>>> 1.6G    2020-07-21 02:25        .
>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>
>>>>
>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>>>> head -n 20
>>>> 817M    2020-07-21 02:26        .
>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>>>> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
>>>>
>>>>
>>>>
>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> You're right, job workspaces should be hermetic.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <ke...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I'm probably late to this discussion and missing something, but why
>>>>>> are we writing to /tmp at all? I would expect TMPDIR to point somewhere
>>>>>> inside the job directory that will be wiped by Jenkins, and I would expect
>>>>>> code to always create temp files via APIs that respect this. Is Jenkins not
>>>>>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>>>>>> our code (that we could probably find by setting TMPDIR to somewhere
>>>>>> not-/tmp and running the tests without write permission to /tmp, etc)
>>>>>>
>>>>>> Kenn
>>>>>>
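One way to act on Kenn's suggestion of running the tests with /tmp unwritable, without touching the host's real /tmp, is a throwaway container; the image name and test command below are placeholders rather than Beam's actual setup:

# A read-only root filesystem makes any stray write to /tmp fail fast,
# while TMPDIR points at a writable tmpfs scratch mount.
docker run --rm --read-only --tmpfs /work -e TMPDIR=/work beam-test-image ./gradlew test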
>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
>>>>>>> a relevant issue previously (
>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
>>>>>>> workspace directory after successful jobs. Alternatively, we can consider
>>>>>>> periodically cleaning up the /src directories.
>>>>>>>
>>>>>>> I would suggest moving the cron task from internal cron scripts to
>>>>>>> the inventory job (
>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>>>> frequencies and clean up codes with PRs. I do not know how internal cron
>>>>>>> scripts are created, maintained, and how would they be recreated for new
>>>>>>> worker instances.
>>>>>>>
>>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>>>
>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> I've recently created a solution for the growing /tmp directory.
>>>>>>>> Part of it is the job mentioned by Tyson:
>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not triggered by
>>>>>>>> cron and should be a last resort solution for some strange cases.
>>>>>>>>
>>>>>>>> Along with that job, I've also updated every worker with an
>>>>>>>> internal cron script. It's being executed once a week and deletes all the
>>>>>>>> files (and only files) that were not accessed for at least three days.
>>>>>>>> That's designed to be as safe as possible for the running jobs on the
>>>>>>>> worker (not to delete the files that are still in use), and also to be
>>>>>>>> insensitive to the current workload on the machine. The cleanup will always
>>>>>>>> happen, even if some long-running/stuck jobs are blocking the machine.
>>>>>>>>
>>>>>>>> I also think that currently the "No space left" errors may be a
>>>>>>>> consequence of growing workspace directory rather than /tmp. I didn't do
>>>>>>>> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>>>>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>>>>>> either guarantee the disk size to hold workspaces for all jobs (because
>>>>>>>> eventually, every worker will execute each job) or clear also the
>>>>>>>> workspaces in some way.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Damian
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <mx...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to test
>>>>>>>>> failures
>>>>>>>>> while running. Not a Jenkins expert but maybe there is the notion
>>>>>>>>> of
>>>>>>>>> running exclusively while no other tasks are running?
>>>>>>>>>
>>>>>>>>> -Max
>>>>>>>>>
>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>>> beam_Clean_tmp_directory
>>>>>>>>> >
>>>>>>>>> > Currently it needs to be run manually. I'm seeing some out of
>>>>>>>>> disk related errors in precommit tests currently, perhaps we should
>>>>>>>>> schedule this job with cron?
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote:
>>>>>>>>> >> Still seeing no space left on device errors on jenkins-7 (for
>>>>>>>>> example:
>>>>>>>>> >>
>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>>>> )
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>>>>> amyrvold@google.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins older
>>>>>>>>> than 3 days.
>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>>>> >>>
>>>>>>>>> >>> Passing recent tests on all executors except jenkins-12, which
>>>>>>>>> has not
>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>> >>> Recent passing builds:
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>> >>>
>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one time
>>>>>>>>> cleanup. I agree
>>>>>>>>> >>>> that we need to have a solution to automate this task or
>>>>>>>>> address the root
>>>>>>>>> >>>> cause of the buildup.
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>>> michal.walenia@polidea.com>
>>>>>>>>> >>>> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>>> Hi there,
>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes
>>>>>>>>> 1 and 7
>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>>>> >>>>> Who is the best person to contact in these cases (someone
>>>>>>>>> with access
>>>>>>>>> >>>>> permissions to the workers).
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> I also noticed that such errors are becoming more and more
>>>>>>>>> frequent
>>>>>>>>> >>>>> recently and I'd like to discuss how can this be remedied.
>>>>>>>>> Can a cleanup
>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Regards
>>>>>>>>> >>>>> Michal
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> --
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Unique Tech
>>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>
>>>>>>>>> >>
>>>>>>>>>
>>>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <ty...@google.com> wrote:

> Ah I see, thanks Kenn. I found some advice from the Apache infra wiki that
> also suggests using a tmpdir inside the workspace [1]:
>
> Procedures Projects can take to clean up disk space
>
> Projects can help themselves and Infra by taking some basic steps to help
> clean up their jobs after themselves on the build nodes.
>
>
>
>    1. Use a ./tmp dir in your jobs workspace. That way it gets cleaned up
>    when job workspaces expire.
>
>
Tests should be (able to be) written to use the standard temporary file
mechanisms, with the environment on Jenkins set up so that those files
fall into the respective workspaces. Ideally this should be as simple as
setting the TMPDIR (or similar) environment variable (and making sure it
exists/is writable).
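
A minimal sketch of that setup, at the top of a Jenkins shell step (the
workspace-relative path is an assumption, not what the jobs do today):

    # Keep temp files inside the workspace so they are wiped along with it.
    mkdir -p "$WORKSPACE/tmp"
    export TMPDIR="$WORKSPACE/tmp"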

>
>    2. Configure your jobs to wipe workspaces on start or finish.
>    3. Configure your jobs to only keep 5 or 10 previous builds.
>    4. Configure your jobs to only keep 5 or 10 previous artifacts.
>
>
>
> [1]:
> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>
> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <ke...@apache.org> wrote:
>
>> Those file listings look like the result of using standard temp file APIs
>> but with TMPDIR set to /tmp.
>>
>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <ty...@google.com>
>> wrote:
>>
>>> Jobs are hermetic as far as I can tell and use unique subdirectories
>>> inside of /tmp. Here is a quick look into two examples:
>>>
>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>>> head -n 20
>>> 1.6G    2020-07-21 02:25        .
>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>
>>>
>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>>> head -n 20
>>> 817M    2020-07-21 02:26        .
>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>>> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
>>>
>>>
>>>
>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> You're right, job workspaces should be hermetic.
>>>>
>>>>
>>>>
>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <ke...@apache.org>
>>>> wrote:
>>>>
>>>>> I'm probably late to this discussion and missing something, but why
>>>>> are we writing to /tmp at all? I would expect TMPDIR to point somewhere
>>>>> inside the job directory that will be wiped by Jenkins, and I would expect
>>>>> code to always create temp files via APIs that respect this. Is Jenkins not
>>>>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>>>>> our code (that we could probably find by setting TMPDIR to somewhere
>>>>> not-/tmp and running the tests without write permission to /tmp, etc)
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
>>>>>> a relevant issue previously (
>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
>>>>>> workspace directory after successful jobs. Alternatively, we can consider
>>>>>> periodically cleaning up the /src directories.
>>>>>>
>>>>>> I would suggest moving the cron task from internal cron scripts to
>>>>>> the inventory job (
>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>>> frequencies and clean up codes with PRs. I do not know how internal cron
>>>>>> scripts are created, maintained, and how would they be recreated for new
>>>>>> worker instances.
>>>>>>
>>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>>
>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>> damian.gadomski@polidea.com> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> I've recently created a solution for the growing /tmp directory.
>>>>>>> Part of it is the job mentioned by Tyson: *beam_Clean_tmp_directory*.
>>>>>>> It's intentionally not triggered by cron and should be a last resort
>>>>>>> solution for some strange cases.
>>>>>>>
>>>>>>> Along with that job, I've also updated every worker with an internal
>>>>>>> cron script. It's being executed once a week and deletes all the files (and
>>>>>>> only files) that were not accessed for at least three days. That's designed
>>>>>>> to be as safe as possible for the running jobs on the worker (not to delete
>>>>>>> the files that are still in use), and also to be insensitive to the current
>>>>>>> workload on the machine. The cleanup will always happen, even if some
>>>>>>> long-running/stuck jobs are blocking the machine.
>>>>>>>
>>>>>>> I also think that currently the "No space left" errors may be a
>>>>>>> consequence of growing workspace directory rather than /tmp. I didn't do
>>>>>>> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>>>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>>>>> either guarantee the disk size to hold workspaces for all jobs (because
>>>>>>> eventually, every worker will execute each job) or clear also the
>>>>>>> workspaces in some way.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Damian
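
(A weekly "delete files not accessed for three days" policy like the one
described above usually comes down to a single find invocation; a sketch,
assuming GNU find, not necessarily the exact script installed on the workers:

    # e.g. from root's crontab, once a week:
    find /tmp -type f -atime +3 -delete
)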
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <mx...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1 for scheduling it via a cron job if it won't lead to test
>>>>>>>> failures
>>>>>>>> while running. Not a Jenkins expert but maybe there is the notion
>>>>>>>> of
>>>>>>>> running exclusively while no other tasks are running?
>>>>>>>>
>>>>>>>> -Max
>>>>>>>>
>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>> beam_Clean_tmp_directory
>>>>>>>> >
>>>>>>>> > Currently it needs to be run manually. I'm seeing some out of
>>>>>>>> disk related errors in precommit tests currently, perhaps we should
>>>>>>>> schedule this job with cron?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote:
>>>>>>>> >> Still seeing no space left on device errors on jenkins-7 (for
>>>>>>>> example:
>>>>>>>> >>
>>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>>> )
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins older than
>>>>>>>> 3 days.
>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>>> >>>
>>>>>>>> >>> Passing recent tests on all executors except jenkins-12, which
>>>>>>>> has not
>>>>>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>> >>> Recent passing builds:
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>> >>>
>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com>
>>>>>>>> wrote:
>>>>>>>> >>>
>>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one time
>>>>>>>> cleanup. I agree
>>>>>>>> >>>> that we need to have a solution to automate this task or
>>>>>>>> address the root
>>>>>>>> >>>> cause of the buildup.
>>>>>>>> >>>>
>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>> michal.walenia@polidea.com>
>>>>>>>> >>>> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> Hi there,
>>>>>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes
>>>>>>>> 1 and 7
>>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>>> >>>>> Who is the best person to contact in these cases (someone
>>>>>>>> with access
>>>>>>>> >>>>> permissions to the workers).
>>>>>>>> >>>>>
>>>>>>>> >>>>> I also noticed that such errors are becoming more and more
>>>>>>>> frequent
>>>>>>>> >>>>> recently and I'd like to discuss how can this be remedied.
>>>>>>>> Can a cleanup
>>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>>> >>>>>
>>>>>>>> >>>>> Regards
>>>>>>>> >>>>> Michal
>>>>>>>> >>>>>
>>>>>>>> >>>>> --
>>>>>>>> >>>>>
>>>>>>>> >>>>> Michał Walenia
>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>> >>>>>
>>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>>> >>>>>
>>>>>>>> >>>>> Unique Tech
>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>>>>> >>>>>
>>>>>>>> >>>>
>>>>>>>> >>
>>>>>>>>
>>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
Ah I see, thanks Kenn. I found some advice from the Apache infra wiki that
also suggests using a tmpdir inside the workspace [1]:

Procedures Projects can take to clean up disk space

Projects can help themselves and Infra by taking some basic steps to help
clean up their jobs after themselves on the build nodes.



   1. Use a ./tmp dir in your jobs workspace. That way it gets cleaned up
   when job workspaces expire.
   2. Configure your jobs to wipe workspaces on start or finish.
   3. Configure your jobs to only keep 5 or 10 previous builds.
   4. Configure your jobs to only keep 5 or 10 previous artifacts.



[1]:
https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
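
As an illustration, points 1 and 2 could be approximated in a job's shell
step like this (paths and cleanup policy are hypothetical, not what our
jobs currently configure):

    # Point 1: a job-local temp dir that expires with the workspace.
    mkdir -p "$WORKSPACE/tmp"
    export TMPDIR="$WORKSPACE/tmp"
    # Point 2: also wipe it explicitly when the step exits, pass or fail.
    trap 'rm -rf "$WORKSPACE/tmp"' EXIT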

On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <ke...@apache.org> wrote:

> Those file listings look like the result of using standard temp file APIs
> but with TMPDIR set to /tmp.
>
> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <ty...@google.com> wrote:
>
>> Jobs are hermetic as far as I can tell and use unique subdirectories
>> inside of /tmp. Here is a quick look into two examples:
>>
>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>> head -n 20
>> 1.6G    2020-07-21 02:25        .
>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>
>>
>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>> head -n 20
>> 817M    2020-07-21 02:26        .
>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>> 992K    2020-07-12 12:00        ./junit642086915811430564
>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
>>
>>
>>
>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> You're right, job workspaces should be hermetic.
>>>
>>>
>>>
>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <ke...@apache.org> wrote:
>>>
>>>> I'm probably late to this discussion and missing something, but why are
>>>> we writing to /tmp at all? I would expect TMPDIR to point somewhere inside
>>>> the job directory that will be wiped by Jenkins, and I would expect code to
>>>> always create temp files via APIs that respect this. Is Jenkins not
>>>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>>>> our code (that we could probably find by setting TMPDIR to somewhere
>>>> not-/tmp and running the tests without write permission to /tmp, etc)
>>>>
>>>> Kenn
>>>>
>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
>>>>> a relevant issue previously (
>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
>>>>> workspace directory after successful jobs. Alternatively, we can consider
>>>>> periodically cleaning up the /src directories.
>>>>>
>>>>> I would suggest moving the cron task from internal cron scripts to the
>>>>> inventory job (
>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>>> frequencies and clean up codes with PRs. I do not know how internal cron
>>>>> scripts are created, maintained, and how would they be recreated for new
>>>>> worker instances.
>>>>>
>>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>>
>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>> damian.gadomski@polidea.com> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> I've recently created a solution for the growing /tmp directory. Part
>>>>>> of it is the job mentioned by Tyson: *beam_Clean_tmp_directory*.
>>>>>> It's intentionally not triggered by cron and should be a last resort
>>>>>> solution for some strange cases.
>>>>>>
>>>>>> Along with that job, I've also updated every worker with an internal
>>>>>> cron script. It's being executed once a week and deletes all the files (and
>>>>>> only files) that were not accessed for at least three days. That's designed
>>>>>> to be as safe as possible for the running jobs on the worker (not to delete
>>>>>> the files that are still in use), and also to be insensitive to the current
>>>>>> workload on the machine. The cleanup will always happen, even if some
>>>>>> long-running/stuck jobs are blocking the machine.
>>>>>>
>>>>>> I also think that currently the "No space left" errors may be a
>>>>>> consequence of growing workspace directory rather than /tmp. I didn't do
>>>>>> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>>>> either guarantee the disk size to hold workspaces for all jobs (because
>>>>>> eventually, every worker will execute each job) or clear also the
>>>>>> workspaces in some way.
>>>>>>
>>>>>> Regards,
>>>>>> Damian
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <mx...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> +1 for scheduling it via a cron job if it won't lead to test
>>>>>>> failures
>>>>>>> while running. Not a Jenkins expert but maybe there is the notion of
>>>>>>> running exclusively while no other tasks are running?
>>>>>>>
>>>>>>> -Max
>>>>>>>
>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>> beam_Clean_tmp_directory
>>>>>>> >
>>>>>>> > Currently it needs to be run manually. I'm seeing some out of disk
>>>>>>> related errors in precommit tests currently, perhaps we should schedule
>>>>>>> this job with cron?
>>>>>>> >
>>>>>>> >
>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote:
>>>>>>> >> Still seeing no space left on device errors on jenkins-7 (for
>>>>>>> example:
>>>>>>> >>
>>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>>>>> )
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins older than
>>>>>>> 3 days.
>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>> >>>
>>>>>>> >>> Passing recent tests on all executors except jenkins-12, which
>>>>>>> has not
>>>>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>> >>> Recent passing builds:
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>> >>>
>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one time
>>>>>>> cleanup. I agree
>>>>>>> >>>> that we need to have a solution to automate this task or
>>>>>>> address the root
>>>>>>> >>>> cause of the buildup.
>>>>>>> >>>>
>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>> michal.walenia@polidea.com>
>>>>>>> >>>> wrote:
>>>>>>> >>>>
>>>>>>> >>>>> Hi there,
>>>>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1
>>>>>>> and 7
>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>> >>>>> Who is the best person to contact in these cases (someone with
>>>>>>> access
>>>>>>> >>>>> permissions to the workers).
>>>>>>> >>>>>
>>>>>>> >>>>> I also noticed that such errors are becoming more and more
>>>>>>> frequent
>>>>>>> >>>>> recently and I'd like to discuss how can this be remedied. Can
>>>>>>> a cleanup
>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>> >>>>>
>>>>>>> >>>>> Regards
>>>>>>> >>>>> Michal
>>>>>>> >>>>>
>>>>>>> >>>>> --
>>>>>>> >>>>>
>>>>>>> >>>>> Michał Walenia
>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>> >>>>>
>>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>>> >>>>>
>>>>>>> >>>>> Unique Tech
>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>>>> >>>>>
>>>>>>> >>>>
>>>>>>> >>
>>>>>>>
>>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Kenneth Knowles <ke...@apache.org>.
Those file listings look like the result of using standard temp file APIs
but with TMPDIR set to /tmp.
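
That behavior is easy to reproduce with any of the standard temp APIs, e.g.
Python's tempfile (the prefix here just mirrors the beam-pipeline-temp*
names above; purely an illustration):

    python -c 'import tempfile; print(tempfile.mkdtemp(prefix="beam-pipeline-temp"))'
    # -> /tmp/beam-pipeline-temp... while TMPDIR is unset or /tmp
    mkdir -p "$WORKSPACE/tmp"
    TMPDIR="$WORKSPACE/tmp" python -c 'import tempfile; print(tempfile.mkdtemp(prefix="beam-pipeline-temp"))'
    # -> the same call now lands inside the workspace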

On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <ty...@google.com> wrote:

> Jobs are hermetic as far as I can tell and use unique subdirectories
> inside of /tmp. Here is a quick look into two examples:
>
> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
> head -n 20
> 1.6G    2020-07-21 02:25        .
> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>
>
> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
> head -n 20
> 817M    2020-07-21 02:26        .
> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
> 992K    2020-07-12 12:00        ./junit642086915811430564
> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
>
>
>
> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>
>> You're right, job workspaces should be hermetic.
>>
>>
>>
>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> I'm probably late to this discussion and missing something, but why are
>>> we writing to /tmp at all? I would expect TMPDIR to point somewhere inside
>>> the job directory that will be wiped by Jenkins, and I would expect code to
>>> always create temp files via APIs that respect this. Is Jenkins not
>>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>>> our code (that we could probably find by setting TMPDIR to somewhere
>>> not-/tmp and running the tests without write permission to /tmp, etc)
>>>
>>> Kenn
>>>
>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
>>>> a relevant issue previously (
>>>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
>>>> workspace directory after successful jobs. Alternatively, we can consider
>>>> periodically cleaning up the /src directories.
>>>>
>>>> I would suggest moving the cron task from internal cron scripts to the
>>>> inventory job (
>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>> frequencies and clean up codes with PRs. I do not know how internal cron
>>>> scripts are created, maintained, and how would they be recreated for new
>>>> worker instances.
>>>>
>>>> /cc +Tyson Hamilton <ty...@google.com>
>>>>
>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>> damian.gadomski@polidea.com> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> I've recently created a solution for the growing /tmp directory. Part
>>>>> of it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
>>>>> intentionally not triggered by cron and should be a last resort solution
>>>>> for some strange cases.
>>>>>
>>>>> Along with that job, I've also updated every worker with an internal
>>>>> cron script. It's being executed once a week and deletes all the files (and
>>>>> only files) that were not accessed for at least three days. That's designed
>>>>> to be as safe as possible for the running jobs on the worker (not to delete
>>>>> the files that are still in use), and also to be insensitive to the current
>>>>> workload on the machine. The cleanup will always happen, even if some
>>>>> long-running/stuck jobs are blocking the machine.
>>>>>
>>>>> I also think that currently the "No space left" errors may be a
>>>>> consequence of growing workspace directory rather than /tmp. I didn't do
>>>>> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>>> either guarantee the disk size to hold workspaces for all jobs (because
>>>>> eventually, every worker will execute each job) or clear also the
>>>>> workspaces in some way.
>>>>>
>>>>> Regards,
>>>>> Damian
>>>>>
>>>>>
>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <mx...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> +1 for scheduling it via a cron job if it won't lead to test failures
>>>>>> while running. Not a Jenkins expert but maybe there is the notion of
>>>>>> running exclusively while no other tasks are running?
>>>>>>
>>>>>> -Max
>>>>>>
>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>> beam_Clean_tmp_directory
>>>>>> >
>>>>>> > Currently it needs to be run manually. I'm seeing some out of disk
>>>>>> related errors in precommit tests currently, perhaps we should schedule
>>>>>> this job with cron?
>>>>>> >
>>>>>> >
>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote:
>>>>>> >> Still seeing no space left on device errors on jenkins-7 (for
>>>>>> example:
>>>>>> >>
>>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>>>>> >>
>>>>>> >>
>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins older than 3
>>>>>> days.
>>>>>> >>> Agree that we need a longer term solution.
>>>>>> >>>
>>>>>> >>> Passing recent tests on all executors except jenkins-12, which
>>>>>> has not
>>>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>> >>> Recent passing builds:
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>> >>>
>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one time
>>>>>> cleanup. I agree
>>>>>> >>>> that we need to have a solution to automate this task or address
>>>>>> the root
>>>>>> >>>> cause of the buildup.
>>>>>> >>>>
>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>> michal.walenia@polidea.com>
>>>>>> >>>> wrote:
>>>>>> >>>>
>>>>>> >>>>> Hi there,
>>>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1
>>>>>> and 7
>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>> >>>>> Who is the best person to contact in these cases (someone with
>>>>>> access
>>>>>> >>>>> permissions to the workers).
>>>>>> >>>>>
>>>>>> >>>>> I also noticed that such errors are becoming more and more
>>>>>> frequent
>>>>>> >>>>> recently and I'd like to discuss how can this be remedied. Can
>>>>>> a cleanup
>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>> >>>>>
>>>>>> >>>>> Regards
>>>>>> >>>>> Michal
>>>>>> >>>>>
>>>>>> >>>>> --
>>>>>> >>>>>
>>>>>> >>>>> Michał Walenia
>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>> >>>>>
>>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>>> >>>>> E: michal.walenia@polidea.com
>>>>>> >>>>>
>>>>>> >>>>> Unique Tech
>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>
>>>>>>
>>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
Jobs are hermetic as far as I can tell and use unique subdirectories inside
of /tmp. Here is a quick look into two examples:

@apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head
-n 20
1.6G    2020-07-21 02:25        .
242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
2.7M    2020-07-17 20:10        ./pip-install-q9l227ef


@apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
head -n 20
817M    2020-07-21 02:26        .
242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
992K    2020-07-12 12:00        ./junit642086915811430564
988K    2020-07-12 12:00        ./junit642086915811430564/beam
984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0



On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:

> You're right, job workspaces should be hermetic.
>
>
>
> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> I'm probably late to this discussion and missing something, but why are
>> we writing to /tmp at all? I would expect TMPDIR to point somewhere inside
>> the job directory that will be wiped by Jenkins, and I would expect code to
>> always create temp files via APIs that respect this. Is Jenkins not
>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>> our code (that we could probably find by setting TMPDIR to somewhere
>> not-/tmp and running the tests without write permission to /tmp, etc)
>>
>> Kenn
>>
>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>>
>>> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
>>> a relevant issue previously (
>>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
>>> workspace directory after successful jobs. Alternatively, we can consider
>>> periodically cleaning up the /src directories.
>>>
>>> I would suggest moving the cron task from internal cron scripts to the
>>> inventory job (
>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>> frequencies and clean up codes with PRs. I do not know how internal cron
>>> scripts are created, maintained, and how would they be recreated for new
>>> worker instances.
>>>
>>> /cc +Tyson Hamilton <ty...@google.com>
>>>
>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>> damian.gadomski@polidea.com> wrote:
>>>
>>>> Hey,
>>>>
>>>> I've recently created a solution for the growing /tmp directory. Part
>>>> of it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
>>>> intentionally not triggered by cron and should be a last resort solution
>>>> for some strange cases.
>>>>
>>>> Along with that job, I've also updated every worker with an internal
>>>> cron script. It's being executed once a week and deletes all the files (and
>>>> only files) that were not accessed for at least three days. That's designed
>>>> to be as safe as possible for the running jobs on the worker (not to delete
>>>> the files that are still in use), and also to be insensitive to the current
>>>> workload on the machine. The cleanup will always happen, even if some
>>>> long-running/stuck jobs are blocking the machine.
>>>>
>>>> I also think that currently the "No space left" errors may be a
>>>> consequence of growing workspace directory rather than /tmp. I didn't do
>>>> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>> either guarantee the disk size to hold workspaces for all jobs (because
>>>> eventually, every worker will execute each job) or clear also the
>>>> workspaces in some way.
>>>>
>>>> Regards,
>>>> Damian
>>>>
>>>>
>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <mx...@apache.org>
>>>> wrote:
>>>>
>>>>> +1 for scheduling it via a cron job if it won't lead to test failures
>>>>> while running. Not a Jenkins expert but maybe there is the notion of
>>>>> running exclusively while no other tasks are running?
>>>>>
>>>>> -Max
>>>>>
>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>> beam_Clean_tmp_directory
>>>>> >
>>>>> > Currently it needs to be run manually. I'm seeing some out of disk
>>>>> related errors in precommit tests currently, perhaps we should schedule
>>>>> this job with cron?
>>>>> >
>>>>> >
>>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote:
>>>>> >> Still seeing no space left on device errors on jenkins-7 (for
>>>>> example:
>>>>> >>
>>>>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>>>> >>
>>>>> >>
>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com>
>>>>> wrote:
>>>>> >>
>>>>> >>> Did a one time cleanup of tmp files owned by jenkins older than 3
>>>>> days.
>>>>> >>> Agree that we need a longer term solution.
>>>>> >>>
>>>>> >>> Passing recent tests on all executors except jenkins-12, which has
>>>>> not
>>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>> >>> Recent passing builds:
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>> >>>
>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one time cleanup.
>>>>> I agree
>>>>> >>>> that we need to have a solution to automate this task or address
>>>>> the root
>>>>> >>>> cause of the buildup.
>>>>> >>>>
>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>> michal.walenia@polidea.com>
>>>>> >>>> wrote:
>>>>> >>>>
>>>>> >>>>> Hi there,
>>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1
>>>>> and 7
>>>>> >>>>> both fail jobs with "No space left on device".
>>>>> >>>>> Who is the best person to contact in these cases (someone with
>>>>> access
>>>>> >>>>> permissions to the workers).
>>>>> >>>>>
>>>>> >>>>> I also noticed that such errors are becoming more and more
>>>>> frequent
>>>>> >>>>> recently and I'd like to discuss how can this be remedied. Can a
>>>>> cleanup
>>>>> >>>>> task be automated on Jenkins somehow?
>>>>> >>>>>
>>>>> >>>>> Regards
>>>>> >>>>> Michal
>>>>> >>>>>
>>>>> >>>>> --
>>>>> >>>>>
>>>>> >>>>> Michał Walenia
>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>> >>>>>
>>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>>> >>>>> E: michal.walenia@polidea.com
>>>>> >>>>>
>>>>> >>>>> Unique Tech
>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>> >>>>>
>>>>> >>>>
>>>>> >>
>>>>>
>>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Udi Meiri <eh...@google.com>.
You're right, job workspaces should be hermetic.



On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <ke...@apache.org> wrote:

> I'm probably late to this discussion and missing something, but why are we
> writing to /tmp at all? I would expect TMPDIR to point somewhere inside the
> job directory that will be wiped by Jenkins, and I would expect code to
> always create temp files via APIs that respect this. Is Jenkins not
> cleaning up? Do we not have the ability to set this up? Do we have bugs in
> our code (that we could probably find by setting TMPDIR to somewhere
> not-/tmp and running the tests without write permission to /tmp, etc)
>
> Kenn
>
> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>
>> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
>> a relevant issue previously (
>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
>> workspace directory after successful jobs. Alternatively, we can consider
>> periodically cleaning up the /src directories.
>>
>> I would suggest moving the cron task from internal cron scripts to the
>> inventory job (
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>> That way, we can see all the cron jobs as part of the source tree, adjust
>> frequencies and clean up codes with PRs. I do not know how internal cron
>> scripts are created, maintained, and how would they be recreated for new
>> worker instances.
>>
>> /cc +Tyson Hamilton <ty...@google.com>
>>
>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>> damian.gadomski@polidea.com> wrote:
>>
>>> Hey,
>>>
>>> I've recently created a solution for the growing /tmp directory. Part of
>>> it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
>>> intentionally not triggered by cron and should be a last resort solution
>>> for some strange cases.
>>>
>>> Along with that job, I've also updated every worker with an internal
>>> cron script. It's being executed once a week and deletes all the files (and
>>> only files) that were not accessed for at least three days. That's designed
>>> to be as safe as possible for the running jobs on the worker (not to delete
>>> the files that are still in use), and also to be insensitive to the current
>>> workload on the machine. The cleanup will always happen, even if some
>>> long-running/stuck jobs are blocking the machine.
>>>
>>> I also think that currently the "No space left" errors may be a
>>> consequence of growing workspace directory rather than /tmp. I didn't do
>>> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>> either guarantee the disk size to hold workspaces for all jobs (because
>>> eventually, every worker will execute each job) or clear also the
>>> workspaces in some way.
>>>
>>> Regards,
>>> Damian
>>>
>>>
>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <mx...@apache.org>
>>> wrote:
>>>
>>>> +1 for scheduling it via a cron job if it won't lead to test failures
>>>> while running. Not a Jenkins expert but maybe there is the notion of
>>>> running exclusively while no other tasks are running?
>>>>
>>>> -Max
>>>>
>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>> > FYI there was a job introduced to do this in Jenkins:
>>>> beam_Clean_tmp_directory
>>>> >
>>>> > Currently it needs to be run manually. I'm seeing some out of disk
>>>> related errors in precommit tests currently, perhaps we should schedule
>>>> this job with cron?
>>>> >
>>>> >
>>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote:
>>>> >> Still seeing no space left on device errors on jenkins-7 (for
>>>> example:
>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/
>>>> )
>>>> >>
>>>> >>
>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com>
>>>> wrote:
>>>> >>
>>>> >>> Did a one time cleanup of tmp files owned by jenkins older than 3
>>>> days.
>>>> >>> Agree that we need a longer term solution.
>>>> >>>
>>>> >>> Passing recent tests on all executors except jenkins-12, which has
>>>> not
>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>> >>> Recent passing builds:
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>> >>>
>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com>
>>>> wrote:
>>>> >>>
>>>> >>>> +Alan Myrvold <am...@google.com> is doing a one time cleanup.
>>>> I agree
>>>> >>>> that we need to have a solution to automate this task or address
>>>> the root
>>>> >>>> cause of the buildup.
>>>> >>>>
>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>> michal.walenia@polidea.com>
>>>> >>>> wrote:
>>>> >>>>
>>>> >>>>> Hi there,
>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1
>>>> and 7
>>>> >>>>> both fail jobs with "No space left on device".
>>>> >>>>> Who is the best person to contact in these cases (someone with
>>>> access
>>>> >>>>> permissions to the workers).
>>>> >>>>>
>>>> >>>>> I also noticed that such errors are becoming more and more
>>>> frequent
>>>> >>>>> recently and I'd like to discuss how can this be remedied. Can a
>>>> cleanup
>>>> >>>>> task be automated on Jenkins somehow?
>>>> >>>>>
>>>> >>>>> Regards
>>>> >>>>> Michal
>>>> >>>>>
>>>> >>>>> --
>>>> >>>>>
>>>> >>>>> Michał Walenia
>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>> >>>>>
>>>> >>>>> M: +48 791 432 002 <+48791432002>
>>>> >>>>> E: michal.walenia@polidea.com
>>>> >>>>>
>>>> >>>>> Unique Tech
>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>> >>>>>
>>>> >>>>
>>>> >>
>>>>
>>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Kenneth Knowles <ke...@apache.org>.
I'm probably late to this discussion and missing something, but why are we
writing to /tmp at all? I would expect TMPDIR to point somewhere inside the
job directory that will be wiped by Jenkins, and I would expect code to
always create temp files via APIs that respect this. Is Jenkins not
cleaning up? Do we not have the ability to set this up? Do we have bugs in
our code (that we could probably find by setting TMPDIR to somewhere
not-/tmp and running the tests without write permission to /tmp, etc.)?
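
As a concrete sketch of that last idea (assuming a typical Jenkins shell
step; the Gradle invocation is illustrative, and WORKSPACE is the directory
Jenkins provides for the job):

    # Keep temp files inside the job workspace so they are wiped together
    # with the workspace.
    export TMPDIR="$WORKSPACE/tmp"
    mkdir -p "$TMPDIR"
    # To flush out code that ignores TMPDIR, /tmp could be made unwritable
    # for the build user (needs root; sketch only):
    #   chmod a-w /tmp
    ./gradlew test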

Kenn

On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:

> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
> a relevant issue previously (
> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
> workspace directory after successful jobs. Alternatively, we can consider
> periodically cleaning up the /src directories.
>
> I would suggest moving the cron task from internal cron scripts to the
> inventory job (
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
> That way, we can see all the cron jobs as part of the source tree, adjust
> frequencies and clean up codes with PRs. I do not know how internal cron
> scripts are created, maintained, and how would they be recreated for new
> worker instances.
>
> /cc +Tyson Hamilton <ty...@google.com>
>
> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
> damian.gadomski@polidea.com> wrote:
>
>> Hey,
>>
>> I've recently created a solution for the growing /tmp directory. Part of
>> it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
>> intentionally not triggered by cron and should be a last resort solution
>> for some strange cases.
>>
>> Along with that job, I've also updated every worker with an internal cron
>> script. It's being executed once a week and deletes all the files (and only
>> files) that were not accessed for at least three days. That's designed to
>> be as safe as possible for the running jobs on the worker (not to delete
>> the files that are still in use), and also to be insensitive to the current
>> workload on the machine. The cleanup will always happen, even if some
>> long-running/stuck jobs are blocking the machine.
>>
>> I also think that currently the "No space left" errors may be a
>> consequence of growing workspace directory rather than /tmp. I didn't do
>> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>> either guarantee the disk size to hold workspaces for all jobs (because
>> eventually, every worker will execute each job) or clear also the
>> workspaces in some way.
>>
>> Regards,
>> Damian
>>
>>
>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <mx...@apache.org>
>> wrote:
>>
>>> +1 for scheduling it via a cron job if it won't lead to test failures
>>> while running. Not a Jenkins expert but maybe there is the notion of
>>> running exclusively while no other tasks are running?
>>>
>>> -Max
>>>
>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>> > FYI there was a job introduced to do this in Jenkins:
>>> beam_Clean_tmp_directory
>>> >
>>> > Currently it needs to be run manually. I'm seeing some out of disk
>>> related errors in precommit tests currently, perhaps we should schedule
>>> this job with cron?
>>> >
>>> >
>>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote:
>>> >> Still seeing no space left on device errors on jenkins-7 (for example:
>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>> >>
>>> >>
>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com>
>>> wrote:
>>> >>
>>> >>> Did a one time cleanup of tmp files owned by jenkins older than 3
>>> days.
>>> >>> Agree that we need a longer term solution.
>>> >>>
>>> >>> Passing recent tests on all executors except jenkins-12, which has
>>> not
>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>> >>> Recent passing builds:
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>> >>>
>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com>
>>> wrote:
>>> >>>
>>> >>>> +Alan Myrvold <am...@google.com> is doing a one time cleanup. I
>>> agree
>>> >>>> that we need to have a solution to automate this task or address
>>> the root
>>> >>>> cause of the buildup.
>>> >>>>
>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>> michal.walenia@polidea.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>>> Hi there,
>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and
>>> 7
>>> >>>>> both fail jobs with "No space left on device".
>>> >>>>> Who is the best person to contact in these cases (someone with
>>> access
>>> >>>>> permissions to the workers).
>>> >>>>>
>>> >>>>> I also noticed that such errors are becoming more and more frequent
>>> >>>>> recently and I'd like to discuss how can this be remedied. Can a
>>> cleanup
>>> >>>>> task be automated on Jenkins somehow?
>>> >>>>>
>>> >>>>> Regards
>>> >>>>> Michal
>>> >>>>>
>>> >>>>> --
>>> >>>>>
>>> >>>>> Michał Walenia
>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>> >>>>>
>>> >>>>> M: +48 791 432 002 <+48791432002>
>>> >>>>> E: michal.walenia@polidea.com
>>> >>>>>
>>> >>>>> Unique Tech
>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>> >>>>>
>>> >>>>
>>> >>
>>>
>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Ahmet Altay <al...@google.com>.
Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
a relevant issue previously (https://issues.apache.org/jira/browse/BEAM-9865)
for cleaning up the workspace directory after successful jobs. Alternatively,
we can consider periodically cleaning up the /src directories.

I would suggest moving the cron task from internal cron scripts to the
inventory job (
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
That way, we can see all the cron jobs as part of the source tree, and
adjust frequencies and clean up code with PRs. I do not know how the
internal cron scripts are created and maintained, or how they would be
recreated for new worker instances.
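
For illustration, the cleanup could then live in that file as a plain shell
step, something like the sketch below (the find flags and the two-day
threshold are assumptions, not the current script):

    # Delete files under /tmp owned by jenkins that have not been accessed
    # for at least two days.
    find /tmp -user jenkins -type f -atime +2 -delete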

/cc +Tyson Hamilton <ty...@google.com>

On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <da...@polidea.com>
wrote:

> Hey,
>
> I've recently created a solution for the growing /tmp directory. Part of
> it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
> intentionally not triggered by cron and should be a last resort solution
> for some strange cases.
>
> Along with that job, I've also updated every worker with an internal cron
> script. It's being executed once a week and deletes all the files (and only
> files) that were not accessed for at least three days. That's designed to
> be as safe as possible for the running jobs on the worker (not to delete
> the files that are still in use), and also to be insensitive to the current
> workload on the machine. The cleanup will always happen, even if some
> long-running/stuck jobs are blocking the machine.
>
> I also think that currently the "No space left" errors may be a
> consequence of growing workspace directory rather than /tmp. I didn't do
> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
> workspace directory size is 158 GB while /tmp is only 16 GB. We should
> either guarantee the disk size to hold workspaces for all jobs (because
> eventually, every worker will execute each job) or clear also the
> workspaces in some way.
>
> Regards,
> Damian
>
>
> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <mx...@apache.org>
> wrote:
>
>> +1 for scheduling it via a cron job if it won't lead to test failures
>> while running. Not a Jenkins expert but maybe there is the notion of
>> running exclusively while no other tasks are running?
>>
>> -Max
>>
>> On 17.07.20 21:49, Tyson Hamilton wrote:
>> > FYI there was a job introduced to do this in Jenkins:
>> beam_Clean_tmp_directory
>> >
>> > Currently it needs to be run manually. I'm seeing some out of disk
>> related errors in precommit tests currently, perhaps we should schedule
>> this job with cron?
>> >
>> >
>> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote:
>> >> Still seeing no space left on device errors on jenkins-7 (for example:
>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>> >>
>> >>
>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com>
>> wrote:
>> >>
>> >>> Did a one time cleanup of tmp files owned by jenkins older than 3
>> days.
>> >>> Agree that we need a longer term solution.
>> >>>
>> >>> Passing recent tests on all executors except jenkins-12, which has not
>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>> >>> Recent passing builds:
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>> >>>
>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
>> >>>
>> >>>> +Alan Myrvold <am...@google.com> is doing a one time cleanup. I
>> agree
>> >>>> that we need to have a solution to automate this task or address the
>> root
>> >>>> cause of the buildup.
>> >>>>
>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>> michal.walenia@polidea.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi there,
>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
>> >>>>> both fail jobs with "No space left on device".
>> >>>>> Who is the best person to contact in these cases (someone with
>> access
>> >>>>> permissions to the workers).
>> >>>>>
>> >>>>> I also noticed that such errors are becoming more and more frequent
>> >>>>> recently and I'd like to discuss how can this be remedied. Can a
>> cleanup
>> >>>>> task be automated on Jenkins somehow?
>> >>>>>
>> >>>>> Regards
>> >>>>> Michal
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>> Michał Walenia
>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>> >>>>>
>> >>>>> M: +48 791 432 002 <+48791432002>
>> >>>>> E: michal.walenia@polidea.com
>> >>>>>
>> >>>>> Unique Tech
>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>> >>>>>
>> >>>>
>> >>
>>
>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Damian Gadomski <da...@polidea.com>.
Hey,

I've recently created a solution for the growing /tmp directory. Part of it
is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
intentionally not triggered by cron and should be a last-resort solution
for unusual cases.

Along with that job, I've also updated every worker with an internal cron
script. It's being executed once a week and deletes all the files (and only
files) that were not accessed for at least three days. That's designed to
be as safe as possible for the running jobs on the worker (not to delete
the files that are still in use), and also to be insensitive to the current
workload on the machine. The cleanup will always happen, even if some
long-running/stuck jobs are blocking the machine.
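
In crontab terms, that weekly cleanup amounts to something like the
following sketch (the real script's schedule, paths, and flags may differ):

    # Every Sunday at 06:00, delete regular files under /tmp that have not
    # been accessed for at least three days.
    0 6 * * 0  find /tmp -type f -atime +3 -delete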

I also think that currently the "No space left" errors may be a consequence
of the growing workspace directory rather than /tmp. I didn't do any
detailed analysis, but currently, for example, the workspace directory on
apache-beam-jenkins-7 is 158 GB while /tmp is only 16 GB. We should either
guarantee enough disk to hold the workspaces of all jobs (because
eventually every worker will execute each job) or also clear the
workspaces in some way.
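
A quick way to compare the two on a worker (the workspace path here is an
assumption, for illustration):

    du -sh /home/jenkins/workspace /tmp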

Regards,
Damian


On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <mx...@apache.org> wrote:

> +1 for scheduling it via a cron job if it won't lead to test failures
> while running. Not a Jenkins expert but maybe there is the notion of
> running exclusively while no other tasks are running?
>
> -Max
>
> On 17.07.20 21:49, Tyson Hamilton wrote:
> > FYI there was a job introduced to do this in Jenkins:
> beam_Clean_tmp_directory
> >
> > Currently it needs to be run manually. I'm seeing some out of disk
> related errors in precommit tests currently, perhaps we should schedule
> this job with cron?
> >
> >
> > On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote:
> >> Still seeing no space left on device errors on jenkins-7 (for example:
> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
> >>
> >>
> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com>
> wrote:
> >>
> >>> Did a one time cleanup of tmp files owned by jenkins older than 3 days.
> >>> Agree that we need a longer term solution.
> >>>
> >>> Passing recent tests on all executors except jenkins-12, which has not
> >>> scheduled recent builds for the past 13 days. Not scheduling:
> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
> >>> Recent passing builds:
> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
> >>>
> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
> >>>
> >>>> +Alan Myrvold <am...@google.com> is doing a one time cleanup. I
> agree
> >>>> that we need to have a solution to automate this task or address the
> root
> >>>> cause of the buildup.
> >>>>
> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
> michal.walenia@polidea.com>
> >>>> wrote:
> >>>>
> >>>>> Hi there,
> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
> >>>>> both fail jobs with "No space left on device".
> >>>>> Who is the best person to contact in these cases (someone with access
> >>>>> permissions to the workers).
> >>>>>
> >>>>> I also noticed that such errors are becoming more and more frequent
> >>>>> recently and I'd like to discuss how can this be remedied. Can a
> cleanup
> >>>>> task be automated on Jenkins somehow?
> >>>>>
> >>>>> Regards
> >>>>> Michal
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Michał Walenia
> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
> >>>>>
> >>>>> M: +48 791 432 002 <+48791432002>
> >>>>> E: michal.walenia@polidea.com
> >>>>>
> >>>>> Unique Tech
> >>>>> Check out our projects! <https://www.polidea.com/our-work>
> >>>>>
> >>>>
> >>
>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Maximilian Michels <mx...@apache.org>.
+1 for scheduling it via a cron job if it won't lead to test failures
while running. I'm not a Jenkins expert, but maybe there is a notion of
running a job exclusively, while no other tasks are running?
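
One shell-level approximation, if Jenkins itself can't guarantee that, is
to wrap the cleanup in an exclusive lock (a sketch; the lock file path is
arbitrary):

    # flock takes an exclusive lock and, with -n, skips the command
    # entirely if another run already holds the lock.
    flock -n /var/lock/beam-tmp-cleanup.lock \
        find /tmp -type f -atime +3 -delete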

-Max

On 17.07.20 21:49, Tyson Hamilton wrote:
> FYI there was a job introduced to do this in Jenkins: beam_Clean_tmp_directory
> 
> Currently it needs to be run manually. I'm seeing some out of disk related errors in precommit tests currently, perhaps we should schedule this job with cron?
> 
> 
> On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote:
>> Still seeing no space left on device errors on jenkins-7 (for example:
>> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>
>>
>> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com> wrote:
>>
>>> Did a one time cleanup of tmp files owned by jenkins older than 3 days.
>>> Agree that we need a longer term solution.
>>>
>>> Passing recent tests on all executors except jenkins-12, which has not
>>> scheduled recent builds for the past 13 days. Not scheduling:
>>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>> Recent passing builds:
>>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>
>>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> +Alan Myrvold <am...@google.com> is doing a one time cleanup. I agree
>>>> that we need to have a solution to automate this task or address the root
>>>> cause of the buildup.
>>>>
>>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <mi...@polidea.com>
>>>> wrote:
>>>>
>>>>> Hi there,
>>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
>>>>> both fail jobs with "No space left on device".
>>>>> Who is the best person to contact in these cases (someone with access
>>>>> permissions to the workers).
>>>>>
>>>>> I also noticed that such errors are becoming more and more frequent
>>>>> recently and I'd like to discuss how can this be remedied. Can a cleanup
>>>>> task be automated on Jenkins somehow?
>>>>>
>>>>> Regards
>>>>> Michal
>>>>>
>>>>> --
>>>>>
>>>>> Michał Walenia
>>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>
>>>>> M: +48 791 432 002 <+48791432002>
>>>>> E: michal.walenia@polidea.com
>>>>>
>>>>> Unique Tech
>>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>>
>>>>
>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Tyson Hamilton <ty...@google.com>.
FYI there was a job introduced to do this in Jenkins: beam_Clean_tmp_directory

Currently it needs to be run manually. I'm seeing some out-of-disk errors in precommit tests at the moment; perhaps we should schedule this job with cron?


On 2020/03/11 19:31:13, Heejong Lee <he...@google.com> wrote: 
> Still seeing no space left on device errors on jenkins-7 (for example:
> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
> 
> 
> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com> wrote:
> 
> > Did a one time cleanup of tmp files owned by jenkins older than 3 days.
> > Agree that we need a longer term solution.
> >
> > Passing recent tests on all executors except jenkins-12, which has not
> > scheduled recent builds for the past 13 days. Not scheduling:
> > https://builds.apache.org/computer/apache-beam-jenkins-12/builds
> > Recent passing builds:
> > https://builds.apache.org/computer/apache-beam-jenkins-1/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-2/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-3/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-4/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-5/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-6/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-7/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-8/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-9/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-10/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-11/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-13/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-14/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-15/builds
> > https://builds.apache.org/computer/apache-beam-jenkins-16/builds
> >
> > On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
> >
> >> +Alan Myrvold <am...@google.com> is doing a one time cleanup. I agree
> >> that we need to have a solution to automate this task or address the root
> >> cause of the buildup.
> >>
> >> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <mi...@polidea.com>
> >> wrote:
> >>
> >>> Hi there,
> >>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
> >>> both fail jobs with "No space left on device".
> >>> Who is the best person to contact in these cases (someone with access
> >>> permissions to the workers).
> >>>
> >>> I also noticed that such errors are becoming more and more frequent
> >>> recently and I'd like to discuss how can this be remedied. Can a cleanup
> >>> task be automated on Jenkins somehow?
> >>>
> >>> Regards
> >>> Michal
> >>>
> >>> --
> >>>
> >>> Michał Walenia
> >>> Polidea <https://www.polidea.com/> | Software Engineer
> >>>
> >>> M: +48 791 432 002 <+48791432002>
> >>> E: michal.walenia@polidea.com
> >>>
> >>> Unique Tech
> >>> Check out our projects! <https://www.polidea.com/our-work>
> >>>
> >>
> 

Re: No space left on device - beam-jenkins 1 and 7

Posted by Heejong Lee <he...@google.com>.
Still seeing "no space left on device" errors on jenkins-7 (for example:
https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)


On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <am...@google.com> wrote:

> Did a one time cleanup of tmp files owned by jenkins older than 3 days.
> Agree that we need a longer term solution.
>
> Passing recent tests on all executors except jenkins-12, which has not
> scheduled recent builds for the past 13 days. Not scheduling:
> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
> Recent passing builds:
> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>
> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
>
>> +Alan Myrvold <am...@google.com> is doing a one time cleanup. I agree
>> that we need to have a solution to automate this task or address the root
>> cause of the buildup.
>>
>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <mi...@polidea.com>
>> wrote:
>>
>>> Hi there,
>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
>>> both fail jobs with "No space left on device".
>>> Who is the best person to contact in these cases (someone with access
>>> permissions to the workers).
>>>
>>> I also noticed that such errors are becoming more and more frequent
>>> recently and I'd like to discuss how can this be remedied. Can a cleanup
>>> task be automated on Jenkins somehow?
>>>
>>> Regards
>>> Michal
>>>
>>> --
>>>
>>> Michał Walenia
>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>
>>> M: +48 791 432 002 <+48791432002>
>>> E: michal.walenia@polidea.com
>>>
>>> Unique Tech
>>> Check out our projects! <https://www.polidea.com/our-work>
>>>
>>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Alan Myrvold <am...@google.com>.
Did a one-time cleanup of tmp files owned by jenkins that were older than 3
days. Agreed that we need a longer-term solution.
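
Roughly along these lines (the exact invocation is an assumption):

    sudo find /tmp -user jenkins -type f -mtime +3 -delete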

Recent tests are passing on all executors except jenkins-12, which has not
scheduled builds for the past 13 days. Not scheduling:
https://builds.apache.org/computer/apache-beam-jenkins-12/builds
Recent passing builds:
https://builds.apache.org/computer/apache-beam-jenkins-1/builds
https://builds.apache.org/computer/apache-beam-jenkins-2/builds
https://builds.apache.org/computer/apache-beam-jenkins-3/builds
https://builds.apache.org/computer/apache-beam-jenkins-4/builds
https://builds.apache.org/computer/apache-beam-jenkins-5/builds
https://builds.apache.org/computer/apache-beam-jenkins-6/builds
https://builds.apache.org/computer/apache-beam-jenkins-7/builds
https://builds.apache.org/computer/apache-beam-jenkins-8/builds
https://builds.apache.org/computer/apache-beam-jenkins-9/builds
https://builds.apache.org/computer/apache-beam-jenkins-10/builds
https://builds.apache.org/computer/apache-beam-jenkins-11/builds
https://builds.apache.org/computer/apache-beam-jenkins-13/builds
https://builds.apache.org/computer/apache-beam-jenkins-14/builds
https://builds.apache.org/computer/apache-beam-jenkins-15/builds
https://builds.apache.org/computer/apache-beam-jenkins-16/builds

On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:

> +Alan Myrvold <am...@google.com> is doing a one time cleanup. I agree
> that we need to have a solution to automate this task or address the root
> cause of the buildup.
>
> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <mi...@polidea.com>
> wrote:
>
>> Hi there,
>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7 both
>> fail jobs with "No space left on device".
>> Who is the best person to contact in these cases (someone with access
>> permissions to the workers).
>>
>> I also noticed that such errors are becoming more and more frequent
>> recently and I'd like to discuss how can this be remedied. Can a cleanup
>> task be automated on Jenkins somehow?
>>
>> Regards
>> Michal
>>
>> --
>>
>> Michał Walenia
>> Polidea <https://www.polidea.com/> | Software Engineer
>>
>> M: +48 791 432 002 <+48791432002>
>> E: michal.walenia@polidea.com
>>
>> Unique Tech
>> Check out our projects! <https://www.polidea.com/our-work>
>>
>

Re: No space left on device - beam-jenkins 1 and 7

Posted by Ahmet Altay <al...@google.com>.
+Alan Myrvold <am...@google.com> is doing a one-time cleanup. I agree
that we need to have a solution to automate this task or address the root
cause of the buildup.

On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <mi...@polidea.com>
wrote:

> Hi there,
> it seems we have a problem with Jenkins workers again. Nodes 1 and 7 both
> fail jobs with "No space left on device".
> Who is the best person to contact in these cases (someone with access
> permissions to the workers).
>
> I also noticed that such errors are becoming more and more frequent
> recently and I'd like to discuss how can this be remedied. Can a cleanup
> task be automated on Jenkins somehow?
>
> Regards
> Michal
>
> --
>
> Michał Walenia
> Polidea <https://www.polidea.com/> | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.walenia@polidea.com
>
> Unique Tech
> Check out our projects! <https://www.polidea.com/our-work>
>