Posted to user@flink.apache.org by Jeroen Steggink | knowsy <je...@knowsy.nl> on 2018/04/18 13:58:02 UTC

Jars uploaded to jobmanager are deleted but not freed by OS

Sorry, I meant the jobmanager, not the taskmanager.


On 18-Apr-18 15:44, Jeroen Steggink | knowsy wrote:
> Hi,
>
> I'm having some trouble running the Flink taskmanager in a Docker 
> container (OpenShift). The container's internal storage is filling up 
> because the deleted jar files in blob storage are probably still in 
> use, so their disk space is not freed.
>
> We are using Apache Beam to start an Apache Flink process, so the jars 
> are sent to Apache Flink every time we fire a batch.
>
> I enabled debug logging, but I can't find anything showing 
> these deletes. Maybe someone has an idea why the resources are not 
> freed? I checked the blob store, and the files are indeed the jars.
>
> 208875129    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/142 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_90964be94a2f4471844a00284e44fb32/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ffa3f85003b1f124cd1cccdb0f72a8e0\ 
> (deleted)
>
> 208875130    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/143 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_b7c00268b488411a8f6e1af984bcdcc2/blob_p-5202910b36af8c12548df97a7e4a057b77786217-8bab07adb34d4ce8de20846ec72059ce\ 
> (deleted)
>
> 208875131    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/144 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_46183ac02f1dcd3543f8e481f59948b5/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ac6bc86d8932e7d631416d9bafab4ab4\ 
> (deleted)
>
> 208875132    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/145 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_717bf3f4b3f80700c1cc44d6076c2aca/blob_p-5202910b36af8c12548df97a7e4a057b77786217-780dd2383dee11a2361ac20a5da7bbb8\ 
> (deleted)
>
> 208875133    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/146 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_22e67caac65c9c4e537caa3b072b8cc3/blob_p-5202910b36af8c12548df97a7e4a057b77786217-e0b523663672c641b368e5d1440b0b70\ 
> (deleted)
>
> 208875134    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/147 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_3afe5b02ccb95b3494a1acd8677c66f0/blob_p-5202910b36af8c12548df97a7e4a057b77786217-9a8cd48c09a4b518adf0309a0255b339\ 
> (deleted)
>
> 208875135    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/148 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_cb024c561531905e81c9768ec62a2fe0/blob_p-5202910b36af8c12548df97a7e4a057b77786217-0addc83aaf9a2f781528ad035fd79cc8\ 
> (deleted)
>
> 208875136    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/149 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_d3dc0b0608d71ffa77575771f088e80e/blob_p-5202910b36af8c12548df97a7e4a057b77786217-c9015b012ec4b249f32872471a31a500\ 
> (deleted)
>
> 208875137    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/150 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_1b4cdb127bb2c345e1b099e3e446bf58/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ac4457b393b7ff0565c47c1e38786005\ 
> (deleted)
>
> 208875138    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/151 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_8c23503c614a88e8c8f7a54a31e41886/blob_p-5202910b36af8c12548df97a7e4a057b77786217-d096b3ef150bf7e8e98224e0b8f17292\ 
> (deleted)
>
> 208875139    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/152 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_e7c8132da483bd14e5abfe9390adeeb1/blob_p-5202910b36af8c12548df97a7e4a057b77786217-f370d8dcad0cb36581f9a5f1568e1487\ 
> (deleted)
>
> 208875140    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/153 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_cbee9f15b0c6adba0f5ddb67b587b607/blob_p-5202910b36af8c12548df97a7e4a057b77786217-9ae77c3419d77adab8f44258ca4290c5\ 
> (deleted)
>
> 208875141    0 lr-x------   1 1000150000 root           64 Apr 18 
> 12:58 /proc/1/fd/154 -> 
> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_29c5a145ae231be4c0d53717625c3938/blob_p-5202910b36af8c12548df97a7e4a057b77786217-76bb4d83f962a887d41effb2646bd63d\ 
> (deleted)
>
>
>
> There are several places in the code where the boolean returned by the 
> file delete is not checked, so we have no clue whether the file was 
> deleted successfully. Maybe it can be changed to something like 
> java.nio.file.Files.delete to get an IOException when something goes 
> wrong. This is not a solution, but it would make it more 
> transparent when things go wrong.
>
> Thanks,
> Jeroen
>
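
To illustrate the two points in the quoted message, here is a minimal sketch. It is not Flink code: the class name BlobDeleteDemo, the helper names and the temp file names are made up for the example, and it assumes a Linux/POSIX filesystem. The first case shows the difference between the boolean returned by java.io.File.delete and the exception thrown by java.nio.file.Files.delete. The second case shows that deleting a file that is still open succeeds without any error on Linux, while the data stays readable (and allocated) until the stream is closed. This is the state that shows up as "(deleted)" under /proc/<pid>/fd, and it also means that switching to Files.delete would not by itself report this situation.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlobDeleteDemo {

    /** Deletes via java.nio; returns null on success, else exception type and message. */
    static String deleteOrExplain(Path file) {
        try {
            Files.delete(file);
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    /**
     * Opens the file, deletes it while the stream is still open, and returns
     * how many bytes can still be read through that stream afterwards.
     * Assumes POSIX semantics; on Windows the delete would typically fail.
     */
    static int readAfterDelete(Path file) throws IOException {
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            Files.delete(file); // unlinks the name; the open descriptor keeps the data
            int total = 0;
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
            }
            return total;
        } // space is released only here, when the stream is closed
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blobStore-demo");

        // Case 1: the file does not exist. File.delete only returns a boolean,
        // Files.delete throws an exception that says what went wrong.
        Path missing = dir.resolve("blob_missing");
        System.out.println("File.delete():  " + missing.toFile().delete());
        System.out.println("Files.delete(): " + deleteOrExplain(missing));

        // Case 2: the file is deleted while a stream on it is still open.
        Path jar = Files.write(dir.resolve("blob_open"), new byte[4096]);
        System.out.println("bytes readable after delete: " + readAfterDelete(jar));
        System.out.println("path still exists: " + Files.exists(jar));
    }
}
```

If the jobmanager behaves like the second case, the thing to look for is whichever stream or class loader still holds the jar open, rather than the delete call itself.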