Posted to user@flink.apache.org by Konstantin Knauf <ko...@tngtech.com> on 2016/09/30 07:12:46 UTC

Blobstorage Locally and on HDFS

Hi,

we are running a Flink (1.1.2) standalone cluster with JobManager (JM)
HA, using HDFS as the checkpoint and recovery storage directory. What we
see is that blobStores are stored in HDFS as well as under the /tmp
directories of the local JobManagers and TaskManagers.

Is this the expected behaviour? Is there any documentation on which
blobs are stored locally and which are stored in HDFS in our case? In
particular, we need to know when it is safe to delete blobs stored
locally, because they are not cleaned up by Flink and eventually fill up
the /tmp partition.

Cheers,

Konstantin


-- 
Konstantin Knauf * konstantin.knauf@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082

Re: Blobstorage Locally and on HDFS

Posted by Maximilian Michels <mx...@apache.org>.

Hi Konstantin,

This looks fine. Generally, it is safe to delete blobs in /tmp once the
job is running or has finished. While the job is running, the Flink
classloader has already opened these files, so the file system keeps
them available through the open file descriptors and defers the actual
deletion until the descriptors are closed (at least on Unix-like
systems). Once the job has finished, the blobs are cleaned up after
some time.

In the latest master, we have changed this so that file descriptors are
released immediately. In Flink 1.1.x we still hold on to them until the
job history is cleared from the web interface.
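
To illustrate the deferred deletion described above, here is a minimal
Python sketch (not Flink code; the function name read_after_unlink and
the file prefix are made up for this example). It assumes a POSIX file
system: the file is unlinked while a descriptor is still open, and its
contents remain readable through that descriptor.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, unlink it while open, and read it back.

    Assumes POSIX semantics: unlink removes only the directory entry; the
    data stays on disk until the last open descriptor is closed.
    """
    fd, path = tempfile.mkstemp(prefix="blob_demo_")  # hypothetical prefix
    with os.fdopen(fd, "w+b") as f:
        f.write(payload)
        f.flush()
        os.unlink(path)  # the name is gone from the directory ...
        assert not os.path.exists(path)
        f.seek(0)
        return f.read()  # ... but the open descriptor still sees the data


if __name__ == "__main__":
    print(read_after_unlink(b"pretend this is a user jar"))
```

Note that the disk space is only released once the last descriptor is
closed, so a deleted blob can keep occupying /tmp for as long as a
process holds it open.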

-Max



Re: Blobstorage Locally and on HDFS

Posted by Konstantin Knauf <ko...@tngtech.com>.

Hi Ufuk,

any ideas? Any configuration that could be wrong?

Cheers,

Konstantin



Re: Blobstorage Locally and on HDFS

Posted by Konstantin Knauf <ko...@tngtech.com>.

Hi Ufuk,

thanks for your quick answer.

Setup: 2 servers, each running a JM as well as a TM

1) Removing all existing blobStores locally (/tmp) as well as on HDFS
2) Starting a Flink streaming job

Now there are the following BLOBs:

Local:

*Leader JM:

4.0K    /tmp/blobStore-563a8820-9617-4d89-97a7-fc3cc258dff4/incoming
64M     /tmp/blobStore-563a8820-9617-4d89-97a7-fc3cc258dff4
64M     /tmp/blobStore-563a8820-9617-4d89-97a7-fc3cc258dff4/cache
64M     /tmp/blobStore-c6b93d41-8916-4a8d-b595-6e35f0b10401
64M     /tmp/blobStore-c6b93d41-8916-4a8d-b595-6e35f0b10401/cache

*Standby JM:

64M     /tmp/blobStore-4cbfd3c0-2a70-4485-8fc0-045ca7f08cea
64M     /tmp/blobStore-4cbfd3c0-2a70-4485-8fc0-045ca7f08cea/cache

HDFS:

66595700 2016-09-30 13:03 <..>/flink/blob/cache/blob_da76e12b949a83404f97b6eb59416deaa31a907b


3) Canceling both jobs via the command line:

Now there are the following BLOBs:

**same as above**

When starting the same job again, no new blobs are created.
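
The before/after comparison in steps 2) and 3) can be scripted. A
minimal Python sketch, assuming the local stores live in directories
named blobStore-* directly under /tmp as in the listing above; the
function names are made up for this example, and the script only
reports sizes and ages, it deletes nothing:

```python
import os
import time
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all files below path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished while scanning
    return total


def list_blob_stores(tmp_dir: str = "/tmp"):
    """Return (path, size_bytes, age_seconds) for each blobStore-* directory."""
    now = time.time()
    result = []
    for entry in sorted(Path(tmp_dir).glob("blobStore-*")):
        if entry.is_dir():
            # Age is based on the directory's own mtime, which changes when
            # entries are added or removed, not when file contents change.
            age = now - entry.stat().st_mtime
            result.append((str(entry), dir_size(entry), age))
    return result


if __name__ == "__main__":
    for path, size, age in list_blob_stores():
        print(f"{size / 2**20:8.1f} MiB  {age / 3600:6.1f} h  {path}")
```

Running it on each server before and after canceling the job shows
whether any local store shrinks or disappears.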

Is it a problem to delete local blobStores of running jobs or will the
blobs just be downloaded again from HDFS if needed?

Cheers,

Konstantin



Is it correct, that ea



Re: Blobstorage Locally and on HDFS

Posted by Ufuk Celebi <uc...@apache.org>.

On Fri, Sep 30, 2016 at 9:12 AM, Konstantin Knauf
<ko...@tngtech.com> wrote:
> we are running a Flink (1.1.2) standalone cluster with JobManager (JM)
> HA, using HDFS as the checkpoint and recovery storage directory. What we
> see is that blobStores are stored in HDFS as well as under the /tmp
> directories of the local JobManagers and TaskManagers.
>
> Is this the expected behaviour? Is there any documentation on which
> blobs are stored locally and which are stored in HDFS in our case? In
> particular, we need to know when it is safe to delete blobs stored
> locally, because they are not cleaned up by Flink and eventually fill up
> the /tmp partition.

BLOBs are copied to another directory in case of HA in order to be
available for other job managers that might take over.

On regular termination (cancel, finish) all BLOBs should be cleaned
up. With hard failures, it can happen that BLOBs are not cleaned up.

Do you know in which cases you see BLOBs not being cleaned up? If it
is the first case (regular termination), that sounds like a bug to me.

– Ufuk