Posted to user@flink.apache.org by Mark Harris <ma...@hivehome.com> on 2020/01/21 14:38:11 UTC

GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

We're using Flink 1.7.2 on an EMR cluster (emr-5.22.0), which runs Hadoop "Amazon 2.8.5". We've recently noticed that some TaskManagers fail (causing all the jobs running on them to fail) with a "java.lang.OutOfMemoryError: GC overhead limit exceeded". The TaskManager (and the jobs that should be running on it) then stays down until it is manually restarted.

I managed to take and analyze a memory dump from one of the afflicted taskmanagers.

It showed that 85% of the heap was taken up by the java.io.DeleteOnExitHook.files hash set. Nearly all of the strings in that set (9,041,060 out of ~9,041,100) pointed to files beginning with /tmp/hadoop-yarn/s3a/s3ablock.

The problem seems to affect jobs that use the StreamingFileSink: every TaskManager crash has been on a TaskManager running at least one job with this sink, and a cluster running only a single TaskManager/job that uses the StreamingFileSink crashed with the same GC overhead limit exceeded error.

I've looked for more general advice on handling this error, without luck.

Any suggestions or advice gratefully received.

Best regards,

Mark Harris



The information contained in or attached to this email is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are not authorised to and must not disclose, copy, distribute, or retain this message or any part of it. It may contain information which is confidential and/or covered by legal professional or other privilege under applicable law.

The views expressed in this email are not necessarily the views of Centrica plc or its subsidiaries, and the company, its directors, officers or employees make no representation or accept any liability for its accuracy or completeness unless expressly stated to the contrary.

Additional regulatory disclosures may be found here: https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email

PH Jones is a trading name of British Gas Social Housing Limited. British Gas Social Housing Limited (company no: 01026007), British Gas Trading Limited (company no: 03078711), British Gas Services Limited (company no: 3141243), British Gas Insurance Limited (company no: 06608316), British Gas New Heating Limited (company no: 06723244), British Gas Services (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading) Limited (company no: 02877397) are all wholly owned subsidiaries of Centrica plc (company no: 3033654). Each company is registered in England and Wales with a registered office at Millstream, Maidenhead Road, Windsor, Berkshire SL4 5GD.

British Gas Insurance Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. British Gas Services Limited and Centrica Energy (Trading) Limited are authorised and regulated by the Financial Conduct Authority. British Gas Trading Limited is an appointed representative of British Gas Services Limited which is authorised and regulated by the Financial Conduct Authority.

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Piotr Nowojski <pi...@ververica.com>.
Hi,

Thanks for getting back with the semi solution!

Sorry for not responding earlier; I was trying to figure this out with some of my colleagues.

> I think the DeleteOnExit problem will mean it needs to be restarted every few weeks, but that's acceptable for now.

I hope that by the time this becomes annoying, the Hadoop issue will have been fixed for you somehow (perhaps by AWS moving to Hadoop 3.3+?).

Piotrek

> On 3 Feb 2020, at 15:54, Mark Harris <ma...@hivehome.com> wrote:
> 
> Hi all,
> 
> The out-of-memory heap dump had the answer: the job was failing with an OutOfMemoryError because the activeBuckets members of 3 instances of org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were filling so much of the TaskManager's memory that no progress was being made. Increasing the memory available to the TaskManager seems to have fixed the problem.
> 
> I think the DeleteOnExit problem will mean it needs to be restarted every few weeks, but that's acceptable for now.
> 
> Thanks again,
> 
> Mark
> From: Mark Harris <ma...@hivehome.com>
> Sent: 30 January 2020 14:36
> To: Piotr Nowojski <pi...@ververica.com>
> Cc: Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files
>  
> Hi,
> 
> Thanks for your help with this. 🙂
> 
> The EMR cluster has three 15 GB VMs, and the Flink cluster is started with:
> 
> /usr/lib/flink/bin/yarn-session.sh -d -n 3 -tm 5760 -jm 5760 -s 3
> 
> The task usually runs for about 15 minutes before it restarts, typically due to a "java.lang.OutOfMemoryError: Java heap space" exception. 
> 
> The figures came from a MemoryAnalyzer session on a manual memory dump from one of the TaskManagers. The total size of that heap was only 1.8 GB, of which 1.7 GB is taken up by the static field "files" in DeleteOnExitHook, a LinkedHashSet containing the 9 million strings. 
> 
> A full example of one of the paths is /tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp, at 120 bytes per char[], for a solid 1.2 GB of chars. Then there are 200 MB for their String wrappers and another 361 MB for LinkedHashMap$Entry objects. Despite valiantly holding on to an array of 16,777,216 HashMap$Node elements, the LinkedHashMap itself only contributes another 20 MB or so.
> 
> I goofed in not taking that 85% figure from MemoryAnalyzer: it tells me DeleteOnExitHook is responsible for 96.98% of the heap dump.
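The figures above can be cross-checked with a rough per-entry size model. This is a sketch under stated assumptions (64-bit HotSpot with compressed oops and compressed class pointers, Java 8 Strings backed by a UTF-16 char[], 8-byte object alignment); the class and method names are illustrative only and are not part of any library.

```java
// Rough model of the heap retained per path in java.io.DeleteOnExitHook's LinkedHashSet.
// Assumptions: 64-bit HotSpot with compressed oops and compressed class pointers,
// Java 8 String layout (char[] value + int hash), 8-byte object alignment.
public final class DeleteOnExitHeapEstimate {

    private static final long OBJECT_HEADER = 12; // mark word + compressed class pointer
    private static final long ARRAY_HEADER = 16;  // object header + int length
    private static final long REF = 4;            // compressed reference

    static long align8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    /** The String's backing char[]: 2 bytes per UTF-16 char. */
    static long charArrayBytes(int length) {
        return align8(ARRAY_HEADER + 2L * length);
    }

    /** java.lang.String in Java 8: header + char[] reference + int hash. */
    static long stringBytes() {
        return align8(OBJECT_HEADER + REF + 4);
    }

    /** LinkedHashMap.Entry: int hash + key, value, next, before, after references. */
    static long linkedEntryBytes() {
        return align8(OBJECT_HEADER + 4 + 5 * REF);
    }

    static long perEntryBytes(String path) {
        return charArrayBytes(path.length()) + stringBytes() + linkedEntryBytes();
    }

    public static void main(String[] args) {
        String path = "/tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp";
        long entries = 9_041_060L;
        long perEntry = perEntryBytes(path);
        System.out.printf("char[]=%d B, String=%d B, Entry=%d B, total=%d B per path%n",
                charArrayBytes(path.length()), stringBytes(), linkedEntryBytes(), perEntry);
        System.out.printf("~%.2f GB for %d paths (hash table array not included)%n",
                perEntry * entries / 1e9, entries);
    }
}
```

Under those assumptions the 52-character path costs 120 B of char[], plus 24 B per String and 40 B per LinkedHashMap$Entry: 184 B per registered path, or about 1.66 GB for 9,041,060 paths. That is in the same range as the ~1.7 GB MemoryAnalyzer attributed to DeleteOnExitHook, and 9,041,060 × 40 B reproduces the ~361 MB reported for LinkedHashMap$Entry objects.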
> 
> Looking at the files it managed to write before this started happening regularly, they appear to be written approximately every 3 minutes. I'll triple-check our config, but I'm reasonably sure the job is configured to checkpoint every 15 minutes. Could something else be causing it to write?
> 
> This may all be a red herring: something else that didn't make it into that heap dump may be taking up the TaskManager's memory. I plan to repeat the analysis shortly on a heap dump created with -XX:+HeapDumpOnOutOfMemoryError.
> 
> Best regards,
> 
> Mark
> 
> From: Piotr Nowojski <pi...@ververica.com>
> Sent: 30 January 2020 13:44
> To: Mark Harris <ma...@hivehome.com>
> Cc: Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files
>  
> Hi,
> 
> What is your job setup? Node sizes, Flink/JVM memory settings?
> 
> 9,041,060 strings is an awfully small number to bring down a whole cluster. With each tmp string taking ~30 bytes, that's only 271 MB. Is this really 85% of the heap? Also, with a parallelism of 6 and checkpoints every 15 minutes, 9,000,000 leaked strings should accumulate only after about one month, assuming 500-600 buckets in total (and a separate file per bucket).
> 
> Piotrek 
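Piotr's "about one month" estimate can be reproduced with a small calculation. It is a sketch that assumes, as his estimate appears to, that every bucket has a part file open in every subtask and that each such part file leaks one temp-file path per checkpoint; the class and method names are hypothetical.

```java
// Back-of-the-envelope model of DeleteOnExitHook growth under a bucketing sink.
// Assumption (from Piotr's estimate): each bucket has a part file open in every
// subtask, and each part file leaks one temp-file path per checkpoint.
public final class LeakTimeline {

    /** Leaked paths per checkpoint across all subtasks. */
    static long pathsPerCheckpoint(int parallelism, int buckets) {
        return (long) parallelism * buckets;
    }

    /** Days until {@code leakedPaths} paths have accumulated. */
    static double daysUntil(long leakedPaths, int parallelism, int buckets,
                            long checkpointIntervalMinutes) {
        double checkpoints = (double) leakedPaths / pathsPerCheckpoint(parallelism, buckets);
        return checkpoints * checkpointIntervalMinutes / (60.0 * 24.0);
    }

    public static void main(String[] args) {
        long target = 9_000_000L;
        for (int buckets : new int[] {280, 500, 600}) {
            System.out.printf("%d buckets: ~%.1f days at 15-minute checkpoints%n",
                    buckets, daysUntil(target, 6, buckets, 15));
        }
    }
}
```

With a parallelism of 6 and 15-minute checkpoints, 500 buckets reach 9,000,000 leaked paths after 31.25 days and 600 buckets after about 26 days, i.e. roughly one month; 280 buckets would take about 56 days. Because the result scales linearly with the checkpoint interval, checkpointing twice as often halves the time.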
> 
>> On 30 Jan 2020, at 14:21, Mark Harris <mark.harris@hivehome.com <ma...@hivehome.com>> wrote:
>> 
>> Trying a few different approaches to the fs.s3a.fast.upload settings has brought me no joy: the TaskManagers end up simply crashing or complaining of high GC load. Heap dumps suggest that this time they're clogged with buffers instead, which makes sense.
>> 
>> Our job has a parallelism of 6 and checkpoints every 15 minutes; if anything, we'd like to checkpoint more frequently. I suspect the partition structure we are bucketing into could also play a role: at any given moment we could be receiving data for up to 280 buckets at once.
>> Could this be a factor?
>> 
>> Best regards,
>> 
>> Mark
>> From: Piotr Nowojski <piotr@ververica.com>
>> Sent: 27 January 2020 16:16
>> To: Cliff Resnick <cresny@gmail.com>
>> Cc: David Magalhães <speeddragon@gmail.com>; Mark Harris <mark.harris@hivehome.com>; Till Rohrmann <trohrmann@apache.org>; flink-user@apache.org; kkloudas <kkloudas@apache.org>
>> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files
>>  
>> Hi,
>> 
>> I think reducing the checkpoint frequency and decreasing the parallelism of the operators that use the S3AOutputStream class would help mitigate the issue. 
>> 
>> I don't know of any other solutions. I would suggest asking Steve L. directly in the bug ticket [1], as he is the one who fixed the issue. If there is no workaround, maybe it would be possible to pressure the Hadoop developers into backporting the fix to older versions?
>> 
>> Piotrek
>> 
>> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>> 
>>> On 27 Jan 2020, at 15:41, Cliff Resnick <cresny@gmail.com> wrote:
>>> 
>>> I know from experience that Flink's shaded S3A FileSystem does not read core-site.xml, though I don't remember offhand which file(s) it does read. However, since it's shaded, maybe this could be fixed by building the Flink FS against Hadoop 3.3.0? Last I checked, I think it was built against 3.1.0.
>>> 
>>> On Mon, Jan 27, 2020, 8:48 AM David Magalhães <speeddragon@gmail.com> wrote:
>>> Does the StreamingFileSink use core-site.xml? When I was using it, it didn't load any configuration from core-site.xml.
>>> 
>>> On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <mark.harris@hivehome.com> wrote:
>>> Hi Piotr,
>>> 
>>> Thanks for the link to the issue.
>>> 
>>> Do you know if there's a workaround? I've tried setting the following in my core-site.xml:
>>> 
>>> fs.s3a.fast.upload.buffer=true
>>> 
>>> This was to try to avoid writing the buffer files, but the TaskManager breaks with the same problem.
>>> 
>>> Best regards,
>>> 
>>> Mark
>>> From: Piotr Nowojski <piotr@data-artisans.com> on behalf of Piotr Nowojski <piotr@ververica.com>
>>> Sent: 22 January 2020 13:29
>>> To: Till Rohrmann <trohrmann@apache.org>
>>> Cc: Mark Harris <mark.harris@hivehome.com>; flink-user@apache.org; kkloudas <kkloudas@apache.org>
>>> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files
>>>  
>>> Hi,
>>> 
>>> This is probably a known Hadoop issue [1]. Unfortunately, it was only fixed in 3.3.0.
>>> 
>>> Piotrek
>>> 
>>> [1] https://issues.apache.org/jira/browse/HADOOP-15658
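The leak mechanism can be illustrated with a toy model. The JDK's File.deleteOnExit() records each path in java.io.DeleteOnExitHook, and a deletion request cannot be cancelled, so the path string stays in memory until the JVM exits even if the file was deleted long before. The sketch assumes, consistent with the /tmp/hadoop-yarn/s3a/s3ablock paths in the heap dump, that every S3A buffer file is registered this way; the registry and the createBufferFile helper are hypothetical stand-ins, since the real hook is private to the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Stream;

// Toy model of the leak: like java.io.DeleteOnExitHook, the registry keeps every
// path registered for deletion at exit, even after the file itself is gone.
public final class DeleteOnExitLeakModel {

    // Stand-in for DeleteOnExitHook's private static LinkedHashSet<String>;
    // entries are only added, never removed before JVM shutdown.
    private final Set<String> registeredForExit = new LinkedHashSet<>();

    // Hypothetical stand-in for the Hadoop code path that creates an S3A buffer
    // file and registers it; the real code would call file.toFile().deleteOnExit().
    Path createBufferFile(Path dir, int blockIndex) throws IOException {
        Path file = Files.createTempFile(dir, String.format("s3ablock-%04d-", blockIndex), ".tmp");
        registeredForExit.add(file.toString());
        return file;
    }

    /** Creates and deletes {@code blocks} buffer files; returns {files on disk, paths registered}. */
    static long[] simulate(int blocks) {
        DeleteOnExitLeakModel model = new DeleteOnExitLeakModel();
        try {
            Path dir = Files.createTempDirectory("s3a-model");
            for (int i = 0; i < blocks; i++) {
                Path block = model.createBufferFile(dir, i);
                Files.delete(block); // block uploaded, file cleaned up as usual
            }
            long filesOnDisk;
            try (Stream<Path> listing = Files.list(dir)) {
                filesOnDisk = listing.count();
            }
            Files.delete(dir);
            return new long[] {filesOnDisk, model.registeredForExit.size()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] result = simulate(1000);
        System.out.println("files on disk: " + result[0] + ", paths still registered: " + result[1]);
    }
}
```

Running main prints "files on disk: 0, paths still registered: 1000": every buffer file is gone from disk, yet every path is still held in memory, which is why the set only shrinks when the TaskManager JVM is restarted.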
>>> 
>>>> On 22 Jan 2020, at 13:56, Till Rohrmann <trohrmann@apache.org> wrote:
>>>> 
>>>> Thanks for reporting this issue, Mark. I'm pulling Klou, who knows more about the StreamingFileSink, into this conversation. @Klou, does the StreamingFileSink rely on DeleteOnExitHooks to clean up files?
>>>> 
>>>> Cheers,
>>>> Till
>>>> 


Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Kostas Kloudas <kk...@apache.org>.
Hi Mark,

This feature of customizing the rolling policy even for bulk formats will
be in the upcoming 1.10 release, as described in [1], although the
documentation for the feature is still pending [2]. I hope that it will be
merged in time for the release.

Cheers,
Kostas

[1] https://issues.apache.org/jira/browse/FLINK-13027
[2] https://issues.apache.org/jira/browse/FLINK-15476

On Mon, Feb 3, 2020 at 8:14 PM Kostas Kloudas <kk...@apache.org> wrote:

> Hi Mark,
>
> Currently no. But if rolling on every checkpoint is OK with you, future
> versions could easily allow rolling on every checkpoint and also on
> inactivity intervals.
>
> Cheers,
> Kostas
>
> On Mon, Feb 3, 2020 at 5:24 PM Mark Harris <ma...@hivehome.com>
> wrote:
>
>> Hi Kostas,
>>
>> Thanks for your help here. I think we're OK with the increased heap
>> size, but we're happy to explore alternatives.
>>
>> I see the problem: we're currently using a BulkFormat, which doesn't
>> seem to let us override the rolling policy. Is there an equivalent for
>> the BulkFormat?
>>
>> Best regards,
>>
>> Mark
>> ------------------------------
>> *From:* Kostas Kloudas <kk...@apache.org>
>> *Sent:* 03 February 2020 15:39
>> *To:* Mark Harris <ma...@hivehome.com>
>> *Cc:* Piotr Nowojski <pi...@ververica.com>; Cliff Resnick <cresny@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org
>> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files
>>
>> Hi Mark,
>>
>> You can use something like the following and change the intervals
>> accordingly:
>>
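>> import java.util.concurrent.TimeUnit;
>> import org.apache.flink.api.common.serialization.SimpleStringEncoder;
>> import org.apache.flink.core.fs.Path;
>> import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
>> import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;
>>
>> final StreamingFileSink<String> sink = StreamingFileSink
>>         .forRowFormat(new Path(outputPath), new SimpleStringEncoder<>("UTF-8"))
>>         .withRollingPolicy(
>>                 // builder() is the entry point in Flink 1.10+; releases such as 1.7 expose
>>                 // DefaultRollingPolicy.create() instead (assumption: check your version's Javadoc).
>>                 DefaultRollingPolicy.builder()
>>                         .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))  // roll at least every 15 minutes
>>                         .withInactivityInterval(TimeUnit.MINUTES.toMillis(5)) // roll after 5 minutes without records
>>                         .withMaxPartSize(1024 * 1024 * 1024)                  // roll once a part file exceeds 1 GiB
>>                         .build())
>>         .build();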
>>
>> Let me know if this solves the problem.
>>
>> Cheers,
>> Kostas
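The three settings in the snippet can be read as simple per-part-file checks. The class below is a simplified model written for illustration, assuming the semantics of DefaultRollingPolicy as documented (roll when a part file is older than the rollover interval, has received no records for the inactivity interval, or has grown past the maximum part size); it is not Flink's implementation, and its names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Simplified model (not Flink's class) of the checks behind the three DefaultRollingPolicy
// settings: roll on size after a write, and on age or inactivity on processing time.
public final class RollingDecision {

    private final long rolloverIntervalMs;
    private final long inactivityIntervalMs;
    private final long maxPartSizeBytes;

    RollingDecision(long rolloverIntervalMs, long inactivityIntervalMs, long maxPartSizeBytes) {
        this.rolloverIntervalMs = rolloverIntervalMs;
        this.inactivityIntervalMs = inactivityIntervalMs;
        this.maxPartSizeBytes = maxPartSizeBytes;
    }

    /** Checked after a record is written to the in-progress part file. */
    boolean shouldRollOnEvent(long partSizeBytes) {
        return partSizeBytes > maxPartSizeBytes;
    }

    /** Checked periodically: part file too old, or idle for too long. */
    boolean shouldRollOnProcessingTime(long creationTimeMs, long lastUpdateTimeMs, long nowMs) {
        return nowMs - creationTimeMs >= rolloverIntervalMs
                || nowMs - lastUpdateTimeMs >= inactivityIntervalMs;
    }

    public static void main(String[] args) {
        long min = TimeUnit.MINUTES.toMillis(1);
        RollingDecision policy = new RollingDecision(15 * min, 5 * min, 1024L * 1024 * 1024);
        // Created at 0, last written at minute 1, checked at minute 7: idle for 6 minutes.
        System.out.println("idle bucket: " + policy.shouldRollOnProcessingTime(0, 1 * min, 7 * min));
        // Written every minute: stays open at minute 14, rolls at minute 15.
        System.out.println("busy bucket @14: " + policy.shouldRollOnProcessingTime(0, 13 * min, 14 * min));
        System.out.println("busy bucket @15: " + policy.shouldRollOnProcessingTime(0, 14 * min, 15 * min));
    }
}
```

With the 15/5-minute settings, a part file last written at minute 1 is rolled at minute 7 by the inactivity check, while one written every minute stays open until it is 15 minutes old.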
>>
>> On Mon, Feb 3, 2020 at 4:11 PM Mark Harris <ma...@hivehome.com>
>> wrote:
>>
>> Hi Kostas,
>>
>> Sorry, stupid question: How do I set that for a StreamingFileSink?
>>
>> Best regards,
>>
>> Mark
>> ------------------------------
>> *From:* Kostas Kloudas <kk...@apache.org>
>> *Sent:* 03 February 2020 14:58
>> *To:* Mark Harris <ma...@hivehome.com>
>> *Cc:* Piotr Nowojski <pi...@ververica.com>; Cliff Resnick <cresny@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org
>> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files
>>
>> Hi Mark,
>>
>> Have you tried setting your rolling policy to close inactive part files
>> after some time [1]?
>> If the part files in a bucket are inactive and no new part files are
>> created, the state for that bucket will also be removed.
>>
>> Cheers,
>> Kostas
>>
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html
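Kostas's point about bucket state can be illustrated with a toy model of an active-buckets map. It is an assumption-laden sketch, not Flink's Buckets/Bucket code: a bucket stays in the map while it has an open part file or files waiting to be committed, idle part files are closed after an inactivity interval, and a completed checkpoint commits pending files and drops buckets with nothing left. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Toy model (not Flink's Buckets/Bucket classes) of when per-bucket state can be dropped:
// a bucket stays active while it has an open part file or pending files to commit.
public final class ActiveBucketsModel {

    static final class BucketState {
        long lastWriteMs;
        boolean partFileOpen;
        int pendingFiles;
    }

    final Map<String, BucketState> activeBuckets = new HashMap<>();
    private final long inactivityIntervalMs;

    ActiveBucketsModel(long inactivityIntervalMs) {
        this.inactivityIntervalMs = inactivityIntervalMs;
    }

    /** A record arrives for a bucket: open (or keep) its part file. */
    void write(String bucketId, long nowMs) {
        BucketState state = activeBuckets.computeIfAbsent(bucketId, id -> new BucketState());
        state.partFileOpen = true;
        state.lastWriteMs = nowMs;
    }

    /** Processing-time check: close part files idle for at least the inactivity interval. */
    void onProcessingTime(long nowMs) {
        for (BucketState state : activeBuckets.values()) {
            if (state.partFileOpen && nowMs - state.lastWriteMs >= inactivityIntervalMs) {
                state.partFileOpen = false;
                state.pendingFiles++;
            }
        }
    }

    /** Checkpoint completed: commit pending files, then drop buckets with nothing left. */
    void onCheckpointComplete() {
        Iterator<BucketState> it = activeBuckets.values().iterator();
        while (it.hasNext()) {
            BucketState state = it.next();
            state.pendingFiles = 0;
            if (!state.partFileOpen) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        ActiveBucketsModel sink = new ActiveBucketsModel(TimeUnit.MINUTES.toMillis(5));
        for (int i = 0; i < 280; i++) {
            sink.write("bucket-" + i, 0L);
        }
        sink.onCheckpointComplete();
        System.out.println("after checkpoint: " + sink.activeBuckets.size());
        sink.onProcessingTime(TimeUnit.MINUTES.toMillis(6));
        sink.onCheckpointComplete();
        System.out.println("after idle roll + checkpoint: " + sink.activeBuckets.size());
    }
}
```

In main, 280 buckets written once are still in the map after the first checkpoint because their part files are open; once the inactivity check has closed them, the next checkpoint empties the map. This mirrors the behaviour described above: when part files are closed and no new ones are created, the bucket's state can be released.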
>>
>>
>>
>> On Mon, Feb 3, 2020 at 3:54 PM Mark Harris <ma...@hivehome.com>
>> wrote:
>>
>> Hi all,
>>
>> The out-of-memory heap dump had the answer - the job was failing with an
>> OutOfMemoryError because the activeBuckets members of 3 instances of
>> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were
>> filling a significant enough part of the memory of the taskmanager that no
>> progress was being made. Increasing the memory available to the TM seems to
>> have fixed the problem.
>>
>> I think the DeleteOnExit problem will mean it needs to be restarted every
>> few weeks, but that's acceptable for now.
>>
>> Thanks again,
>>
>> Mark
>> ------------------------------
>> *From:* Mark Harris <ma...@hivehome.com>
>> *Sent:* 30 January 2020 14:36
>> *To:* Piotr Nowojski <pi...@ververica.com>
>> *Cc:* Cliff Resnick <cr...@gmail.com>; David Magalhães <
>> speeddragon@gmail.com>; Till Rohrmann <tr...@apache.org>;
>> flink-user@apache.org <fl...@apache.org>; kkloudas <
>> kkloudas@apache.org>
>> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
>> hooks for S3a files
>>
>> Hi,
>>
>> Thanks for your help with this. 🙂
>>
>> The EMR cluster has 3 15GB VMs, and the flink cluster is started with:
>>
>> /usr/lib/flink/bin/yarn-session.sh -d -n 3 -tm 5760 -jm 5760 -s 3
>>
>> Usually the task runs for about 15 minutes before it restarts, usually
>> due to with an "java.lang.OutOfMemoryError: Java heap space" exception.
>>
>> The figures came from a MemoryAnalyzer session on a manual memory dump
>> from one of the taskmanagers. The total size of that heap was only 1.8gb.
>> In that heap, 1.7gb is taken up by the static field "files" in
>> DeleteOnExitHook, which is a linked hash set containing the 9 million
>> strings.
>>
>> A full example of one the path is
>> /tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp, at for 120 bytes per
>> char[] for a solid 1.2gb of chars. Then 200mb for their String wrappers and
>> another 361MB for LinkedHashMap$Entry objects. Despite valiantly holding
>> on to an array of 16777216 HashMap$Node elements, the LinkedHashMap can
>> only contribute another 20MB or so.
>> I goofed in not taking that 85% figure from MemoryAnalyzer - it tells
>> me DeleteOnExitHook is responsible for 96.98% of the heap dump.
>>
>> Looking at the files it managed to write before this started to happen
>> regularly, it looks like they're being written approximately every 3
>> minutes. I'll triple check our config, but I'm reasonably sure the job is
>> configured to checkpoint every 15 minutes - could something else be causing
>> it to write?
>>
>> This may all be a red herring - something else may be taking up the
>> taskmanagers memory which didn't make it into that heap dump. I plan to
>> repeat the analysis on a heapdump created
>> by  -XX:+HeapDumpOnOutOfMemoryError shortly.
>>
>> Best regards,
>>
>> Mark
>>
>> ------------------------------
>> *From:* Piotr Nowojski <pi...@ververica.com>
>> *Sent:* 30 January 2020 13:44
>> *To:* Mark Harris <ma...@hivehome.com>
>> *Cc:* Cliff Resnick <cr...@gmail.com>; David Magalhães <
>> speeddragon@gmail.com>; Till Rohrmann <tr...@apache.org>;
>> flink-user@apache.org <fl...@apache.org>; kkloudas <
>> kkloudas@apache.org>
>> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
>> hooks for S3a files
>>
>> Hi,
>>
>> What is your job setup? Size of the nodes, memory settings of the
>> Flink/JVM?
>>
>> 9 041 060 strings is awfully small number to bring down a whole cluster.
>> With each tmp string having ~30 bytes, that’s only 271MB. Is this really
>> 85% of the heap? And also, with parallelism of 6 and checkpoints every 15
>> minutes, 9 000 000 of leaked strings should happen only after one month
>>  assuming 500-600 total number of buckets. (Also assuming that there is a
>> separate file per each bucket).
>>
>> Piotrek
>>
>> On 30 Jan 2020, at 14:21, Mark Harris <ma...@hivehome.com> wrote:
>>
>> Trying a few different approaches to the fs.s3a.fast.upload settings has
>> bought me no joy - the taskmanagers end up simply crashing or complaining
>> of high GC load. Heap dumps suggest that this time they're clogged with
>> buffers instead, which makes sense.
>>
>> Our job has parallelism of 6 and checkpoints every 15 minutes - if
>> anything, we'd like to increase the frequency of that checkpoint duration.
>> I suspect this could be affected by the partition structure we were
>> bucketing to as well, and at any given moment we could be receiving data
>> for up to 280 buckets at once.
>> Could this be a factor?
>>
>> Best regards,
>>
>> Mark
>> ------------------------------
>> *From:* Piotr Nowojski <pi...@ververica.com>
>> *Sent:* 27 January 2020 16:16
>> *To:* Cliff Resnick <cr...@gmail.com>
>> *Cc:* David Magalhães <sp...@gmail.com>; Mark Harris <
>> mark.harris@hivehome.com>; Till Rohrmann <tr...@apache.org>;
>> flink-user@apache.org <fl...@apache.org>; kkloudas <
>> kkloudas@apache.org>
>> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
>> hooks for S3a files
>>
>> Hi,
>>
>> I think reducing the checkpoint frequency and decreasing the parallelism
>> of the operators that use the S3AOutputStream class would help to
>> mitigate the issue.
>>
>> I don’t know of other solutions. I would suggest asking this question
>> directly to Steve L. in the bug ticket [1], as he is the one who fixed the
>> issue. If there is no workaround, maybe it would be possible to put
>> pressure on the Hadoop developers to backport the fix to older versions?
>>
>> Piotrek
>>
>> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>>
>> On 27 Jan 2020, at 15:41, Cliff Resnick <cr...@gmail.com> wrote:
>>
>> I know from experience that Flink's shaded S3A FileSystem does not
>> reference core-site.xml, though I don't remember offhand which file(s) it
>> does reference. However, since it's shaded, maybe this could be fixed by
>> building a Flink FS against Hadoop 3.3.0? Last I checked, I think it
>> referenced 3.1.0.
>>
>> On Mon, Jan 27, 2020, 8:48 AM David Magalhães <sp...@gmail.com>
>> wrote:
>>
>> Does StreamingFileSink use core-site.xml? When I was using it, it didn't
>> load any configurations from core-site.xml.
>>
>> On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <ma...@hivehome.com>
>> wrote:
>>
>> Hi Piotr,
>>
>> Thanks for the link to the issue.
>>
>> Do you know if there's a workaround? I've tried setting the following in
>> my core-site.xml:
>>
>> fs.s3a.fast.upload.buffer=true
>>
>> To try and avoid writing the buffer files, but the taskmanager breaks
>> with the same problem.
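This note on the setting above comes from the Hadoop S3A documentation, not from this thread. `fs.s3a.fast.upload.buffer` selects how upload blocks are buffered. It takes `disk` (the default, which writes the `s3ablock-*.tmp` files), `array`, or `bytebuffer`, so `true` is not a recognised value.

David and Cliff suggest that Flink's shaded S3 filesystem ignores core-site.xml. If so, the key would have to go in flink-conf.yaml. Flink's S3 filesystem is expected to forward `s3.`-prefixed keys there to `fs.s3a.`, but check the documentation for your Flink version.

The class below is a hypothetical, self-contained sketch of that forwarding and of the value check. It is not Flink's or Hadoop's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: forward "s3."-prefixed Flink keys into the Hadoop "fs.s3a."
 * namespace and validate the S3A upload-buffer setting. Not Flink's actual loader.
 */
public final class S3aBufferConfig {

    // Buffer types documented for fs.s3a.fast.upload.buffer ("disk" is the default).
    static final Set<String> VALID_BUFFERS = Set.of("disk", "array", "bytebuffer");

    /** Rewrites "s3.xyz" to "fs.s3a.xyz"; any other key is copied unchanged. */
    static Map<String, String> toHadoopKeys(Map<String, String> flinkConf) {
        Map<String, String> hadoopConf = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flinkConf.entrySet()) {
            String key = e.getKey();
            if (key.startsWith("s3.")) {
                key = "fs.s3a." + key.substring("s3.".length());
            }
            hadoopConf.put(key, e.getValue());
        }
        return hadoopConf;
    }

    /** True only for a buffer type the S3A block output stream recognises. */
    static boolean isValidBuffer(String value) {
        return value != null && VALID_BUFFERS.contains(value.trim());
    }

    public static void main(String[] args) {
        // Equivalent of the flink-conf.yaml line "s3.fast.upload.buffer: array"
        Map<String, String> hadoopConf = toHadoopKeys(Map.of("s3.fast.upload.buffer", "array"));
        System.out.println(hadoopConf);             // {fs.s3a.fast.upload.buffer=array}
        System.out.println(isValidBuffer("true"));  // false
        System.out.println(isValidBuffer("array")); // true
    }
}
```

`array` and `bytebuffer` hold upload blocks in heap or direct memory instead of on disk. They trade the temp files for in-memory buffers.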
>>
>> Best regards,
>>
>> Mark
>> ------------------------------
>> *From:* Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr
>> Nowojski <pi...@ververica.com>
>> *Sent:* 22 January 2020 13:29
>> *To:* Till Rohrmann <tr...@apache.org>
>> *Cc:* Mark Harris <ma...@hivehome.com>; flink-user@apache.org <
>> flink-user@apache.org>; kkloudas <kk...@apache.org>
>> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
>> hooks for S3a files
>>
>> Hi,
>>
>> This is probably a known issue of Hadoop [1]. Unfortunately it was only
>> fixed in 3.3.0.
>>
>> Piotrek
>>
>> [1] https://issues.apache.org/jira/browse/HADOOP-15658
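The mechanism behind this kind of leak works as follows. `File.deleteOnExit()` records the path in a JVM-wide set that is only processed at shutdown. Block files that are deleted promptly after upload therefore still keep their path strings on the heap.

The sketch below simulates that pattern. A plain static `LinkedHashSet` stands in for the JDK's private `java.io.DeleteOnExitHook.files`. The method names are hypothetical, and this is not Hadoop's S3A code.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Simulation of the leak pattern. REGISTERED stands in for the JDK's private
 * java.io.DeleteOnExitHook.files set: paths are added and never removed.
 */
public final class DeleteOnExitLeakDemo {

    static final Set<String> REGISTERED = new LinkedHashSet<>();

    /** Leaky pattern: register the file for delete-on-exit, then delete it after "upload". */
    static void leakyBlock(File dir, int index) {
        try {
            File tmp = File.createTempFile(String.format("s3ablock-%04d-", index), ".tmp", dir);
            REGISTERED.add(tmp.getPath()); // what File.deleteOnExit() records internally
            Files.delete(tmp.toPath());    // the file is gone, but its path stays registered
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Non-leaky pattern: no delete-on-exit registration, explicit cleanup only. */
    static void explicitBlock(File dir, int index) {
        try {
            File tmp = File.createTempFile(String.format("s3ablock-%04d-", index), ".tmp", dir);
            try {
                // ... buffer data into tmp and upload it ...
            } finally {
                Files.deleteIfExists(tmp.toPath());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs n blocks of each kind; returns {registered paths, files left on disk}. */
    static long[] run(int n) {
        REGISTERED.clear();
        try {
            File dir = Files.createTempDirectory("s3a-demo").toFile();
            for (int i = 0; i < n; i++) {
                leakyBlock(dir, i);
                explicitBlock(dir, i);
            }
            long left = dir.list().length;
            Files.delete(dir.toPath());
            return new long[] {REGISTERED.size(), left};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = run(1000);
        System.out.println("registered paths: " + r[0] + ", files on disk: " + r[1]);
    }
}
```

After 1000 blocks of each kind, no files remain on disk, yet 1000 paths stay registered. That is the same shape as the `s3ablock` strings found in the heap dump.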
>>
>> On 22 Jan 2020, at 13:56, Till Rohrmann <tr...@apache.org> wrote:
>>
>> Thanks for reporting this issue, Mark. I'm pulling Klou, who knows more
>> about the StreamingFileSink, into this conversation. @Klou, does the
>> StreamingFileSink rely on DeleteOnExitHooks to clean up files?
>>
>> Cheers,
>> Till
>>
>> On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com>
>> wrote:
>>
>> Hi,
>>
>> We're using Flink 1.7.2 on an EMR cluster (emr-5.22.0), which runs Hadoop
>> "Amazon 2.8.5". We've recently noticed that some TaskManagers fail
>> (causing all the jobs running on them to fail) with a
>> "java.lang.OutOfMemoryError: GC overhead limit exceeded". The taskmanager
>> (and the jobs that should be running on it) remains down until manually
>> restarted.
>>
>> I managed to take and analyze a memory dump from one of the afflicted
>> taskmanagers.
>>
>> It showed that 85% of the heap was made up of
>> the java.io.DeleteOnExitHook.files hashset. The majority of the strings in
>> that hashset (9041060 out of ~9041100) pointed to files that began
>> /tmp/hadoop-yarn/s3a/s3ablock
>>
>> The problem seems to affect jobs that use the StreamingFileSink: all of
>> the taskmanager crashes have been on a taskmanager running at least one
>> job using this sink, and a cluster running only a single taskmanager / job
>> that uses the StreamingFileSink crashed with the GC overhead limit
>> exceeded error.
>>
>> I've had a look for advice on handling this error more broadly without
>> luck.
>>
>> Any suggestions or advice gratefully received.
>>
>> Best regards,
>>
>> Mark Harris
>>
>>
>>
>> The information contained in or attached to this email is intended only
>> for the use of the individual or entity to which it is addressed. If you
>> are not the intended recipient, or a person responsible for delivering it
>> to the intended recipient, you are not authorised to and must not disclose,
>> copy, distribute, or retain this message or any part of it. It may contain
>> information which is confidential and/or covered by legal professional or
>> other privilege under applicable law.
>>
>> The views expressed in this email are not necessarily the views of
>> Centrica plc or its subsidiaries, and the company, its directors, officers
>> or employees make no representation or accept any liability for its
>> accuracy or completeness unless expressly stated to the contrary.
>>
>> Additional regulatory disclosures may be found here:
>> https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email
>>
>> PH Jones is a trading name of British Gas Social Housing Limited. British
>> Gas Social Housing Limited (company no: 01026007), British Gas Trading
>> Limited (company no: 03078711), British Gas Services Limited (company no:
>> 3141243), British Gas Insurance Limited (company no: 06608316), British Gas
>> New Heating Limited (company no: 06723244), British Gas Services
>> (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading)
>> Limited (company no: 02877397) are all wholly owned subsidiaries of
>> Centrica plc (company no: 3033654). Each company is registered in England
>> and Wales with a registered office at Millstream, Maidenhead Road, Windsor,
>> Berkshire SL4 5GD.
>>
>> British Gas Insurance Limited is authorised by the Prudential Regulation
>> Authority and regulated by the Financial Conduct Authority and the
>> Prudential Regulation Authority. British Gas Services Limited and Centrica
>> Energy (Trading) Limited are authorised and regulated by the Financial
>> Conduct Authority. British Gas Trading Limited is an appointed
>> representative of British Gas Services Limited which is authorised and
>> regulated by the Financial Conduct Authority.
>>
>

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Kostas Kloudas <kk...@apache.org>.
Hi Mark,

Currently no, but if rolling on every checkpoint is ok with you, in future
versions it is easy to allow to roll on every checkpoint, but also on
inactivity intervals.

Cheers,
Kostas

On Mon, Feb 3, 2020 at 5:24 PM Mark Harris <ma...@hivehome.com> wrote:

> Hi Kostas,
>
> Thanks for your help here - I think we're OK with the increased heap size,
> but happy to explore other alternatives.
>
> I see the problem - we're currently using a BulkFormat, which doesn't seem
> to let us override the rolling policy. Is there an equivalent for the
> BulkFormat?
>
> Best regards,
>
> Mark
> ------------------------------
> *From:* Kostas Kloudas <kk...@apache.org>
> *Sent:* 03 February 2020 15:39
> *To:* Mark Harris <ma...@hivehome.com>
> *Cc:* Piotr Nowojski <pi...@ververica.com>; Cliff Resnick <
> cresny@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann
> <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi Mark,
>
> You can use something like the following and change the intervals
> accordingly:
>
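> import java.util.concurrent.TimeUnit;
> import org.apache.flink.api.common.serialization.SimpleStringEncoder;
> import org.apache.flink.core.fs.Path;
> import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
> import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;
>
> // outputPath is the target directory (e.g. an s3:// URI), defined elsewhere.
> // builder() is the Flink 1.10+ entry point; older releases such as 1.7.x
> // use DefaultRollingPolicy.create() instead.
> final StreamingFileSink<String> sink = StreamingFileSink
>         .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
>         .withRollingPolicy(
>                 DefaultRollingPolicy.builder()
>                         // roll a part file at least every 15 minutes
>                         .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
>                         // close a part file that has received no data for 5 minutes
>                         .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
>                         // roll once a part file reaches 1 GiB
>                         .withMaxPartSize(1024L * 1024 * 1024)
>                         .build())
>         .build();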
>
> Let me know if this solves the problem.
>
> Cheers,
> Kostas
>
> On Mon, Feb 3, 2020 at 4:11 PM Mark Harris <ma...@hivehome.com>
> wrote:
>
> Hi Kostas,
>
> Sorry, stupid question: How do I set that for a StreamingFileSink?
>
> Best regards,
>
> Mark
> ------------------------------
> *From:* Kostas Kloudas <kk...@apache.org>
> *Sent:* 03 February 2020 14:58
> *To:* Mark Harris <ma...@hivehome.com>
> *Cc:* Piotr Nowojski <pi...@ververica.com>; Cliff Resnick <
> cresny@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann
> <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi Mark,
>
> Have you tried setting your rolling policy to close inactive part files
> after some time [1]?
> If the part files in the buckets are inactive and there are no new part
> files, then the state handle for those buckets will also be removed.
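Kostas's point can be illustrated with a deliberately simplified model. `InactiveBucketPruning` is a hypothetical plain-Java class, not Flink's internal `Buckets` implementation. It closes and forgets any bucket whose part file has had no writes for at least the inactivity interval. In the real sink, per the explanation above, the bucket's state is only released once its part files are inactive and no new ones exist.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Deliberately simplified model of inactivity-based rolling (hypothetical class,
 * not Flink's Buckets): a bucket whose part file has seen no writes for at least
 * the inactivity interval is closed and dropped from the active set.
 */
public final class InactiveBucketPruning {

    private final long inactivityIntervalMs;
    // bucket id -> time of the last record written to its in-progress part file
    private final Map<String, Long> activeBuckets = new HashMap<>();

    InactiveBucketPruning(long inactivityIntervalMs) {
        this.inactivityIntervalMs = inactivityIntervalMs;
    }

    void write(String bucketId, long nowMs) {
        activeBuckets.put(bucketId, nowMs);
    }

    /** Closes idle buckets and forgets them; returns how many were closed. */
    int onProcessingTime(long nowMs) {
        int closed = 0;
        Iterator<Map.Entry<String, Long>> it = activeBuckets.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() >= inactivityIntervalMs) {
                it.remove(); // the real sink must also commit the closed part file first
                closed++;
            }
        }
        return closed;
    }

    int activeCount() {
        return activeBuckets.size();
    }

    public static void main(String[] args) {
        InactiveBucketPruning sink = new InactiveBucketPruning(5 * 60_000L);   // 5 minutes
        for (int i = 0; i < 280; i++) sink.write("bucket-" + i, 0L);           // t = 0
        for (int i = 0; i < 10; i++) sink.write("bucket-" + i, 4 * 60_000L);   // t = 4 min
        System.out.println(sink.onProcessingTime(6 * 60_000L)); // 270 closed at t = 6 min
        System.out.println(sink.activeCount());                 // 10 still active
    }
}
```

In this model, 280 buckets receive data at t = 0 and only 10 keep receiving data at t = 4 min. A check at t = 6 min with a 5-minute interval then closes 270 buckets and leaves 10 active. Without an inactivity interval, nothing in the model would ever remove the idle 270.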
>
> Cheers,
> Kostas
>
>
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html
>
>
>
> On Mon, Feb 3, 2020 at 3:54 PM Mark Harris <ma...@hivehome.com>
> wrote:
>
> Hi all,
>
> The out-of-memory heap dump had the answer - the job was failing with an
> OutOfMemoryError because the activeBuckets members of 3 instances of
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were
> filling a significant enough part of the memory of the taskmanager that no
> progress was being made. Increasing the memory available to the TM seems to
> have fixed the problem.
>
> I think the DeleteOnExit problem will mean it needs to be restarted every
> few weeks, but that's acceptable for now.
>
> Thanks again,
>
> Mark
> ------------------------------
> *From:* Mark Harris <ma...@hivehome.com>
> *Sent:* 30 January 2020 14:36
> *To:* Piotr Nowojski <pi...@ververica.com>
> *Cc:* Cliff Resnick <cr...@gmail.com>; David Magalhães <
> speeddragon@gmail.com>; Till Rohrmann <tr...@apache.org>;
> flink-user@apache.org <fl...@apache.org>; kkloudas <
> kkloudas@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> Thanks for your help with this. 🙂
>
> The EMR cluster has 3 15GB VMs, and the flink cluster is started with:
>
> /usr/lib/flink/bin/yarn-session.sh -d -n 3 -tm 5760 -jm 5760 -s 3
>
> Usually the task runs for about 15 minutes before it restarts, typically due
> to a "java.lang.OutOfMemoryError: Java heap space" exception.
>
> The figures came from a MemoryAnalyzer session on a manual memory dump
> from one of the taskmanagers. The total size of that heap was only 1.8gb.
> In that heap, 1.7gb is taken up by the static field "files" in
> DeleteOnExitHook, which is a linked hash set containing the 9 million
> strings.
>
> A full example of one of the paths is
> /tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp, at 120 bytes per
> char[], for a solid 1.2gb of chars. Then 200mb for their String wrappers and
> another 361MB for LinkedHashMap$Entry objects. Despite valiantly holding
> on to an array of 16777216 HashMap$Node elements, the LinkedHashMap can
> only contribute another 20MB or so.
> I goofed in not taking that 85% figure from MemoryAnalyzer - it tells
> me DeleteOnExitHook is responsible for 96.98% of the heap dump.
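Mark's breakdown also explains why the ~30-byte estimate per string was too low: each entry costs a char[], a String and a LinkedHashMap$Entry. The sketch below redoes the arithmetic under stated assumptions:

- a 64-bit HotSpot JVM with compressed oops;
- Java 8 UTF-16 strings;
- 8-byte object alignment.

The constants are typical shallow sizes for that layout, not figures read from the heap dump.

```java
/**
 * Shallow-size arithmetic for one DeleteOnExitHook entry, assuming a 64-bit
 * HotSpot JVM with compressed oops, Java 8 char[]-backed Strings and 8-byte alignment.
 */
public final class DeleteOnExitHeapEstimate {

    static final int ARRAY_HEADER = 16;         // mark word + class pointer + length field
    static final int STRING_OBJECT = 24;        // java.lang.String wrapper (Java 8)
    static final int LINKED_HASHMAP_ENTRY = 40; // LinkedHashMap$Entry (node + before/after links)

    /** Round up to the next multiple of 8 bytes. */
    static long align8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    /** Shallow size of the char[] backing a path of the given length (2 bytes per char). */
    static long charArrayBytes(int length) {
        return align8(ARRAY_HEADER + 2L * length);
    }

    /** char[] + String + set entry for one registered path. */
    static long perEntryBytes(int pathLength) {
        return charArrayBytes(pathLength) + STRING_OBJECT + LINKED_HASHMAP_ENTRY;
    }

    public static void main(String[] args) {
        String path = "/tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp";
        long entries = 9_041_060L;
        System.out.println(path.length());                          // 52
        System.out.println(charArrayBytes(path.length()));          // 120
        System.out.println(perEntryBytes(path.length()));           // 184
        System.out.println(entries * perEntryBytes(path.length())); // 1663555040 (~1.66 GB)
    }
}
```

The per-entry terms match the figures reported above:

- the 120-byte char[];
- the String wrappers, at 9,041,060 x 24 bytes, or about 217MB;
- the 361MB of entries.

Their sum, about 1.66 GB, is close to the 1.7gb attributed to DeleteOnExitHook.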
>
> Looking at the files it managed to write before this started to happen
> regularly, it looks like they're being written approximately every 3
> minutes. I'll triple-check our config, but I'm reasonably sure the job is
> configured to checkpoint every 15 minutes - could something else be causing
> it to write?
>
> This may all be a red herring - something else may be taking up the
> taskmanager's memory which didn't make it into that heap dump. I plan to
> repeat the analysis on a heap dump created
> by -XX:+HeapDumpOnOutOfMemoryError shortly.
>
> Best regards,
>
> Mark
>
> ------------------------------
> *From:* Piotr Nowojski <pi...@ververica.com>
> *Sent:* 30 January 2020 13:44
> *To:* Mark Harris <ma...@hivehome.com>
> *Cc:* Cliff Resnick <cr...@gmail.com>; David Magalhães <
> speeddragon@gmail.com>; Till Rohrmann <tr...@apache.org>;
> flink-user@apache.org <fl...@apache.org>; kkloudas <
> kkloudas@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> What is your job setup? Size of the nodes, memory settings of the
> Flink/JVM?
>
> 9,041,060 strings is an awfully small number to bring down a whole cluster.
> With each tmp string taking ~30 bytes, that’s only 271MB. Is this really
> 85% of the heap? Also, with parallelism of 6 and checkpoints every 15
> minutes, 9,000,000 leaked strings should accumulate only after about one
> month, assuming 500-600 buckets in total (and a separate file per
> bucket).
>
> Piotrek
>
> On 30 Jan 2020, at 14:21, Mark Harris <ma...@hivehome.com> wrote:
>
> Trying a few different approaches to the fs.s3a.fast.upload settings has
> bought me no joy - the taskmanagers end up simply crashing or complaining
> of high GC load. Heap dumps suggest that this time they're clogged with
> buffers instead, which makes sense.
>
> Our job has parallelism of 6 and checkpoints every 15 minutes - if
> anything, we'd like to checkpoint more often. I suspect this could also be
> affected by the partition structure we are bucketing to: at any given
> moment we could be receiving data for up to 280 buckets at once.
> Could this be a factor?
>
> Best regards,
>
> Mark
> ------------------------------
> *From:* Piotr Nowojski <pi...@ververica.com>
> *Sent:* 27 January 2020 16:16
> *To:* Cliff Resnick <cr...@gmail.com>
> *Cc:* David Magalhães <sp...@gmail.com>; Mark Harris <
> mark.harris@hivehome.com>; Till Rohrmann <tr...@apache.org>;
> flink-user@apache.org <fl...@apache.org>; kkloudas <
> kkloudas@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> I think reducing the checkpoint frequency and decreasing the parallelism
> of the operators that use the S3AOutputStream class would help to
> mitigate the issue.
>
> I don’t know of other solutions. I would suggest asking this question
> directly to Steve L. in the bug ticket [1], as he is the one who fixed the
> issue. If there is no workaround, maybe it would be possible to put
> pressure on the Hadoop developers to backport the fix to older versions?
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>
> On 27 Jan 2020, at 15:41, Cliff Resnick <cr...@gmail.com> wrote:
>
> I know from experience that Flink's shaded S3A FileSystem does not
> reference core-site.xml, though I don't remember offhand which file(s) it
> does reference. However, since it's shaded, maybe this could be fixed by
> building a Flink FS against Hadoop 3.3.0? Last I checked, I think it
> referenced 3.1.0.
>
> On Mon, Jan 27, 2020, 8:48 AM David Magalhães <sp...@gmail.com>
> wrote:
>
> Does StreamingFileSink use core-site.xml? When I was using it, it didn't
> load any configurations from core-site.xml.
>
> On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <ma...@hivehome.com>
> wrote:
>
> Hi Piotr,
>
> Thanks for the link to the issue.
>
> Do you know if there's a workaround? I've tried setting the following in
> my core-site.xml:
>
> fs.s3a.fast.upload.buffer=true
>
> To try and avoid writing the buffer files, but the taskmanager breaks with
> the same problem.
>
> Best regards,
>
> Mark
> ------------------------------
> *From:* Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr
> Nowojski <pi...@ververica.com>
> *Sent:* 22 January 2020 13:29
> *To:* Till Rohrmann <tr...@apache.org>
> *Cc:* Mark Harris <ma...@hivehome.com>; flink-user@apache.org <
> flink-user@apache.org>; kkloudas <kk...@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> This is probably a known issue of Hadoop [1]. Unfortunately it was only
> fixed in 3.3.0.
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>
> On 22 Jan 2020, at 13:56, Till Rohrmann <tr...@apache.org> wrote:
>
> Thanks for reporting this issue, Mark. I'm pulling Klou, who knows more
> about the StreamingFileSink, into this conversation. @Klou, does the
> StreamingFileSink rely on DeleteOnExitHooks to clean up files?
>
> Cheers,
> Till
>
> On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com>
> wrote:
>
> Hi,
>
> We're using Flink 1.7.2 on an EMR cluster (emr-5.22.0), which runs Hadoop
> "Amazon 2.8.5". We've recently noticed that some TaskManagers fail
> (causing all the jobs running on them to fail) with a
> "java.lang.OutOfMemoryError: GC overhead limit exceeded". The taskmanager
> (and the jobs that should be running on it) remains down until manually
> restarted.
>
> I managed to take and analyze a memory dump from one of the afflicted
> taskmanagers.
>
> It showed that 85% of the heap was made up of
> the java.io.DeleteOnExitHook.files hashset. The majority of the strings in
> that hashset (9041060 out of ~9041100) pointed to files that began
> /tmp/hadoop-yarn/s3a/s3ablock
>
> The problem seems to affect jobs that use the StreamingFileSink: all of
> the taskmanager crashes have been on a taskmanager running at least one
> job using this sink, and a cluster running only a single taskmanager / job
> that uses the StreamingFileSink crashed with the GC overhead limit
> exceeded error.
>
> I've had a look for advice on handling this error more broadly without
> luck.
>
> Any suggestions or advice gratefully received.
>
> Best regards,
>
> Mark Harris
>
>
>
> The information contained in or attached to this email is intended only
> for the use of the individual or entity to which it is addressed. If you
> are not the intended recipient, or a person responsible for delivering it
> to the intended recipient, you are not authorised to and must not disclose,
> copy, distribute, or retain this message or any part of it. It may contain
> information which is confidential and/or covered by legal professional or
> other privilege under applicable law.
>
> The views expressed in this email are not necessarily the views of
> Centrica plc or its subsidiaries, and the company, its directors, officers
> or employees make no representation or accept any liability for its
> accuracy or completeness unless expressly stated to the contrary.
>
> Additional regulatory disclosures may be found here:
> https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email
>
> PH Jones is a trading name of British Gas Social Housing Limited. British
> Gas Social Housing Limited (company no: 01026007), British Gas Trading
> Limited (company no: 03078711), British Gas Services Limited (company no:
> 3141243), British Gas Insurance Limited (company no: 06608316), British Gas
> New Heating Limited (company no: 06723244), British Gas Services
> (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading)
> Limited (company no: 02877397) are all wholly owned subsidiaries of
> Centrica plc (company no: 3033654). Each company is registered in England
> and Wales with a registered office at Millstream, Maidenhead Road, Windsor,
> Berkshire SL4 5GD.
>
> British Gas Insurance Limited is authorised by the Prudential Regulation
> Authority and regulated by the Financial Conduct Authority and the
> Prudential Regulation Authority. British Gas Services Limited and Centrica
> Energy (Trading) Limited are authorised and regulated by the Financial
> Conduct Authority. British Gas Trading Limited is an appointed
> representative of British Gas Services Limited which is authorised and
> regulated by the Financial Conduct Authority.
>

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Mark Harris <ma...@hivehome.com>.
Hi Kostas,

Thanks for your help here - I think we're OK with the increased heap size, but happy to explore other alternatives.

I see the problem - we're currently using a BulkFormat, which doesn't seem to let us override the rolling policy. Is there an equivalent for the BulkFormat?

Best regards,

Mark
________________________________
From: Kostas Kloudas <kk...@apache.org>
Sent: 03 February 2020 15:39
To: Mark Harris <ma...@hivehome.com>
Cc: Piotr Nowojski <pi...@ververica.com>; Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi Mark,

You can use something like the following and change the intervals accordingly:

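import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

final StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path(outputPath), new SimpleStringEncoder<>("UTF-8"))
        .withRollingPolicy(
                DefaultRollingPolicy.builder()
                        .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))  // roll after 15 minutes open
                        .withInactivityInterval(TimeUnit.MINUTES.toMillis(5)) // roll after 5 idle minutes
                        .withMaxPartSize(1024 * 1024 * 1024)                  // roll above 1 GiB
                        .build())
        .build();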

Let me know if this solves the problem.

Cheers,
Kostas

On Mon, Feb 3, 2020 at 4:11 PM Mark Harris <ma...@hivehome.com> wrote:
Hi Kostas,

Sorry, stupid question: How do I set that for a StreamingFileSink?

Best regards,

Mark
________________________________
From: Kostas Kloudas <kk...@apache.org>
Sent: 03 February 2020 14:58
To: Mark Harris <ma...@hivehome.com>
Cc: Piotr Nowojski <pi...@ververica.com>; Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi Mark,

Have you tried to set your rolling policy to close inactive part files after some time [1]?
If the part files in the buckets are inactive and there are no new part files, then the state handle for those buckets will also be removed.
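Kostas's point about inactive buckets can be illustrated with a toy model. The sketch below is not Flink's implementation: `ToyBuckets` and its methods are hypothetical, the three thresholds only mirror the rollover interval, inactivity interval and maximum part size that DefaultRollingPolicy exposes, and, as a simplification, a bucket is forgotten as soon as its open part file is rolled (Flink also has to finish committing pending files first). It shows why, once an inactivity interval is set, buckets that stop receiving data stop holding state.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Toy model (hypothetical, not Flink code) of a row-format file sink that keeps
// one open part file per active bucket and rolls it by age, idleness or size.
public class ToyBuckets {

    // Bookkeeping for the single open part file of one bucket.
    static final class PartFile {
        final long creationTime;
        long lastUpdateTime;
        long size;

        PartFile(long now) {
            this.creationTime = now;
            this.lastUpdateTime = now;
        }
    }

    private final long rolloverIntervalMs;
    private final long inactivityIntervalMs;
    private final long maxPartSizeBytes;
    // Bucket id -> open part file; memory use grows with the number of entries.
    private final Map<String, PartFile> activeBuckets = new HashMap<>();
    private int rolledFiles = 0;

    public ToyBuckets(long rolloverIntervalMs, long inactivityIntervalMs, long maxPartSizeBytes) {
        this.rolloverIntervalMs = rolloverIntervalMs;
        this.inactivityIntervalMs = inactivityIntervalMs;
        this.maxPartSizeBytes = maxPartSizeBytes;
    }

    // Per record: roll the open file if it is already too large, then append.
    public void onRecord(String bucketId, long bytes, long now) {
        PartFile part = activeBuckets.get(bucketId);
        if (part != null && part.size > maxPartSizeBytes) {
            rolledFiles++;
            part = null;
        }
        if (part == null) {
            part = new PartFile(now);
            activeBuckets.put(bucketId, part);
        }
        part.size += bytes;
        part.lastUpdateTime = now;
    }

    // Periodic timer: roll files that are too old or idle and forget their buckets.
    public void onProcessingTime(long now) {
        Iterator<Map.Entry<String, PartFile>> it = activeBuckets.entrySet().iterator();
        while (it.hasNext()) {
            PartFile part = it.next().getValue();
            boolean tooOld = now - part.creationTime >= rolloverIntervalMs;
            boolean idle = now - part.lastUpdateTime >= inactivityIntervalMs;
            if (tooOld || idle) {
                rolledFiles++;
                it.remove(); // simplification: no open file left, so drop the bucket
            }
        }
    }

    public int activeBucketCount() {
        return activeBuckets.size();
    }

    public int rolledFileCount() {
        return rolledFiles;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Same thresholds as Kostas's example: 15 min rollover, 5 min inactivity, 1 GiB.
        ToyBuckets sink = new ToyBuckets(15 * minute, 5 * minute, 1024L * 1024 * 1024);
        // 280 buckets (the figure Mark mentions) each get one record at t = 0.
        for (int b = 0; b < 280; b++) {
            sink.onRecord("bucket-" + b, 100, 0);
        }
        // Afterwards only bucket-0 keeps receiving data, once a minute.
        for (long t = minute; t <= 6 * minute; t += minute) {
            sink.onRecord("bucket-0", 100, t);
            sink.onProcessingTime(t);
        }
        System.out.println(sink.activeBucketCount() + " active, " + sink.rolledFileCount() + " rolled");
    }
}
```

Running main prints "1 active, 279 rolled": after five idle minutes, only the bucket that keeps receiving records still holds state.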

Cheers,
Kostas

[1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html



On Mon, Feb 3, 2020 at 3:54 PM Mark Harris <ma...@hivehome.com> wrote:
Hi all,

The out-of-memory heap dump had the answer - the job was failing with an OutOfMemoryError because the activeBuckets members of 3 instances of org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were taking up so much of the taskmanager's memory that no progress was being made. Increasing the memory available to the TM seems to have fixed the problem.

I think the DeleteOnExit problem will mean it needs to be restarted every few weeks, but that's acceptable for now.

Thanks again,

Mark
________________________________
From: Mark Harris <ma...@hivehome.com>
Sent: 30 January 2020 14:36
To: Piotr Nowojski <pi...@ververica.com>
Cc: Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

Thanks for your help with this. 🙂

The EMR cluster has three 15 GB VMs, and the Flink cluster is started with:

/usr/lib/flink/bin/yarn-session.sh -d -n 3 -tm 5760 -jm 5760 -s 3

Usually the task runs for about 15 minutes before it restarts, typically due to a "java.lang.OutOfMemoryError: Java heap space" exception.

The figures came from a MemoryAnalyzer session on a manual memory dump from one of the taskmanagers. The total size of that heap was only 1.8 GB, and 1.7 GB of it is taken up by the static field "files" in DeleteOnExitHook, a linked hash set containing the 9 million strings.

A full example of one of the paths is /tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp, at 120 bytes per char[], for a solid 1.2 GB of chars. Then there's 200 MB for their String wrappers and another 361 MB for LinkedHashMap$Entry objects. Despite valiantly holding on to an array of 16777216 HashMap$Node elements, the LinkedHashMap can only contribute another 20 MB or so.
I goofed by not taking that 85% figure from MemoryAnalyzer - it tells me DeleteOnExitHook is responsible for 96.98% of the heap dump.
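Mark's per-object figures can be cross-checked with a rough calculation. The sketch assumes a 64-bit HotSpot JVM with compressed oops (12-byte headers, 4-byte references, 8-byte alignment) and Java 8's char[]-backed String; these layouts are assumptions, not values read from the dump, and the class name is illustrative. Under them each registered path costs 120 + 24 + 40 = 184 bytes, roughly six times the ~30 bytes per string used in Piotr's estimate further down, and 9,041,060 paths plus a 16,777,216-slot table come to about 1.73 × 10^9 bytes. The LinkedHashMap$Entry total (361,642,400 bytes) matches Mark's 361 MB, while the char[] estimate (about 1.08 × 10^9 bytes) is somewhat below his 1.2 GB, so treat the numbers as approximate.

```java
// Back-of-the-envelope shallow sizes, assuming a 64-bit HotSpot JVM with
// compressed oops (12-byte object headers, 4-byte references, 8-byte alignment)
// and Java 8 Strings backed by char[].
public class DeleteOnExitHeapEstimate {

    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // char[]: 12-byte header + 4-byte length, then 2 bytes per char.
    static long charArrayBytes(int chars) {
        return align8(16 + 2L * chars);
    }

    // java.lang.String: header + char[] reference + cached hash (int).
    static final long STRING_BYTES = align8(12 + 4 + 4);

    // LinkedHashMap.Entry: header + hash (int) + key, value, next, before, after refs.
    static final long ENTRY_BYTES = align8(12 + 4 + 5 * 4);

    // Hash table array: 16-byte array header + one reference per slot.
    static long tableBytes(int capacity) {
        return align8(16 + 4L * capacity);
    }

    // Total for n registered paths of equal length plus the table.
    static long totalBytes(long n, int pathLength, int capacity) {
        return n * (charArrayBytes(pathLength) + STRING_BYTES + ENTRY_BYTES) + tableBytes(capacity);
    }

    public static void main(String[] args) {
        String path = "/tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp";
        long n = 9_041_060L;
        System.out.printf("char[]: %,d bytes%n", n * charArrayBytes(path.length()));
        System.out.printf("String: %,d bytes%n", n * STRING_BYTES);
        System.out.printf("Entry:  %,d bytes%n", n * ENTRY_BYTES);
        System.out.printf("table:  %,d bytes%n", tableBytes(16_777_216));
        System.out.printf("total:  %,d bytes%n", totalBytes(n, path.length(), 16_777_216));
    }
}
```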

Looking at the files it managed to write before this started to happen regularly, it looks like they're being written approximately every 3 minutes. I'll triple check our config, but I'm reasonably sure the job is configured to checkpoint every 15 minutes - could something else be causing it to write?

This may all be a red herring - something else may be taking up the taskmanager's memory that didn't make it into that heap dump. I plan to repeat the analysis on a heap dump created by -XX:+HeapDumpOnOutOfMemoryError shortly.

Best regards,

Mark

________________________________
From: Piotr Nowojski <pi...@ververica.com>
Sent: 30 January 2020 13:44
To: Mark Harris <ma...@hivehome.com>
Cc: Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

What is your job setup? Size of the nodes, memory settings of the Flink/JVM?

9 041 060 strings is an awfully small number to bring down a whole cluster. With each tmp string taking ~30 bytes, that's only 271MB. Is this really 85% of the heap? Also, with a parallelism of 6 and checkpoints every 15 minutes, 9 000 000 leaked strings should accumulate only after about a month, assuming 500-600 buckets (and also assuming a separate file per bucket).
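Piotr's estimate can be written out explicitly. The sketch assumes, as he does, that every checkpoint leaks one temp-file path per bucket per parallel subtask, and it reads his 500-600 buckets as per subtask, which is what makes the result come out at roughly a month. `LeakGrowthEstimate` and its methods are illustrative names only.

```java
// Rough growth model for leaked temp-file paths. Assumes one leaked path per
// bucket per parallel subtask at each interval (hypothetical names).
public class LeakGrowthEstimate {

    static long leakedPerInterval(int parallelism, int bucketsPerSubtask) {
        return (long) parallelism * bucketsPerSubtask;
    }

    // Days until `target` paths have accumulated, leaking once every `intervalMinutes`.
    static double daysToReach(long target, int parallelism, int bucketsPerSubtask, double intervalMinutes) {
        double intervalsPerDay = 24 * 60 / intervalMinutes;
        return target / (leakedPerInterval(parallelism, bucketsPerSubtask) * intervalsPerDay);
    }

    public static void main(String[] args) {
        // Piotr's first check: 9,041,060 strings at ~30 bytes each.
        System.out.printf("%,d bytes%n", 9_041_060L * 30);
        // Parallelism 6, checkpoints every 15 minutes.
        for (int buckets : new int[] {500, 600}) {
            System.out.printf("%d buckets: %.1f days%n", buckets, daysToReach(9_000_000L, 6, buckets, 15));
        }
    }
}
```

For 500 and 600 buckets this gives about 31 and 26 days. Plugging in the figures Mark gives elsewhere in the thread (up to 280 buckets, files appearing roughly every 3 minutes), `daysToReach(9_000_000L, 6, 280, 3)` comes to about 11 days, if each of those intervals leaks one path per bucket per subtask.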

Piotrek

On 30 Jan 2020, at 14:21, Mark Harris <ma...@hivehome.com> wrote:

Trying a few different approaches to the fs.s3a.fast.upload settings has bought me no joy - the taskmanagers end up simply crashing or complaining of high GC load. Heap dumps suggest that this time they're clogged with buffers instead, which makes sense.

Our job has a parallelism of 6 and checkpoints every 15 minutes - if anything, we'd like to checkpoint more frequently. I suspect this could be affected by the partition structure we were bucketing to as well, and at any given moment we could be receiving data for up to 280 buckets at once.
Could this be a factor?

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@ververica.com>
Sent: 27 January 2020 16:16
To: Cliff Resnick <cr...@gmail.com>
Cc: David Magalhães <sp...@gmail.com>; Mark Harris <ma...@hivehome.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

I think reducing the frequency of checkpoints and decreasing the parallelism of the operators using the S3AOutputStream class would help to mitigate the issue.

I don’t know of other solutions. I would suggest asking Steve L. directly in the bug ticket [1], as he is the one who fixed the issue. If there is no workaround, maybe it would be possible to put some pressure on the Hadoop guys to backport the fix to older versions?

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 27 Jan 2020, at 15:41, Cliff Resnick <cr...@gmail.com> wrote:

I know from experience that Flink's shaded S3A FileSystem does not reference core-site.xml, though I don't remember offhand which file(s) it does reference. However, since it's shaded, maybe this could be fixed by building a Flink FS that references Hadoop 3.3.0? Last I checked, I think it referenced 3.1.0.

On Mon, Jan 27, 2020, 8:48 AM David Magalhães <sp...@gmail.com> wrote:
Does StreamingFileSink use core-site.xml? When I was using it, it didn't load any configuration from core-site.xml.

On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <ma...@hivehome.com> wrote:
Hi Piotr,

Thanks for the link to the issue.

Do you know if there's a workaround? I've tried setting the following in my core-site.xml:

fs.s3a.fast.upload.buffer=true

I did this to try to avoid writing the buffer files, but the taskmanager still breaks with the same problem.

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr Nowojski <pi...@ververica.com>
Sent: 22 January 2020 13:29
To: Till Rohrmann <tr...@apache.org>
Cc: Mark Harris <ma...@hivehome.com>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

This is probably a known issue of Hadoop [1]. Unfortunately it was only fixed in 3.3.0.

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 22 Jan 2020, at 13:56, Till Rohrmann <tr...@apache.org> wrote:

Thanks for reporting this issue, Mark. I'm pulling Klou into this conversation, as he knows more about the StreamingFileSink. @Klou, does the StreamingFileSink rely on DeleteOnExitHooks to clean up files?

Cheers,
Till

On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com> wrote:
Hi,

We're using Flink 1.7.2 on an EMR cluster (emr-5.22.0), which runs Hadoop "Amazon 2.8.5". We've recently noticed that some TaskManagers fail (causing all the jobs running on them to fail) with a "java.lang.OutOfMemoryError: GC overhead limit exceeded". The taskmanager (and the jobs that should be running on it) remain down until manually restarted.

I managed to take and analyze a memory dump from one of the afflicted taskmanagers.

It showed that 85% of the heap was made up of the java.io.DeleteOnExitHook.files hashset. The majority of the strings in that hashset (9041060 out of ~9041100) pointed to files that began /tmp/hadoop-yarn/s3a/s3ablock
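The heap dump is consistent with how `java.io.File.deleteOnExit()` behaves: each call records the path in a JVM-wide set (the `java.io.DeleteOnExitHook.files` set named above) that is only processed at shutdown, and deleting the file earlier does not remove the entry. The dump's s3ablock paths suggest the S3A disk buffer follows this register-then-delete pattern, which, per Piotr, is tracked as a Hadoop bug. The sketch below reproduces the pattern in isolation. `RETAINED_PATHS` is a hypothetical stand-in for the JDK's private set (which cannot be inspected without reflection), the `s3ablock-` prefix only imitates the file names in the dump, and `cleanBlock` shows the non-leaking alternative of deleting the temp file explicitly without registering it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Stream;

public class DeleteOnExitPattern {

    // Hypothetical stand-in for the private, JVM-wide set that
    // File.deleteOnExit() adds to; it is only drained at JVM shutdown.
    static final Set<String> RETAINED_PATHS = new LinkedHashSet<>();

    // Leaky pattern: register the buffer file for deletion at exit, then delete
    // it early once it is no longer needed. The registration is never undone.
    static void leakyBlock(Path dir) throws IOException {
        Path tmp = Files.createTempFile(dir, "s3ablock-", ".tmp");
        RETAINED_PATHS.add(tmp.toString()); // models tmp.toFile().deleteOnExit()
        Files.write(tmp, new byte[] {1, 2, 3});
        Files.delete(tmp); // the file is gone, but its path string is still held
    }

    // Non-leaky pattern: delete explicitly in finally, never register.
    static void cleanBlock(Path dir) throws IOException {
        Path tmp = Files.createTempFile(dir, "s3ablock-", ".tmp");
        try {
            Files.write(tmp, new byte[] {1, 2, 3});
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    // Runs both patterns n times and returns how many files remain on disk.
    static long run(int n) throws IOException {
        Path dir = Files.createTempDirectory("delete-on-exit-sketch");
        for (int i = 0; i < n; i++) {
            leakyBlock(dir);
            cleanBlock(dir);
        }
        long remaining;
        try (Stream<Path> files = Files.list(dir)) {
            remaining = files.count();
        }
        Files.delete(dir);
        return remaining;
    }

    public static void main(String[] args) throws IOException {
        long remaining = run(10_000);
        // Nothing is left on disk, yet every leaky call left a path string behind.
        System.out.println(remaining + " files on disk, " + RETAINED_PATHS.size() + " paths retained");
    }
}
```

Both methods leave the disk clean; only the leaky one grows an in-memory set that is never trimmed while the process runs, which matches a taskmanager that degrades over days rather than failing immediately.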

The problem seems to affect jobs that use the StreamingFileSink - all of the taskmanager crashes have been on a taskmanager running at least one job using this sink, and a cluster running only a single taskmanager / job that uses the StreamingFileSink crashed with the GC overhead limit exceeded error.

I've had a look for advice on handling this error more broadly without luck.

Any suggestions or advice gratefully received.

Best regards,

Mark Harris



The information contained in or attached to this email is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are not authorised to and must not disclose, copy, distribute, or retain this message or any part of it. It may contain information which is confidential and/or covered by legal professional or other privilege under applicable law.

The views expressed in this email are not necessarily the views of Centrica plc or its subsidiaries, and the company, its directors, officers or employees make no representation or accept any liability for its accuracy or completeness unless expressly stated to the contrary.

Additional regulatory disclosures may be found here: https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email

PH Jones is a trading name of British Gas Social Housing Limited. British Gas Social Housing Limited (company no: 01026007), British Gas Trading Limited (company no: 03078711), British Gas Services Limited (company no: 3141243), British Gas Insurance Limited (company no: 06608316), British Gas New Heating Limited (company no: 06723244), British Gas Services (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading) Limited (company no: 02877397) are all wholly owned subsidiaries of Centrica plc (company no: 3033654). Each company is registered in England and Wales with a registered office at Millstream, Maidenhead Road, Windsor, Berkshire SL4 5GD.

British Gas Insurance Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. British Gas Services Limited and Centrica Energy (Trading) Limited are authorised and regulated by the Financial Conduct Authority. British Gas Trading Limited is an appointed representative of British Gas Services Limited which is authorised and regulated by the Financial Conduct Authority.



The information contained in or attached to this email is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are not authorised to and must not disclose, copy, distribute, or retain this message or any part of it. It may contain information which is confidential and/or covered by legal professional or other privilege under applicable law.

The views expressed in this email are not necessarily the views of Centrica plc or its subsidiaries, and the company, its directors, officers or employees make no representation or accept any liability for its accuracy or completeness unless expressly stated to the contrary.

Additional regulatory disclosures may be found here: https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email

PH Jones is a trading name of British Gas Social Housing Limited. British Gas Social Housing Limited (company no: 01026007), British Gas Trading Limited (company no: 03078711), British Gas Services Limited (company no: 3141243), British Gas Insurance Limited (company no: 06608316), British Gas New Heating Limited (company no: 06723244), British Gas Services (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading) Limited (company no: 02877397) are all wholly owned subsidiaries of Centrica plc (company no: 3033654). Each company is registered in England and Wales with a registered office at Millstream, Maidenhead Road, Windsor, Berkshire SL4 5GD.

British Gas Insurance Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. British Gas Services Limited and Centrica Energy (Trading) Limited are authorised and regulated by the Financial Conduct Authority. British Gas Trading Limited is an appointed representative of British Gas Services Limited which is authorised and regulated by the Financial Conduct Authority.



The information contained in or attached to this email is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are not authorised to and must not disclose, copy, distribute, or retain this message or any part of it. It may contain information which is confidential and/or covered by legal professional or other privilege under applicable law.

The views expressed in this email are not necessarily the views of Centrica plc or its subsidiaries, and the company, its directors, officers or employees make no representation or accept any liability for its accuracy or completeness unless expressly stated to the contrary.

Additional regulatory disclosures may be found here: https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email

PH Jones is a trading name of British Gas Social Housing Limited. British Gas Social Housing Limited (company no: 01026007), British Gas Trading Limited (company no: 03078711), British Gas Services Limited (company no: 3141243), British Gas Insurance Limited (company no: 06608316), British Gas New Heating Limited (company no: 06723244), British Gas Services (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading) Limited (company no: 02877397) are all wholly owned subsidiaries of Centrica plc (company no: 3033654). Each company is registered in England and Wales with a registered office at Millstream, Maidenhead Road, Windsor, Berkshire SL4 5GD.

British Gas Insurance Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. British Gas Services Limited and Centrica Energy (Trading) Limited are authorised and regulated by the Financial Conduct Authority. British Gas Trading Limited is an appointed representative of British Gas Services Limited which is authorised and regulated by the Financial Conduct Authority.



The information contained in or attached to this email is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are not authorised to and must not disclose, copy, distribute, or retain this message or any part of it. It may contain information which is confidential and/or covered by legal professional or other privilege under applicable law.

The views expressed in this email are not necessarily the views of Centrica plc or its subsidiaries, and the company, its directors, officers or employees make no representation or accept any liability for its accuracy or completeness unless expressly stated to the contrary.

Additional regulatory disclosures may be found here: https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email

PH Jones is a trading name of British Gas Social Housing Limited. British Gas Social Housing Limited (company no: 01026007), British Gas Trading Limited (company no: 03078711), British Gas Services Limited (company no: 3141243), British Gas Insurance Limited (company no: 06608316), British Gas New Heating Limited (company no: 06723244), British Gas Services (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading) Limited (company no: 02877397) are all wholly owned subsidiaries of Centrica plc (company no: 3033654). Each company is registered in England and Wales with a registered office at Millstream, Maidenhead Road, Windsor, Berkshire SL4 5GD.


Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Kostas Kloudas <kk...@apache.org>.
Hi Mark,

You can use something like the following and change the intervals
accordingly:

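import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

// Roll a part file after 15 minutes, after 5 minutes without writes,
// or once it grows past 1 GiB (1024 * 1024 * 1024 bytes).
// DefaultRollingPolicy.builder() is the Flink 1.10+ entry point;
// on Flink 1.7 to 1.9 use DefaultRollingPolicy.create() instead.
final StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
        .withRollingPolicy(
                DefaultRollingPolicy.builder()
                        .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
                        .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
                        .withMaxPartSize(1024 * 1024 * 1024)
                        .build())
        .build();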

Let me know if this solves the problem.

Cheers,
Kostas

On Mon, Feb 3, 2020 at 4:11 PM Mark Harris <ma...@hivehome.com> wrote:

> Hi Kostas,
>
> Sorry, stupid question: How do I set that for a StreamingFileSink?
>
> Best regards,
>
> Mark
> ------------------------------
> *From:* Kostas Kloudas <kk...@apache.org>
> *Sent:* 03 February 2020 14:58
> *To:* Mark Harris <ma...@hivehome.com>
> *Cc:* Piotr Nowojski <pi...@ververica.com>; Cliff Resnick <
> cresny@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann
> <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi Mark,
>
> Have you tried to set your rolling policy to close inactive part files
> after some time [1]?
> If the part files in the buckets are inactive and there are no new part
> files, then the state handle for those buckets will also be removed.
>
> Cheers,
> Kostas
>
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html
>
>
>
> On Mon, Feb 3, 2020 at 3:54 PM Mark Harris <ma...@hivehome.com>
> wrote:
>
> Hi all,
>
> The out-of-memory heap dump had the answer - the job was failing with an
> OutOfMemoryError because the activeBuckets members of 3 instances of
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were
> filling a significant enough part of the memory of the taskmanager that no
> progress was being made. Increasing the memory available to the TM seems to
> have fixed the problem.
>
> I think the DeleteOnExit problem will mean it needs to be restarted every
> few weeks, but that's acceptable for now.
>
> Thanks again,
>
> Mark
> ------------------------------
> *From:* Mark Harris <ma...@hivehome.com>
> *Sent:* 30 January 2020 14:36
> *To:* Piotr Nowojski <pi...@ververica.com>
> *Cc:* Cliff Resnick <cr...@gmail.com>; David Magalhães <
> speeddragon@gmail.com>; Till Rohrmann <tr...@apache.org>;
> flink-user@apache.org <fl...@apache.org>; kkloudas <
> kkloudas@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> Thanks for your help with this. 🙂
>
> The EMR cluster has three 15GB VMs, and the flink cluster is started with:
>
> /usr/lib/flink/bin/yarn-session.sh -d -n 3 -tm 5760 -jm 5760 -s 3
>
> Usually the task runs for about 15 minutes before it restarts, usually due
> to a "java.lang.OutOfMemoryError: Java heap space" exception.
>
> The figures came from a MemoryAnalyzer session on a manual memory dump
> from one of the taskmanagers. The total size of that heap was only 1.8GB.
> In that heap, 1.7GB is taken up by the static field "files" in
> DeleteOnExitHook, which is a linked hash set containing the 9 million
> strings.
>
> A full example of one of the paths is
> /tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp, at 120 bytes per
> char[], for a solid 1.2GB of chars. Then 200MB for their String wrappers and
> another 361MB for LinkedHashMap$Entry objects. Despite valiantly holding
> on to an array of 16777216 HashMap$Node elements, the LinkedHashMap can
> only contribute another 20MB or so.
> I goofed in not taking that 85% figure from MemoryAnalyzer - it tells
> me DeleteOnExitHook is responsible for 96.98% of the heap dump.
>
> Looking at the files it managed to write before this started to happen
> regularly, it looks like they're being written approximately every 3
> minutes. I'll triple check our config, but I'm reasonably sure the job is
> configured to checkpoint every 15 minutes - could something else be causing
> it to write?
>
> This may all be a red herring - something else may be taking up the
> taskmanager's memory which didn't make it into that heap dump. I plan to
> repeat the analysis on a heap dump created
> by -XX:+HeapDumpOnOutOfMemoryError shortly.
>
> Best regards,
>
> Mark
>
> ------------------------------
> *From:* Piotr Nowojski <pi...@ververica.com>
> *Sent:* 30 January 2020 13:44
> *To:* Mark Harris <ma...@hivehome.com>
> *Cc:* Cliff Resnick <cr...@gmail.com>; David Magalhães <
> speeddragon@gmail.com>; Till Rohrmann <tr...@apache.org>;
> flink-user@apache.org <fl...@apache.org>; kkloudas <
> kkloudas@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> What is your job setup? Size of the nodes, memory settings of the
> Flink/JVM?
>
> 9 041 060 strings is an awfully small number to bring down a whole cluster.
> With each tmp string having ~30 bytes, that’s only 271MB. Is this really
> 85% of the heap? And also, with parallelism of 6 and checkpoints every 15
> minutes, 9 000 000 leaked strings should accumulate only after one month,
> assuming 500-600 buckets in total (and also assuming a separate file
> per bucket).
>
> Piotrek
>
> On 30 Jan 2020, at 14:21, Mark Harris <ma...@hivehome.com> wrote:
>
> Trying a few different approaches to the fs.s3a.fast.upload settings has
> bought me no joy - the taskmanagers end up simply crashing or complaining
> of high GC load. Heap dumps suggest that this time they're clogged with
> buffers instead, which makes sense.
>
> Our job has parallelism of 6 and checkpoints every 15 minutes - if
> anything, we'd like to increase the checkpoint frequency.
> I suspect this could be affected by the partition structure we were
> bucketing to as well: at any given moment we could be receiving data
> for up to 280 buckets.
> Could this be a factor?
>
> Best regards,
>
> Mark
> ------------------------------
> *From:* Piotr Nowojski <pi...@ververica.com>
> *Sent:* 27 January 2020 16:16
> *To:* Cliff Resnick <cr...@gmail.com>
> *Cc:* David Magalhães <sp...@gmail.com>; Mark Harris <
> mark.harris@hivehome.com>; Till Rohrmann <tr...@apache.org>;
> flink-user@apache.org <fl...@apache.org>; kkloudas <
> kkloudas@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> I think reducing the frequency of the checkpoints and decreasing
> parallelism of the things using the S3AOutputStream class, would help to
> mitigate the issue.
>
> I don’t know about other solutions. I would suggest asking this question
> directly to Steve L. in the bug ticket [1], as he is the one that fixed the
> issue. If there is no workaround, maybe it would be possible to put
> pressure on the Hadoop guys to backport the fix to older versions?
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>
> On 27 Jan 2020, at 15:41, Cliff Resnick <cr...@gmail.com> wrote:
>
> I know from experience that Flink's shaded S3A FileSystem does not
> reference core-site.xml, though I don't remember offhand what file(s) it
> does reference. However, since it's shaded, maybe this could be fixed by
> building a Flink FS against Hadoop 3.3.0? Last I checked, I think it
> referenced 3.1.0.
>
> On Mon, Jan 27, 2020, 8:48 AM David Magalhães <sp...@gmail.com>
> wrote:
>
> Does StreamingFileSink use core-site.xml? When I was using it, it didn't
> load any configuration from core-site.xml.
>
> On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <ma...@hivehome.com>
> wrote:
>
> Hi Piotr,
>
> Thanks for the link to the issue.
>
> Do you know if there's a workaround? I've tried setting the following in
> my core-site.xml:
>
> fs.s3a.fast.upload.buffer=true
>
> To try and avoid writing the buffer files, but the taskmanager breaks with
> the same problem.
>
> Best regards,
>
> Mark
> ------------------------------
> *From:* Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr
> Nowojski <pi...@ververica.com>
> *Sent:* 22 January 2020 13:29
> *To:* Till Rohrmann <tr...@apache.org>
> *Cc:* Mark Harris <ma...@hivehome.com>; flink-user@apache.org <
> flink-user@apache.org>; kkloudas <kk...@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> This is probably a known issue of Hadoop [1]. Unfortunately it was only
> fixed in Hadoop 3.3.0.
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>
> On 22 Jan 2020, at 13:56, Till Rohrmann <tr...@apache.org> wrote:
>
> Thanks for reporting this issue, Mark. I'm pulling Klou into this
> conversation, as he knows more about the StreamingFileSink. @Klou, does the
> StreamingFileSink rely on DeleteOnExitHooks to clean up files?
>
> Cheers,
> Till
>
> On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com>
> wrote:
>
> Hi,
>
> We're using flink 1.7.2 on an EMR cluster v emr-5.22.0, which runs hadoop
> v "Amazon 2.8.5". We've recently noticed that some TaskManagers fail
> (causing all the jobs running on them to fail) with an
> "java.lang.OutOfMemoryError: GC overhead limit exceeded”. The taskmanager
> (and jobs that should be running on it) remain down until manually
> restarted.
>
> I managed to take and analyze a memory dump from one of the afflicted
> taskmanagers.
>
> It showed that 85% of the heap was made up of
> the java.io.DeleteOnExitHook.files hashset. The majority of the strings in
> that hashset (9041060 out of ~9041100) pointed to files that began
> /tmp/hadoop-yarn/s3a/s3ablock
>
> The problem seems to affect jobs that make use of the StreamingFileSink -
> all of the taskmanager crashes have been on a taskmanager running at least
> one job using this sink, and a cluster running only a single taskmanager /
> job that uses the StreamingFileSink crashed with the GC overhead limit
> exceeded error.
>
> I've had a look for advice on handling this error more broadly without
> luck.
>
> Any suggestions or advice gratefully received.
>
> Best regards,
>
> Mark Harris
>
>

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Mark Harris <ma...@hivehome.com>.
Hi Kostas,

Sorry, stupid question: How do I set that for a StreamingFileSink?

Best regards,

Mark
________________________________
From: Kostas Kloudas <kk...@apache.org>
Sent: 03 February 2020 14:58
To: Mark Harris <ma...@hivehome.com>
Cc: Piotr Nowojski <pi...@ververica.com>; Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi Mark,

Have you tried to set your rolling policy to close inactive part files after some time [1]?
If the part files in the buckets are inactive and there are no new part files, then the state handle for those buckets will also be removed.
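To make the inactivity rule concrete, here is a small, stdlib-only Java sketch of the time-based check as I understand DefaultRollingPolicy to apply it: a part file is closed once it is older than the rollover interval, or once it has not been written to for the inactivity interval. The class name, constants and the 15/5-minute values are hypothetical stand-ins for illustration, not Flink API; in a real job these values are set through the policy's builder.

```java
import java.util.concurrent.TimeUnit;

public class RollingCheckSketch {
    // Hypothetical stand-ins for the rollover / inactivity settings of the policy.
    static final long ROLLOVER_INTERVAL_MS = TimeUnit.MINUTES.toMillis(15);
    static final long INACTIVITY_INTERVAL_MS = TimeUnit.MINUTES.toMillis(5);

    /** True if an in-progress part file should be closed on a processing-time check. */
    static boolean shouldRollOnProcessingTime(long creationTime, long lastUpdateTime, long now) {
        return now - creationTime >= ROLLOVER_INTERVAL_MS
                || now - lastUpdateTime >= INACTIVITY_INTERVAL_MS;
    }

    public static void main(String[] args) {
        long t0 = 0L;
        long sixMinutes = TimeUnit.MINUTES.toMillis(6);
        // Bucket written once at t0 and idle since then: closed by the inactivity rule.
        System.out.println(shouldRollOnProcessingTime(t0, t0, sixMinutes));
        // Bucket last written one minute ago: stays open until the rollover interval.
        System.out.println(shouldRollOnProcessingTime(t0, sixMinutes - 60_000L, sixMinutes));
    }
}
```

With these values the first check prints true and the second prints false: an idle bucket gets its part file closed after 5 minutes, while a busy one keeps writing to the same file for up to 15.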

Cheers,
Kostas

[1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html



On Mon, Feb 3, 2020 at 3:54 PM Mark Harris <ma...@hivehome.com> wrote:
Hi all,

The out-of-memory heap dump had the answer - the job was failing with an OutOfMemoryError because the activeBuckets members of 3 instances of org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were filling a significant enough part of the memory of the taskmanager that no progress was being made. Increasing the memory available to the TM seems to have fixed the problem.

I think the DeleteOnExit problem will mean it needs to be restarted every few weeks, but that's acceptable for now.

Thanks again,

Mark
________________________________
From: Mark Harris <ma...@hivehome.com>
Sent: 30 January 2020 14:36
To: Piotr Nowojski <pi...@ververica.com>
Cc: Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

Thanks for your help with this. 🙂

The EMR cluster has three 15GB VMs, and the flink cluster is started with:

/usr/lib/flink/bin/yarn-session.sh -d -n 3 -tm 5760 -jm 5760 -s 3

The task usually runs for about 15 minutes before it restarts, typically due to a "java.lang.OutOfMemoryError: Java heap space" exception.

The figures came from a MemoryAnalyzer session on a manual memory dump from one of the taskmanagers. The total size of that heap was only 1.8 GB. In that heap, 1.7 GB is taken up by the static field "files" in DeleteOnExitHook, which is a linked hash set containing the 9 million strings.

One full path, for example, is /tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp, at 120 bytes per char[], for a solid 1.2 GB of chars. Then 200 MB for their String wrappers and another 361 MB for LinkedHashMap$Entry objects. Despite valiantly holding on to an array of 16777216 HashMap$Node elements, the LinkedHashMap itself only contributes another 20 MB or so.
I goofed by not taking that 85% figure from MemoryAnalyzer: it tells me DeleteOnExitHook is responsible for 96.98% of the heap dump.
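The per-object sizes quoted above can be cross-checked with a little arithmetic. The sketch below assumes a 64-bit HotSpot JVM on Java 8 with compressed oops: a 16-byte array header, 8-byte alignment, a 24-byte `String` (header, value reference, cached hash) and a 40-byte `LinkedHashMap.Entry`. Those layout figures are assumptions about the JVM, not values taken from the heap dump, and the hash table's bucket array is left out. The class name `DeleteOnExitFootprint` is hypothetical.

```java
/**
 * Back-of-the-envelope heap estimate for the path strings held by
 * java.io.DeleteOnExitHook. Assumes 64-bit HotSpot, Java 8, compressed oops.
 */
class DeleteOnExitFootprint {
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** char[] backing a Java 8 String: 16-byte header plus 2 bytes (UTF-16) per char. */
    static long charArrayBytes(int chars) {
        return align8(16L + 2L * chars);
    }

    static final long STRING_BYTES = 24; // object header + value ref + cached hash, aligned
    static final long ENTRY_BYTES = 40;  // LinkedHashMap.Entry: hash, key, value, next, before, after

    /** Estimated bytes for n paths of the given length, excluding the bucket table. */
    static long totalBytes(long n, int pathLength) {
        return n * (charArrayBytes(pathLength) + STRING_BYTES + ENTRY_BYTES);
    }

    public static void main(String[] args) {
        String path = "/tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp";
        long n = 9_041_060L;
        System.out.println(path.length());                 // 52
        System.out.println(charArrayBytes(path.length())); // 120
        System.out.println(totalBytes(n, path.length()));  // 1663555040 (about 1.66 GB)
    }
}
```

For 9,041,060 paths of 52 characters, this gives 184 bytes per entry, or about 1.66 GB in total. That is roughly in line with the 1.7 GB attributed to `DeleteOnExitHook` above. It also shows why a per-string estimate that ignores the UTF-16 `char[]` and the object headers comes out much lower. On Java 9+ with compact strings, the backing array would instead be a Latin-1 `byte[]` of roughly half the size.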

Looking at the files it managed to write before this started to happen regularly, they seem to be written approximately every 3 minutes. I'll triple-check our config, but I'm reasonably sure the job is configured to checkpoint every 15 minutes. Could something else be causing it to write?

This may all be a red herring: something else may be taking up the TaskManager's memory that didn't make it into that heap dump. I plan to repeat the analysis shortly on a heap dump created with -XX:+HeapDumpOnOutOfMemoryError.

Best regards,

Mark

________________________________
From: Piotr Nowojski <pi...@ververica.com>
Sent: 30 January 2020 13:44
To: Mark Harris <ma...@hivehome.com>
Cc: Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

What is your job setup? Size of the nodes, memory settings of the Flink/JVM?

9,041,060 strings is an awfully small number to bring down a whole cluster. With each tmp string taking ~30 bytes, that’s only 271 MB. Is this really 85% of the heap? Also, with a parallelism of 6 and checkpoints every 15 minutes, 9,000,000 leaked strings should only accumulate after about a month, assuming 500-600 buckets in total (and a separate file per bucket).
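The growth-rate estimate can be written out as a small calculation. The model below is a hypothetical sketch (the class `LeakRate` is invented here). It assumes one leaked `s3ablock` entry per part file, with every subtask writing one part file per active bucket each time files are rolled, and `rollIntervalMinutes` stands in for the checkpoint interval. The real number of temp files per part file may be higher if large parts are uploaded as several blocks.

```java
/**
 * Hypothetical model of how quickly DeleteOnExitHook entries pile up, assuming
 * one leaked s3ablock path per part file and one part file per subtask per
 * active bucket per roll.
 */
class LeakRate {
    /** Leaked entries per day for the given parallelism, bucket count and roll interval. */
    static long entriesPerDay(int parallelism, int buckets, int rollIntervalMinutes) {
        long rollsPerDay = (24L * 60) / rollIntervalMinutes;
        return (long) parallelism * buckets * rollsPerDay;
    }

    /** Days until the set holds the given number of entries (it is only cleared at JVM exit). */
    static double daysUntil(long entries, int parallelism, int buckets, int rollIntervalMinutes) {
        return (double) entries / entriesPerDay(parallelism, buckets, rollIntervalMinutes);
    }

    public static void main(String[] args) {
        System.out.println(entriesPerDay(6, 520, 15));                     // 299520
        System.out.println(Math.round(daysUntil(9_000_000L, 6, 520, 15))); // 30
        System.out.println(Math.round(daysUntil(9_041_060L, 6, 280, 3)));  // 11
    }
}
```

With 6 subtasks, 520 buckets (within the 500-600 range) and a 15-minute interval, this gives 299,520 entries per day, or about 30 days to reach 9 million. Plugging in the 280 buckets and the roughly 3-minute write interval reported elsewhere in this thread gives 806,400 entries per day, or about 11 days.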

Piotrek

On 30 Jan 2020, at 14:21, Mark Harris <ma...@hivehome.com> wrote:

Trying a few different approaches to the fs.s3a.fast.upload settings has brought me no joy - the taskmanagers end up simply crashing or complaining of high GC load. Heap dumps suggest that this time they're clogged with buffers instead, which makes sense.

Our job has a parallelism of 6 and checkpoints every 15 minutes - if anything, we'd like to checkpoint more frequently. I suspect this could also be affected by the partition structure we bucket into: at any given moment we could be receiving data for up to 280 buckets at once.
Could this be a factor?

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@ververica.com>
Sent: 27 January 2020 16:16
To: Cliff Resnick <cr...@gmail.com>
Cc: David Magalhães <sp...@gmail.com>; Mark Harris <ma...@hivehome.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

I think reducing the frequency of checkpoints and decreasing the parallelism of the operators using the S3AOutputStream class would help to mitigate the issue.

I don’t know of other solutions. I would suggest asking Steve L. directly in the bug ticket [1], as he is the one who fixed the issue. If there is no workaround, maybe it would be possible to put pressure on the Hadoop developers to backport the fix to older versions?

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 27 Jan 2020, at 15:41, Cliff Resnick <cr...@gmail.com> wrote:

I know from experience that Flink's shaded S3A FileSystem does not read core-site.xml, though I don't remember offhand which file(s) it does read. However, since it's shaded, maybe this could be fixed by building a Flink FS against Hadoop 3.3.0? Last I checked, I think it referenced 3.1.0.

On Mon, Jan 27, 2020, 8:48 AM David Magalhães <sp...@gmail.com> wrote:
Does StreamingFileSink use core-site.xml? When I was using it, it didn't load any configuration from core-site.xml.

On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <ma...@hivehome.com> wrote:
Hi Piotr,

Thanks for the link to the issue.

Do you know if there's a workaround? I've tried setting the following in my core-site.xml:

fs.s3a.fast.upload.buffer=true

This was to try to avoid writing the buffer files, but the taskmanager breaks with the same problem.

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr Nowojski <pi...@ververica.com>
Sent: 22 January 2020 13:29
To: Till Rohrmann <tr...@apache.org>
Cc: Mark Harris <ma...@hivehome.com>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

This is probably a known issue of Hadoop [1]. Unfortunately it was only fixed in 3.3.0.
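The underlying pattern is easy to reproduce in plain Java. `File.deleteOnExit()` records the path in a JVM-wide set that is only processed at shutdown, and the request cannot be cancelled. A long-running process that calls it for every temporary file therefore keeps one string per file, even after it has deleted the file itself. The sketch below contrasts that with explicit, scoped cleanup. It is a generic illustration built around a hypothetical `TempBlock` class, not the S3A code or the actual HADOOP-15658 patch.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical buffer-file wrapper: the temp file is deleted when the block is
 * closed, instead of being registered with File.deleteOnExit().
 */
class TempBlock implements AutoCloseable {
    private final Path file;

    TempBlock(String prefix) throws IOException {
        // Files.createTempFile does not register anything with DeleteOnExitHook.
        this.file = Files.createTempFile(prefix, ".tmp");
    }

    Path path() {
        return file;
    }

    @Override
    public void close() throws IOException {
        Files.deleteIfExists(file); // cleanup happens now, not at JVM exit
    }

    /** Leaky pattern, for comparison only: each call permanently registers one path. */
    static void leakyBlock(String prefix) throws IOException {
        File f = File.createTempFile(prefix, ".tmp");
        f.deleteOnExit(); // cannot be cancelled; the path string is kept until JVM exit
        f.delete();       // the file is gone from disk, but the registration remains
    }

    /** Writes some bytes to a scoped temp file and returns its (now deleted) path. */
    static Path writeAndRelease(byte[] data) {
        try (TempBlock block = new TempBlock("s3ablock-demo-")) {
            Files.write(block.path(), data);
            return block.path();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = writeAndRelease(new byte[] {1, 2, 3});
        System.out.println(Files.exists(p)); // false: deleted when the block was closed
    }
}
```

The trade-off is that a scoped file is not cleaned up if the process is killed before `close()` runs. The `deleteOnExit()` approach only helps on an orderly JVM shutdown anyway, and it costs memory for every file ever registered.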

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 22 Jan 2020, at 13:56, Till Rohrmann <tr...@apache.org> wrote:

Thanks for reporting this issue, Mark. I'm pulling Klou, who knows more about the StreamingFileSink, into this conversation. @Klou, does the StreamingFileSink rely on DeleteOnExitHooks to clean up files?

Cheers,
Till

On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com> wrote:
Hi,

We're using Flink 1.7.2 on an EMR cluster (emr-5.22.0), which runs Hadoop "Amazon 2.8.5". We've recently noticed that some TaskManagers fail (causing all the jobs running on them to fail) with a "java.lang.OutOfMemoryError: GC overhead limit exceeded". The taskmanager (and the jobs that should be running on it) remain down until manually restarted.

I managed to take and analyze a memory dump from one of the afflicted taskmanagers.

It showed that 85% of the heap was made up of the java.io.DeleteOnExitHook.files hashset. The majority of the strings in that hashset (9041060 out of ~9041100) pointed to files whose paths began with /tmp/hadoop-yarn/s3a/s3ablock

The problem seems to affect jobs that use the StreamingFileSink: all of the taskmanager crashes have been on taskmanagers running at least one job that uses this sink, and a cluster running only a single taskmanager / job that uses the StreamingFileSink crashed with the GC overhead limit exceeded error.

I've had a look for advice on handling this error more broadly without luck.

Any suggestions or advice gratefully received.

Best regards,

Mark Harris



The information contained in or attached to this email is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are not authorised to and must not disclose, copy, distribute, or retain this message or any part of it. It may contain information which is confidential and/or covered by legal professional or other privilege under applicable law.

The views expressed in this email are not necessarily the views of Centrica plc or its subsidiaries, and the company, its directors, officers or employees make no representation or accept any liability for its accuracy or completeness unless expressly stated to the contrary.

Additional regulatory disclosures may be found here: https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email

PH Jones is a trading name of British Gas Social Housing Limited. British Gas Social Housing Limited (company no: 01026007), British Gas Trading Limited (company no: 03078711), British Gas Services Limited (company no: 3141243), British Gas Insurance Limited (company no: 06608316), British Gas New Heating Limited (company no: 06723244), British Gas Services (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading) Limited (company no: 02877397) are all wholly owned subsidiaries of Centrica plc (company no: 3033654). Each company is registered in England and Wales with a registered office at Millstream, Maidenhead Road, Windsor, Berkshire SL4 5GD.

British Gas Insurance Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. British Gas Services Limited and Centrica Energy (Trading) Limited are authorised and regulated by the Financial Conduct Authority. British Gas Trading Limited is an appointed representative of British Gas Services Limited which is authorised and regulated by the Financial Conduct Authority.

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Kostas Kloudas <kk...@apache.org>.
Hi Mark,

Have you tried setting your rolling policy to close inactive part files
after some time [1]?
If the part files in the buckets are inactive and there are no new part
files, then the state handle for those buckets will also be removed.

Cheers,
Kostas

[1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html

> Berkshire SL4 5GD.
>
> British Gas Insurance Limited is authorised by the Prudential Regulation
> Authority and regulated by the Financial Conduct Authority and the
> Prudential Regulation Authority. British Gas Services Limited and Centrica
> Energy (Trading) Limited are authorised and regulated by the Financial
> Conduct Authority. British Gas Trading Limited is an appointed
> representative of British Gas Services Limited which is authorised and
> regulated by the Financial Conduct Authority.
>
>
>
>
> The information contained in or attached to this email is intended only
> for the use of the individual or entity to which it is addressed. If you
> are not the intended recipient, or a person responsible for delivering it
> to the intended recipient, you are not authorised to and must not disclose,
> copy, distribute, or retain this message or any part of it. It may contain
> information which is confidential and/or covered by legal professional or
> other privilege under applicable law.
>
> The views expressed in this email are not necessarily the views of
> Centrica plc or its subsidiaries, and the company, its directors, officers
> or employees make no representation or accept any liability for its
> accuracy or completeness unless expressly stated to the contrary.
>
> Additional regulatory disclosures may be found here:
> https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email
>
> PH Jones is a trading name of British Gas Social Housing Limited. British
> Gas Social Housing Limited (company no: 01026007), British Gas Trading
> Limited (company no: 03078711), British Gas Services Limited (company no:
> 3141243), British Gas Insurance Limited (company no: 06608316), British Gas
> New Heating Limited (company no: 06723244), British Gas Services
> (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading)
> Limited (company no: 02877397) are all wholly owned subsidiaries of
> Centrica plc (company no: 3033654). Each company is registered in England
> and Wales with a registered office at Millstream, Maidenhead Road, Windsor,
> Berkshire SL4 5GD.
>
> British Gas Insurance Limited is authorised by the Prudential Regulation
> Authority and regulated by the Financial Conduct Authority and the
> Prudential Regulation Authority. British Gas Services Limited and Centrica
> Energy (Trading) Limited are authorised and regulated by the Financial
> Conduct Authority. British Gas Trading Limited is an appointed
> representative of British Gas Services Limited which is authorised and
> regulated by the Financial Conduct Authority.
>
>
> The information contained in or attached to this email is intended only
> for the use of the individual or entity to which it is addressed. If you
> are not the intended recipient, or a person responsible for delivering it
> to the intended recipient, you are not authorised to and must not disclose,
> copy, distribute, or retain this message or any part of it. It may contain
> information which is confidential and/or covered by legal professional or
> other privilege under applicable law.
>
> The views expressed in this email are not necessarily the views of
> Centrica plc or its subsidiaries, and the company, its directors, officers
> or employees make no representation or accept any liability for its
> accuracy or completeness unless expressly stated to the contrary.
>
> Additional regulatory disclosures may be found here:
> https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email
>
> PH Jones is a trading name of British Gas Social Housing Limited. British
> Gas Social Housing Limited (company no: 01026007), British Gas Trading
> Limited (company no: 03078711), British Gas Services Limited (company no:
> 3141243), British Gas Insurance Limited (company no: 06608316), British Gas
> New Heating Limited (company no: 06723244), British Gas Services
> (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading)
> Limited (company no: 02877397) are all wholly owned subsidiaries of
> Centrica plc (company no: 3033654). Each company is registered in England
> and Wales with a registered office at Millstream, Maidenhead Road, Windsor,
> Berkshire SL4 5GD.
>
> British Gas Insurance Limited is authorised by the Prudential Regulation
> Authority and regulated by the Financial Conduct Authority and the
> Prudential Regulation Authority. British Gas Services Limited and Centrica
> Energy (Trading) Limited are authorised and regulated by the Financial
> Conduct Authority. British Gas Trading Limited is an appointed
> representative of British Gas Services Limited which is authorised and
> regulated by the Financial Conduct Authority.
>

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Mark Harris <ma...@hivehome.com>.
Hi all,

The out-of-memory heap dump had the answer: the job was failing with an OutOfMemoryError because the activeBuckets members of 3 instances of org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were taking up so much of the taskmanager's memory that no progress was being made. Increasing the memory available to the TM seems to have fixed the problem.

I think the DeleteOnExit problem will still mean the taskmanager needs restarting every few weeks, but that's acceptable for now.

Thanks again,

Mark
________________________________
From: Mark Harris <ma...@hivehome.com>
Sent: 30 January 2020 14:36
To: Piotr Nowojski <pi...@ververica.com>
Cc: Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

Thanks for your help with this. 🙂

The EMR cluster has three 15 GB VMs, and the Flink cluster is started with:

/usr/lib/flink/bin/yarn-session.sh -d -n 3 -tm 5760 -jm 5760 -s 3

The task usually runs for about 15 minutes before it restarts, typically due to a "java.lang.OutOfMemoryError: Java heap space" exception.

The figures came from a MemoryAnalyzer session on a manual memory dump from one of the taskmanagers. The total size of that heap was only 1.8 GB. In that heap, 1.7 GB is taken up by the static field "files" in DeleteOnExitHook, which is a linked hash set containing the 9 million strings.

A full example of one of the paths is /tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp, at 120 bytes per char[], for a solid 1.2 GB of chars. Then there are 200 MB for their String wrappers and another 361 MB for LinkedHashMap$Entry objects. Despite valiantly holding on to an array of 16777216 HashMap$Node elements, the LinkedHashMap can only contribute another 20 MB or so.

I goofed with that 85% figure, which didn't come from MemoryAnalyzer: it tells me DeleteOnExitHook is responsible for 96.98% of the heap dump.
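As a rough cross-check of these MemoryAnalyzer figures, the sketch below estimates the heap retained by the set from the example path. It is a back-of-the-envelope model, not a measurement, and it assumes a 64-bit HotSpot JVM running Java 8 (String backed by char[]) with compressed oops: 12-byte object headers, 16-byte array headers, 4-byte references and 8-byte alignment. The class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope estimate of the heap retained by java.io.DeleteOnExitHook.files.
 * ASSUMPTIONS (not measured): 64-bit HotSpot, Java 8 (String backed by char[]),
 * compressed oops, 12-byte object headers, 16-byte array headers, 8-byte alignment.
 */
final class HeapEstimate {
    static final int OBJECT_HEADER = 12; // mark word + compressed class pointer
    static final int ARRAY_HEADER = 16;  // object header + int length field
    static final int REF = 4;            // compressed reference
    static final int ALIGN = 8;          // objects are padded to a multiple of 8 bytes

    static long align(long bytes) {
        return (bytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    /** char[] holding a path of the given length, two bytes per char. */
    static long charArrayBytes(int length) {
        return align(ARRAY_HEADER + 2L * length);
    }

    /** Java 8 String: header + reference to char[] + cached int hash. */
    static long stringBytes() {
        return align(OBJECT_HEADER + REF + 4);
    }

    /** LinkedHashMap.Entry: HashMap.Node fields (int hash, key, value, next) + before/after links. */
    static long linkedEntryBytes() {
        return align(OBJECT_HEADER + 4 + 3L * REF + 2L * REF);
    }

    /** Reference array backing a HashMap table of the given capacity. */
    static long tableBytes(int capacity) {
        return align(ARRAY_HEADER + (long) REF * capacity);
    }

    public static void main(String[] args) {
        String examplePath = "/tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp";
        long entries = 9_041_060L;
        long chars = entries * charArrayBytes(examplePath.length());
        long strings = entries * stringBytes();
        long nodes = entries * linkedEntryBytes();
        long table = tableBytes(16_777_216);
        System.out.printf("char[]  %,d bytes%n", chars);
        System.out.printf("String  %,d bytes%n", strings);
        System.out.printf("entries %,d bytes%n", nodes);
        System.out.printf("table   %,d bytes%n", table);
        System.out.printf("total   %,d bytes%n", chars + strings + nodes + table);
    }
}
```

Under these assumptions the 52-character path costs 120 bytes per char[], and the printed totals come to roughly 1.08 GB of char[] data, 217 MB of String objects and 362 MB of LinkedHashMap entries, about 1.73 GB including the table array. That is in the same range as the 1.7 GB reported from the dump; the gap from the ~30-byte-per-string estimate in Piotr's message below comes mostly from two-byte chars and per-object overhead.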

Looking at the files it managed to write before this started to happen regularly, it looks like they're being written approximately every 3 minutes. I'll triple check our config, but I'm reasonably sure the job is configured to checkpoint every 15 minutes - could something else be causing it to write?

This may all be a red herring - something else may be taking up the taskmanagers' memory which didn't make it into that heap dump. I plan to repeat the analysis shortly on a heap dump created with -XX:+HeapDumpOnOutOfMemoryError.

Best regards,

Mark

________________________________
From: Piotr Nowojski <pi...@ververica.com>
Sent: 30 January 2020 13:44
To: Mark Harris <ma...@hivehome.com>
Cc: Cliff Resnick <cr...@gmail.com>; David Magalhães <sp...@gmail.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

What is your job setup? Size of the nodes, memory settings of the Flink/JVM?

9,041,060 strings is an awfully small number to bring down a whole cluster. With each tmp string taking ~30 bytes, that’s only 271 MB. Is this really 85% of the heap? Also, with a parallelism of 6 and checkpoints every 15 minutes, 9,000,000 leaked strings should accumulate only after about a month, assuming 500-600 buckets in total (and also assuming a separate file per bucket).
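The one-month figure above can be reproduced with a small model. This is a sketch under assumptions, not a measurement: it reads "a separate file per bucket" as one registered temp file per bucket, per parallel sink instance, per finished part file, and assumes nothing is removed before the JVM exits. The class and method names are hypothetical; the 550-bucket, 280-bucket, 15-minute and 3-minute inputs are figures mentioned in this thread.

```java
/**
 * Model of how quickly s3ablock paths could accumulate in DeleteOnExitHook.
 * ASSUMPTION: every parallel sink instance registers one temp file per active
 * bucket each time it finishes a part file, and no entry is removed before JVM exit.
 */
final class LeakRate {
    /** Paths registered per day; partFilesPerHour is per bucket and per parallel instance. */
    static long pathsPerDay(int buckets, int parallelism, double partFilesPerHour) {
        return Math.round(buckets * parallelism * partFilesPerHour * 24);
    }

    /** Days until the registry holds the given number of paths. */
    static double daysUntil(long paths, int buckets, int parallelism, double partFilesPerHour) {
        return paths / (double) pathsPerDay(buckets, parallelism, partFilesPerHour);
    }

    public static void main(String[] args) {
        long observed = 9_041_060L; // entries found in the heap dump
        // ~550 buckets, parallelism 6, one part file per 15-minute checkpoint (4 per hour)
        System.out.printf("checkpoint-driven: %.1f days%n", daysUntil(observed, 550, 6, 4.0));
        // 280 buckets, parallelism 6, part files roughly every 3 minutes (20 per hour)
        System.out.printf("3-minute rolling:  %.1f days%n", daysUntil(observed, 280, 6, 20.0));
    }
}
```

With those inputs the model gives roughly 28.5 days and 11.2 days respectively, so under these assumptions the set takes days rather than minutes to reach nine million entries. It also shows why fewer part files per hour (less frequent checkpoints) or a lower parallelism slows the growth proportionally.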

Piotrek

On 30 Jan 2020, at 14:21, Mark Harris <ma...@hivehome.com> wrote:

Trying a few different approaches to the fs.s3a.fast.upload settings has brought me no joy - the taskmanagers end up simply crashing or complaining of high GC load. Heap dumps suggest that this time they're clogged with buffers instead, which makes sense.

Our job has a parallelism of 6 and checkpoints every 15 minutes - if anything, we'd like to checkpoint more frequently. I suspect this could also be affected by the partition structure we bucket into: at any given moment we could be receiving data for up to 280 buckets at once.
Could this be a factor?

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@ververica.com>
Sent: 27 January 2020 16:16
To: Cliff Resnick <cr...@gmail.com>
Cc: David Magalhães <sp...@gmail.com>; Mark Harris <ma...@hivehome.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

I think reducing the checkpoint frequency and decreasing the parallelism of the operators using the S3AOutputStream class would help to mitigate the issue.

I don’t know of other solutions. I would suggest asking Steve L. directly in the bug ticket [1], as he is the one who fixed the issue. If there is no workaround, maybe it would be possible to put pressure on the Hadoop guys to backport the fix to older versions?

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 27 Jan 2020, at 15:41, Cliff Resnick <cr...@gmail.com> wrote:

I know from experience that Flink's shaded S3A FileSystem does not reference core-site.xml, though I don't remember offhand what file(s) it does reference. However, since it's shaded, maybe this could be fixed by building a Flink FS against Hadoop 3.3.0? Last I checked, I think it referenced 3.1.0.

On Mon, Jan 27, 2020, 8:48 AM David Magalhães <sp...@gmail.com> wrote:
Does StreamingFileSink use core-site.xml? When I was using it, it didn't load any configuration from core-site.xml.
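On the core-site.xml question: the Flink documentation for its bundled S3 filesystems describes passing S3A options through flink-conf.yaml instead, using an s3. prefix that Flink translates to the matching fs.s3a. key (for example, s3.connection.maximum becomes fs.s3a.connection.maximum). The sketch below is a hypothetical model of that translation rule, not Flink's implementation, and whether Flink 1.7.2 on EMR applies it in exactly this way is an assumption worth checking. The example value array is one of the values (disk, array, bytebuffer) that the Hadoop S3A documentation lists for fs.s3a.fast.upload.buffer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * HYPOTHETICAL model of the documented flink-conf.yaml -> S3A key translation
 * ("s3.<key>" is handed to Hadoop as "fs.s3a.<key>"). Not Flink's implementation.
 */
final class S3KeyTranslation {
    static final String FLINK_PREFIX = "s3.";
    static final String HADOOP_PREFIX = "fs.s3a.";

    static Map<String, String> toHadoopKeys(Map<String, String> flinkConf) {
        Map<String, String> hadoopConf = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flinkConf.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(FLINK_PREFIX)) {
                // "s3.fast.upload.buffer" -> "fs.s3a.fast.upload.buffer"
                hadoopConf.put(HADOOP_PREFIX + key.substring(FLINK_PREFIX.length()), e.getValue());
            }
            // Keys without the prefix are not S3A settings and are skipped in this sketch.
        }
        return hadoopConf;
    }

    public static void main(String[] args) {
        Map<String, String> flinkConf = new LinkedHashMap<>();
        flinkConf.put("s3.fast.upload.buffer", "array"); // S3A documents disk, array or bytebuffer
        flinkConf.put("parallelism.default", "6");
        System.out.println(toHadoopKeys(flinkConf)); // {fs.s3a.fast.upload.buffer=array}
    }
}
```

Under the same assumption, the corresponding flink-conf.yaml entry would be a line such as "s3.fast.upload.buffer: array" rather than a property in core-site.xml.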

On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <ma...@hivehome.com> wrote:
Hi Piotr,

Thanks for the link to the issue.

Do you know if there's a workaround? I've tried setting the following in my core-site.xml:

fs.s3a.fast.upload.buffer=true

to try and avoid writing the buffer files, but the taskmanager breaks with the same problem.

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr Nowojski <pi...@ververica.com>
Sent: 22 January 2020 13:29
To: Till Rohrmann <tr...@apache.org>
Cc: Mark Harris <ma...@hivehome.com>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

This is probably a known issue of Hadoop [1]. Unfortunately it was only fixed in 3.3.0.

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 22 Jan 2020, at 13:56, Till Rohrmann <tr...@apache.org> wrote:

Thanks for reporting this issue, Mark. I'm pulling Klou into this conversation, as he knows more about the StreamingFileSink. @Klou, does the StreamingFileSink rely on DeleteOnExitHooks to clean up files?

Cheers,
Till

On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com> wrote:
Hi,

We're using Flink 1.7.2 on an EMR cluster (emr-5.22.0), which runs Hadoop "Amazon 2.8.5". We've recently noticed that some TaskManagers fail (causing all the jobs running on them to fail) with a "java.lang.OutOfMemoryError: GC overhead limit exceeded". The taskmanager (and the jobs that should be running on it) remain down until manually restarted.

I managed to take and analyze a memory dump from one of the afflicted taskmanagers.

It showed that 85% of the heap was made up of the java.io.DeleteOnExitHook.files hashset. The majority of the strings in that hashset (9041060 out of ~9041100) pointed to files beginning with /tmp/hadoop-yarn/s3a/s3ablock.
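The growth pattern described above can be illustrated with a small model. java.io.DeleteOnExitHook keeps every path registered via File.deleteOnExit() in a static set that is only processed when the JVM shuts down, and deleting the file earlier does not remove its entry, so a long-running process that registers a fresh temp file per upload block keeps one string per block. The sketch below is hypothetical: it uses its own LinkedHashSet as a stand-in for the JDK's private registry (it does not call deleteOnExit()), and contrasts that with explicit try/finally cleanup, which leaves nothing behind. It does not describe how S3A or the HADOOP-15658 fix actually manage their block files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * HYPOTHETICAL model, not JDK or S3A code. "registry" stands in for the JDK's private
 * DeleteOnExitHook set: entries are only ever added; the real hook acts on them at JVM shutdown.
 */
final class DeleteOnExitModel {
    static final Set<String> registry = new LinkedHashSet<>();

    /** Leaky pattern: register the path for deletion at exit, then delete the file anyway. */
    static void writeBlockLeaky(Path dir, int i) throws IOException {
        Path block = Files.createTempFile(dir, "s3ablock-" + i + "-", ".tmp");
        registry.add(block.toString()); // models what File.deleteOnExit() records
        Files.delete(block);            // the file is gone, but its path stays registered
    }

    /** Scoped pattern: explicit cleanup in finally, nothing kept once the block is done. */
    static void writeBlockScoped(Path dir, int i) throws IOException {
        Path block = Files.createTempFile(dir, "s3ablock-" + i + "-", ".tmp");
        try {
            Files.write(block, new byte[] {1, 2, 3}); // stand-in for buffering an upload block
        } finally {
            Files.deleteIfExists(block);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("delete-on-exit-model");
        for (int i = 0; i < 1_000; i++) {
            writeBlockLeaky(dir, i);
            writeBlockScoped(dir, i);
        }
        // 1,000 registered paths remain even though every temp file has been deleted.
        System.out.println("registered paths: " + registry.size());
        Files.delete(dir);
    }
}
```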

The problem seems to affect jobs that use the StreamingFileSink: all of the taskmanager crashes have been on taskmanagers running at least one job using this sink, and a cluster running only a single taskmanager / job that uses the StreamingFileSink crashed with the GC overhead limit exceeded error.

I've had a look for advice on handling this error more broadly without luck.

Any suggestions or advice gratefully received.

Best regards,

Mark Harris






Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Piotr Nowojski <pi...@ververica.com>.
Hi,

What is your job setup? Size of the nodes, memory settings of the Flink/JVM?

9,041,060 strings is an awfully small number to bring down a whole cluster. With each tmp string taking ~30 bytes, that’s only about 271 MB. Is this really 85% of the heap? Also, with a parallelism of 6 and checkpoints every 15 minutes, 9,000,000 leaked strings should accumulate only after about a month, assuming 500-600 buckets in total (and also assuming a separate file per bucket).
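The arithmetic above can be reproduced with a small Java sketch. It assumes, as in the reply, one leaked entry per open bucket per parallel subtask per checkpoint and ~30 bytes per path string; the class and method names are illustrative only. The 30-byte figure counts only the characters: on a real JVM each retained entry also pays for the String object, its backing array and the set entry, so the raw figure is a lower bound on the heap actually used.

```java
// Back-of-the-envelope model of leaked DeleteOnExitHook entries.
// Assumption: one leaked temp-file path per open bucket, per subtask, per checkpoint.
final class LeakEstimate {
    private LeakEstimate() {}

    // Raw character payload only (a lower bound on retained heap).
    static long rawBytes(long entries, int bytesPerEntry) {
        return entries * bytesPerEntry;
    }

    // Leaked entries per day for the given job shape.
    static long entriesPerDay(int parallelism, int bucketsPerSubtask, int checkpointMinutes) {
        long checkpointsPerDay = (24L * 60) / checkpointMinutes;
        return checkpointsPerDay * parallelism * bucketsPerSubtask;
    }

    // Days until the given number of entries has accumulated.
    static double daysToReach(long entries, long entriesPerDay) {
        return (double) entries / entriesPerDay;
    }

    public static void main(String[] args) {
        long observed = 9_041_060L; // strings found in the heap dump
        System.out.println(rawBytes(observed, 30) / 1_000_000 + " MB"); // prints 271 MB
        long perDay = entriesPerDay(6, 280, 15);
        System.out.printf("%d entries/day, %.1f days%n", perDay, daysToReach(observed, perDay));
    }
}
```

Under these assumptions, the figures Mark gives (parallelism 6, 15-minute checkpoints, up to 280 buckets) yield 161,280 entries per day, or roughly 56 days to reach ~9 million entries; about 520 buckets would bring that down to roughly 30 days, in line with the one-month estimate.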

Piotrek 

> On 30 Jan 2020, at 14:21, Mark Harris <ma...@hivehome.com> wrote:
> 
> Trying a few different approaches to the fs.s3a.fast.upload settings has brought me no joy - the taskmanagers end up simply crashing or complaining of high GC load. Heap dumps suggest that this time they're clogged with buffers instead, which makes sense.
> 
> Our job has a parallelism of 6 and checkpoints every 15 minutes - if anything, we'd like to checkpoint more frequently. I suspect this could also be affected by the partition structure we are bucketing into, and at any given moment we could be receiving data for up to 280 buckets at once.
> Could this be a factor?
> 
> Best regards,
> 
> Mark
> From: Piotr Nowojski <pi...@ververica.com>
> Sent: 27 January 2020 16:16
> To: Cliff Resnick <cr...@gmail.com>
> Cc: David Magalhães <sp...@gmail.com>; Mark Harris <ma...@hivehome.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files
>  
> Hi,
> 
> I think reducing the frequency of the checkpoints and decreasing the parallelism of the things using the S3AOutputStream class would help to mitigate the issue.
> 
> I don’t know about other solutions. I would suggest asking this question directly to Steve L. in the bug ticket [1], as he is the one who fixed the issue. If there is no workaround, maybe it would be possible to put pressure on the Hadoop guys to backport the fix to older versions?
> 
> Piotrek
> 
> [1] https://issues.apache.org/jira/browse/HADOOP-15658
> 
>> On 27 Jan 2020, at 15:41, Cliff Resnick <cresny@gmail.com> wrote:
>> 
>> I know from experience that Flink's shaded S3A FileSystem does not reference core-site.xml, though I don't remember offhand what file(s) it does reference. However, since it's shaded, maybe this could be fixed by building a Flink FS referencing Hadoop 3.3.0? Last I checked, I think it referenced 3.1.0.
>> 
>> On Mon, Jan 27, 2020, 8:48 AM David Magalhães <speeddragon@gmail.com> wrote:
>> Does StreamingFileSink use core-site.xml? When I was using it, it didn't load any configuration from core-site.xml.
>> 
>> On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <mark.harris@hivehome.com> wrote:
>> Hi Piotr,
>> 
>> Thanks for the link to the issue.
>> 
>> Do you know if there's a workaround? I've tried setting the following in my core-site.xml:
>> 
>> fs.s3a.fast.upload.buffer=true
>> 
>> To try and avoid writing the buffer files, but the taskmanager breaks with the same problem.
>> 
>> Best regards,
>> 
>> Mark
>> From: Piotr Nowojski <piotr@data-artisans.com> on behalf of Piotr Nowojski <piotr@ververica.com>
>> Sent: 22 January 2020 13:29
>> To: Till Rohrmann <trohrmann@apache.org>
>> Cc: Mark Harris <mark.harris@hivehome.com>; flink-user@apache.org; kkloudas <kkloudas@apache.org>
>> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files
>>  
>> Hi,
>> 
>> This is probably a known issue of Hadoop [1]. Unfortunately it was only fixed in 3.3.0.
>> 
>> Piotrek
>> 
>> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>> 
>>> On 22 Jan 2020, at 13:56, Till Rohrmann <trohrmann@apache.org> wrote:
>>> 
>>> Thanks for reporting this issue, Mark. I'm pulling Klou into this conversation, as he knows more about the StreamingFileSink. @Klou, does the StreamingFileSink rely on DeleteOnExitHooks to clean up files?
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <mark.harris@hivehome.com> wrote:
>>> Hi,
>>> 
>>> We're using Flink 1.7.2 on an EMR cluster (emr-5.22.0), which runs Hadoop "Amazon 2.8.5". We've recently noticed that some TaskManagers fail (causing all the jobs running on them to fail) with a "java.lang.OutOfMemoryError: GC overhead limit exceeded" error. The taskmanager (and the jobs that should be running on it) remains down until manually restarted.
>>> 
>>> I managed to take and analyze a memory dump from one of the afflicted taskmanagers. 
>>> 
>>> It showed that 85% of the heap was taken up by the java.io.DeleteOnExitHook.files hash set. The majority of the strings in that set (9041060 out of ~9041100) pointed to files whose paths began with /tmp/hadoop-yarn/s3a/s3ablock.
>>> 
>>> The problem seems to affect jobs that make use of the StreamingFileSink: all of the taskmanager crashes have been on a taskmanager running at least one job using this sink, and a cluster running only a single taskmanager / job that uses the StreamingFileSink crashed with the GC overhead limit exceeded error.
>>> 
>>> I've had a look for advice on handling this error more broadly without luck.
>>> 
>>> Any suggestions or advice gratefully received.
>>> 
>>> Best regards,
>>> 
>>> Mark Harris
>>> 
>> 
> 


Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Mark Harris <ma...@hivehome.com>.
Trying a few different approaches to the fs.s3a.fast.upload settings has brought me no joy - the taskmanagers end up simply crashing or complaining of high GC load. Heap dumps suggest that this time they're clogged with buffers instead, which makes sense.

Our job has a parallelism of 6 and checkpoints every 15 minutes - if anything, we'd like to checkpoint more frequently. I suspect this could also be affected by the partition structure we are bucketing into, and at any given moment we could be receiving data for up to 280 buckets at once.
Could this be a factor?

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@ververica.com>
Sent: 27 January 2020 16:16
To: Cliff Resnick <cr...@gmail.com>
Cc: David Magalhães <sp...@gmail.com>; Mark Harris <ma...@hivehome.com>; Till Rohrmann <tr...@apache.org>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

I think reducing the frequency of the checkpoints and decreasing the parallelism of the things using the S3AOutputStream class would help to mitigate the issue.

I don’t know about other solutions. I would suggest asking this question directly to Steve L. in the bug ticket [1], as he is the one who fixed the issue. If there is no workaround, maybe it would be possible to put pressure on the Hadoop guys to backport the fix to older versions?

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 27 Jan 2020, at 15:41, Cliff Resnick <cr...@gmail.com> wrote:

I know from experience that Flink's shaded S3A FileSystem does not reference core-site.xml, though I don't remember offhand what file(s) it does reference. However, since it's shaded, maybe this could be fixed by building a Flink FS referencing Hadoop 3.3.0? Last I checked, I think it referenced 3.1.0.

On Mon, Jan 27, 2020, 8:48 AM David Magalhães <sp...@gmail.com> wrote:
Does StreamingFileSink use core-site.xml? When I was using it, it didn't load any configuration from core-site.xml.

On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <ma...@hivehome.com> wrote:
Hi Piotr,

Thanks for the link to the issue.

Do you know if there's a workaround? I've tried setting the following in my core-site.xml:

fs.s3a.fast.upload.buffer=true

To try and avoid writing the buffer files, but the taskmanager breaks with the same problem.

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr Nowojski <pi...@ververica.com>
Sent: 22 January 2020 13:29
To: Till Rohrmann <tr...@apache.org>
Cc: Mark Harris <ma...@hivehome.com>; flink-user@apache.org; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

This is probably a known issue of Hadoop [1]. Unfortunately it was only fixed in 3.3.0.

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 22 Jan 2020, at 13:56, Till Rohrmann <tr...@apache.org> wrote:

Thanks for reporting this issue, Mark. I'm pulling Klou into this conversation, as he knows more about the StreamingFileSink. @Klou, does the StreamingFileSink rely on DeleteOnExitHooks to clean up files?

Cheers,
Till

On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com> wrote:
Hi,

We're using Flink 1.7.2 on an EMR cluster (emr-5.22.0), which runs Hadoop "Amazon 2.8.5". We've recently noticed that some TaskManagers fail (causing all the jobs running on them to fail) with a "java.lang.OutOfMemoryError: GC overhead limit exceeded" error. The taskmanager (and the jobs that should be running on it) remains down until manually restarted.

I managed to take and analyze a memory dump from one of the afflicted taskmanagers.

It showed that 85% of the heap was taken up by the java.io.DeleteOnExitHook.files hash set. The majority of the strings in that set (9041060 out of ~9041100) pointed to files whose paths began with /tmp/hadoop-yarn/s3a/s3ablock.

The problem seems to affect jobs that make use of the StreamingFileSink: all of the taskmanager crashes have been on a taskmanager running at least one job using this sink, and a cluster running only a single taskmanager / job that uses the StreamingFileSink crashed with the GC overhead limit exceeded error.

I've had a look for advice on handling this error more broadly without luck.

Any suggestions or advice gratefully received.

Best regards,

Mark Harris



Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Arvid Heise <ar...@ververica.com>.
Hi Mark,

if you add `fs.s3a.fast.upload.buffer: true` to your Flink configuration,
it should add that to the respective Hadoop configuration when creating the
file system.
Note: I haven't tried it, but all keys with the prefixes "s3.", "s3a." and
"fs.s3a." should be forwarded.
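A hypothetical Java sketch of the forwarding described above, written only to illustrate the idea; it is not Flink's actual implementation, and the class and method names are made up. Any Flink key starting with one of the three prefixes is rewritten into the fs.s3a. namespace that the S3A file system reads. It also checks the value, because Hadoop's S3A documentation lists disk, array and bytebuffer, rather than true/false, as the values of fs.s3a.fast.upload.buffer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of prefix-based key forwarding; not Flink's code.
final class S3KeyForwarding {
    // The three prefixes mentioned in the mail; they are mutually exclusive.
    private static final String[] FLINK_PREFIXES = {"fs.s3a.", "s3a.", "s3."};
    private static final String HADOOP_PREFIX = "fs.s3a.";
    // Values documented by Hadoop for fs.s3a.fast.upload.buffer.
    private static final Set<String> BUFFER_VALUES =
            new HashSet<>(Arrays.asList("disk", "array", "bytebuffer"));

    private S3KeyForwarding() {}

    // Copies matching Flink keys into a Hadoop-style map under "fs.s3a.".
    static Map<String, String> toHadoopKeys(Map<String, String> flinkConf) {
        Map<String, String> hadoopConf = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flinkConf.entrySet()) {
            String key = e.getKey();
            for (String prefix : FLINK_PREFIXES) {
                if (key.startsWith(prefix)) {
                    hadoopConf.put(HADOOP_PREFIX + key.substring(prefix.length()), e.getValue());
                    break; // a key matches at most one prefix
                }
            }
        }
        return hadoopConf;
    }

    // True if the value is one of the documented buffer types.
    static boolean isValidBufferValue(String value) {
        return BUFFER_VALUES.contains(value.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> flinkConf = new LinkedHashMap<>();
        flinkConf.put("fs.s3a.fast.upload.buffer", "array");  // forwarded unchanged
        flinkConf.put("s3a.fast.upload.active.blocks", "2");  // becomes fs.s3a.fast.upload.active.blocks
        flinkConf.put("taskmanager.heap.size", "4096m");      // not an S3 key, ignored
        System.out.println(toHadoopKeys(flinkConf));
        // prints {fs.s3a.fast.upload.buffer=array, fs.s3a.fast.upload.active.blocks=2}
        System.out.println(isValidBufferValue("true")); // prints false
    }
}
```

If forwarding works as described, a flink-conf.yaml entry such as `fs.s3a.fast.upload.buffer: array` would reach Hadoop unchanged; whether that helps with this particular leak is a separate question.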

-- Arvid

On Mon, Jan 27, 2020 at 5:16 PM Piotr Nowojski <pi...@ververica.com> wrote:

> Hi,
>
> I think reducing the frequency of the checkpoints and decreasing the
> parallelism of the things using the S3AOutputStream class would help to
> mitigate the issue.
>
> I don’t know about other solutions. I would suggest asking this question
> directly to Steve L. in the bug ticket [1], as he is the one who fixed the
> issue. If there is no workaround, maybe it would be possible to put
> pressure on the Hadoop guys to backport the fix to older versions?
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>
> On 27 Jan 2020, at 15:41, Cliff Resnick <cr...@gmail.com> wrote:
>
> I know from experience that Flink's shaded S3A FileSystem does not
> reference core-site.xml, though I don't remember offhand what file (s) it
> does reference. However since it's shaded, maybe this could be fixed by
> building a Flink FS referencing 3.3.0? Last I checked I think it referenced
> 3.1.0.
>
> On Mon, Jan 27, 2020, 8:48 AM David Magalhães <sp...@gmail.com>
> wrote:
>
>> Does StreamingFileSink use core-site.xml ? When I was using it, it didn't
>> load any configurations from core-site.xml.
>>
>> On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <ma...@hivehome.com>
>> wrote:
>>
>>> Hi Piotr,
>>>
>>> Thanks for the link to the issue.
>>>
>>> Do you know if there's a workaround? I've tried setting the following in
>>> my core-site.xml:
>>>
>>> ​fs.s3a.fast.upload.buffer=true
>>>
>>> To try and avoid writing the buffer files, but the taskmanager breaks
>>> with the same problem.
>>>
>>> Best regards,
>>>
>>> Mark
>>> ------------------------------
>>> *From:* Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr
>>> Nowojski <pi...@ververica.com>
>>> *Sent:* 22 January 2020 13:29
>>> *To:* Till Rohrmann <tr...@apache.org>
>>> *Cc:* Mark Harris <ma...@hivehome.com>; flink-user@apache.org <
>>> flink-user@apache.org>; kkloudas <kk...@apache.org>
>>> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
>>> hooks for S3a files
>>>
>>> Hi,
>>>
>>> This is probably a known issue of Hadoop [1]. Unfortunately it was only
>>> fixed in 3.3.0.
>>>
>>> Piotrek
>>>
>>> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>>>
>>> On 22 Jan 2020, at 13:56, Till Rohrmann <tr...@apache.org> wrote:
>>>
>>> Thanks for reporting this issue Mark. I'm pulling Klou into this
>>> conversation who knows more about the StreamingFileSink. @Klou does the
>>> StreamingFileSink relies on DeleteOnExitHooks to clean up files?
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> We're using flink 1.7.2 on an EMR cluster v emr-5.22.0, which runs
>>> hadoop v "Amazon 2.8.5". We've recently noticed that some TaskManagers fail
>>> (causing all the jobs running on them to fail) with an
>>> "java.lang.OutOfMemoryError: GC overhead limit exceeded”. The taskmanager
>>> (and jobs that should be running on it) remain down until manually
>>> restarted.
>>>
>>> I managed to take and analyze a memory dump from one of the afflicted
>>> taskmanagers.
>>>
>>> It showed that 85% of the heap was made up of
>>> the java.io.DeleteOnExitHook.files hashset. The majority of the strings in
>>> that hashset (9041060 out of ~9041100) pointed to files that began
>>> /tmp/hadoop-yarn/s3a/s3ablock
>>>
>>> The problem seems to affect jobs that make use of the StreamingFileSink
>>> - all of the taskmanager crashes have been on the taskmaster running at
>>> least one job using this sink, and a cluster running only a single
>>> taskmanager / job that uses the StreamingFileSink crashed with the GC
>>> overhead limit exceeded error.
>>>
>>> I've had a look for advice on handling this error more broadly without
>>> luck.
>>>
>>> Any suggestions or advice gratefully received.
>>>
>>> Best regards,
>>>
>>> Mark Harris
>>>
>>>
>>>
>>> The information contained in or attached to this email is intended only
>>> for the use of the individual or entity to which it is addressed. If you
>>> are not the intended recipient, or a person responsible for delivering it
>>> to the intended recipient, you are not authorised to and must not disclose,
>>> copy, distribute, or retain this message or any part of it. It may contain
>>> information which is confidential and/or covered by legal professional or
>>> other privilege under applicable law.
>>>
>>> The views expressed in this email are not necessarily the views of
>>> Centrica plc or its subsidiaries, and the company, its directors, officers
>>> or employees make no representation or accept any liability for its
>>> accuracy or completeness unless expressly stated to the contrary.
>>>
>>> Additional regulatory disclosures may be found here:
>>> https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email
>>>
>>> PH Jones is a trading name of British Gas Social Housing Limited.
>>> British Gas Social Housing Limited (company no: 01026007), British Gas
>>> Trading Limited (company no: 03078711), British Gas Services Limited
>>> (company no: 3141243), British Gas Insurance Limited (company no:
>>> 06608316), British Gas New Heating Limited (company no: 06723244), British
>>> Gas Services (Commercial) Limited (company no: 07385984) and Centrica
>>> Energy (Trading) Limited (company no: 02877397) are all wholly owned
>>> subsidiaries of Centrica plc (company no: 3033654). Each company is
>>> registered in England and Wales with a registered office at Millstream,
>>> Maidenhead Road, Windsor, Berkshire SL4 5GD.
>>>
>>> British Gas Insurance Limited is authorised by the Prudential Regulation
>>> Authority and regulated by the Financial Conduct Authority and the
>>> Prudential Regulation Authority. British Gas Services Limited and Centrica
>>> Energy (Trading) Limited are authorised and regulated by the Financial
>>> Conduct Authority. British Gas Trading Limited is an appointed
>>> representative of British Gas Services Limited which is authorised and
>>> regulated by the Financial Conduct Authority.
>>>
>>>
>>>
>>>
>>> The information contained in or attached to this email is intended only
>>> for the use of the individual or entity to which it is addressed. If you
>>> are not the intended recipient, or a person responsible for delivering it
>>> to the intended recipient, you are not authorised to and must not disclose,
>>> copy, distribute, or retain this message or any part of it. It may contain
>>> information which is confidential and/or covered by legal professional or
>>> other privilege under applicable law.
>>>
>>> The views expressed in this email are not necessarily the views of
>>> Centrica plc or its subsidiaries, and the company, its directors, officers
>>> or employees make no representation or accept any liability for its
>>> accuracy or completeness unless expressly stated to the contrary.
>>>
>>> Additional regulatory disclosures may be found here:
>>> https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email
>>>
>>> PH Jones is a trading name of British Gas Social Housing Limited.
>>> British Gas Social Housing Limited (company no: 01026007), British Gas
>>> Trading Limited (company no: 03078711), British Gas Services Limited
>>> (company no: 3141243), British Gas Insurance Limited (company no:
>>> 06608316), British Gas New Heating Limited (company no: 06723244), British
>>> Gas Services (Commercial) Limited (company no: 07385984) and Centrica
>>> Energy (Trading) Limited (company no: 02877397) are all wholly owned
>>> subsidiaries of Centrica plc (company no: 3033654). Each company is
>>> registered in England and Wales with a registered office at Millstream,
>>> Maidenhead Road, Windsor, Berkshire SL4 5GD.
>>>
>>> British Gas Insurance Limited is authorised by the Prudential Regulation
>>> Authority and regulated by the Financial Conduct Authority and the
>>> Prudential Regulation Authority. British Gas Services Limited and Centrica
>>> Energy (Trading) Limited are authorised and regulated by the Financial
>>> Conduct Authority. British Gas Trading Limited is an appointed
>>> representative of British Gas Services Limited which is authorised and
>>> regulated by the Financial Conduct Authority.
>>>
>>
>

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Piotr Nowojski <pi...@ververica.com>.
Hi,

I think reducing the checkpoint frequency and decreasing the parallelism of the operators that use the S3AOutputStream class would help mitigate the issue.
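Piotr's suggestion can be made concrete with a deliberately simple model, which is an assumption rather than anything measured in this thread. Suppose each parallel subtask creates a fixed number of S3A buffer files per checkpoint, and each file leaves its path in the JVM's delete-on-exit set. Under that model the entry count grows linearly with both parallelism and checkpoint frequency, so reducing either slows the leak without stopping it. The hypothetical `LeakGrowthEstimate` helper below only evaluates that product.

```java
/**
 * Hypothetical back-of-the-envelope model (not measured in this thread):
 * every parallel subtask creates filesPerCheckpoint S3A buffer files per
 * checkpoint, and each file's path stays in the delete-on-exit set until exit.
 */
final class LeakGrowthEstimate {
    static long entriesAfter(int parallelism, long checkpointIntervalSeconds,
                             int filesPerCheckpoint, long uptimeSeconds) {
        long checkpoints = uptimeSeconds / checkpointIntervalSeconds; // checkpoints taken so far
        return (long) parallelism * filesPerCheckpoint * checkpoints;
    }

    public static void main(String[] args) {
        long oneDay = 24L * 60 * 60;
        // 4 subtasks, one buffer file each per checkpoint.
        System.out.println(entriesAfter(4, 60, 1, oneDay));  // every minute: prints 5760
        System.out.println(entriesAfter(4, 600, 1, oneDay)); // every 10 minutes: prints 576
        System.out.println(entriesAfter(2, 600, 1, oneDay)); // and half the parallelism: prints 288
    }
}
```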

I don't know of any other solutions. I would suggest asking Steve L. directly in the bug ticket [1], as he is the one who fixed the issue. If there is no workaround, maybe the Hadoop developers could be persuaded to backport the fix to older versions?

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Cliff Resnick <cr...@gmail.com>.
I know from experience that Flink's shaded S3A FileSystem does not
read core-site.xml, though I don't remember offhand which file(s) it
does read. However, since it's shaded, maybe this could be fixed by
building a Flink FS against Hadoop 3.3.0? Last I checked, I think it was
built against 3.1.0.

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by David Magalhães <sp...@gmail.com>.
Does the StreamingFileSink use core-site.xml? When I was using it, it didn't
load any configuration from core-site.xml.

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Mark Harris <ma...@hivehome.com>.
Hi Piotr,

Thanks for the link to the issue.

Do you know if there's a workaround? I've tried setting the following in my core-site.xml:

fs.s3a.fast.upload.buffer=true

to try to avoid writing the buffer files, but the TaskManager still fails with the same problem.

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr Nowojski <pi...@ververica.com>
Sent: 22 January 2020 13:29
To: Till Rohrmann <tr...@apache.org>
Cc: Mark Harris <ma...@hivehome.com>; flink-user@apache.org <fl...@apache.org>; kkloudas <kk...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Hi,

This is probably a known Hadoop issue [1]. Unfortunately, it was only fixed in 3.3.0.

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 22 Jan 2020, at 13:56, Till Rohrmann <tr...@apache.org> wrote:

Thanks for reporting this issue, Mark. I'm pulling Klou, who knows more about the StreamingFileSink, into this conversation. @Klou, does the StreamingFileSink rely on DeleteOnExitHooks to clean up files?

Cheers,
Till
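Whatever the StreamingFileSink itself does, the heap dump from the original report matches a documented property of `java.io.File.deleteOnExit()`. Once deletion has been requested it cannot be cancelled, and the JVM keeps each registered path in `java.io.DeleteOnExitHook` until shutdown, even if the file was deleted long before. A long-running TaskManager that registers every short-lived buffer file this way therefore accumulates paths indefinitely. The Java sketch below shows one common alternative, assuming you control the code that creates the temp files. That is not the case for the shaded S3A client, so this illustrates the technique rather than a workaround. It tracks live files in a concurrent set, removes each entry when its file is deleted, and cleans up leftovers from a single shutdown hook. `TempFileTracker` is a hypothetical name, not Hadoop or Flink API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not Hadoop or Flink API): tracks temp files for cleanup
 * without File.deleteOnExit(), whose JVM-wide path set only ever grows.
 */
final class TempFileTracker {
    // Paths that still exist and should be removed at JVM shutdown.
    private final Set<Path> live = ConcurrentHashMap.newKeySet();

    TempFileTracker() {
        // One shutdown hook per tracker instead of one registration per file.
        Runtime.getRuntime().addShutdownHook(new Thread(this::deleteAll));
    }

    Path create(String prefix) throws IOException {
        Path p = Files.createTempFile(prefix, ".tmp");
        live.add(p);
        return p;
    }

    // Deletes the file now and forgets it, so the set is bounded by the
    // number of files currently in use rather than by total files ever created.
    void delete(Path p) throws IOException {
        Files.deleteIfExists(p);
        live.remove(p);
    }

    int trackedCount() {
        return live.size();
    }

    private void deleteAll() {
        for (Path p : live) {
            try {
                Files.deleteIfExists(p);
            } catch (IOException ignored) {
                // Best effort during shutdown.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        TempFileTracker tracker = new TempFileTracker();
        for (int i = 0; i < 1_000; i++) {
            Path block = tracker.create("s3ablock-demo-");
            Files.write(block, new byte[] {1, 2, 3});
            tracker.delete(block);
        }
        // With File.deleteOnExit(), all 1,000 paths would stay registered until exit.
        System.out.println(tracker.trackedCount()); // prints 0
    }
}
```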

On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com>> wrote:

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Piotr Nowojski <pi...@ververica.com>.
Hi,

This is probably a known issue of Hadoop [1]. Unfortunately it was only fixed in 3.3.0.

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658
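The heap dump described in this thread fits a simple pattern: `File.deleteOnExit()` records the file's path in a JVM-wide set (`java.io.DeleteOnExitHook`) that is only drained at JVM shutdown, so deleting the file afterwards does not remove the entry. A long-running TaskManager that creates and deletes many short-lived `s3ablock` buffer files therefore keeps one path string per file on the heap. The sketch below is a simplified model, not Hadoop's or the JDK's code: because the real set is JDK-internal, it substitutes a plain `LinkedHashSet`, and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.LinkedHashSet;
import java.util.Set;

public class DeleteOnExitLeakSketch {
    // Stand-in for the JDK-internal java.io.DeleteOnExitHook set: File.deleteOnExit()
    // adds the path string, and entries are only drained when the JVM shuts down.
    static final Set<String> onExitRegistry = new LinkedHashSet<>();

    // Leaky pattern (hypothetical): register the block file for delete-on-exit,
    // write it, then delete it once "uploaded". The file leaves the disk,
    // but its path string stays in the registry for the life of the JVM.
    static void leakyBlock(File dir, int index) {
        try {
            File block = File.createTempFile(String.format("s3ablock-%04d-", index), ".tmp", dir);
            onExitRegistry.add(block.getAbsolutePath()); // models block.deleteOnExit()
            Files.write(block.toPath(), new byte[] {42});
            Files.delete(block.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bounded pattern (hypothetical): no delete-on-exit registration; the file is
    // removed in a finally block, so nothing accumulates on the heap per block.
    static void boundedBlock(File dir, int index) {
        try {
            File block = File.createTempFile(String.format("s3ablock-%04d-", index), ".tmp", dir);
            try {
                Files.write(block.toPath(), new byte[] {42});
            } finally {
                Files.deleteIfExists(block.toPath());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("s3a-sketch").toFile();
        for (int i = 0; i < 1000; i++) {
            leakyBlock(dir, i);
            boundedBlock(dir, i);
        }
        // Both patterns leave the directory empty; only the leaky one grew the registry.
        System.out.println("files left on disk: " + dir.list().length);
        System.out.println("registry entries: " + onExitRegistry.size());
        Files.delete(dir.toPath());
    }
}
```

Running `main` should report no files left on disk but 1000 registry entries: the leaky pattern cleans up the files themselves yet still grows the set, which is the same shape as the 9041060 `s3ablock` strings found in the dump.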

> On 22 Jan 2020, at 13:56, Till Rohrmann <tr...@apache.org> wrote:
> 
> Thanks for reporting this issue, Mark. I'm pulling Klou, who knows more about the StreamingFileSink, into this conversation. @Klou, does the StreamingFileSink rely on DeleteOnExitHooks to clean up files?
> 
> Cheers,
> Till
> 
> On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com> wrote:


Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

Posted by Till Rohrmann <tr...@apache.org>.
Thanks for reporting this issue, Mark. I'm pulling Klou, who knows more
about the StreamingFileSink, into this conversation. @Klou, does the
StreamingFileSink rely on DeleteOnExitHooks to clean up files?

Cheers,
Till

On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <ma...@hivehome.com>
wrote:
