Posted to users@kafka.apache.org by Jon Yeargers <jo...@cedexis.com> on 2016/12/22 15:26:27 UTC

Memory / resource leak in 0.10.1.1 release

I'm still hitting this leak with the released version of 0.10.1.1.

Process memory usage grows over the course of 10-20 minutes and eventually the
OS kills it.

Messages like this appear in /var/log/messages:

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.793692] java invoked
oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.798383] java cpuset=/
mems_allowed=0

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.801079] CPU: 0 PID: 9550
Comm: java Tainted: G            E   4.4.19-29.55.amzn1.x86_64 #1

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Hardware name:
Xen HVM domU, BIOS 4.2.amazon 11/11/2016

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
0000000000000000 ffff88071c517a70 ffffffff812c958f ffff88071c517c58

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
0000000000000000 ffff88071c517b00 ffffffff811ce76d ffffffff8109db14

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
ffffffff810b2d91 0000000000000000 0000000000000010 ffffffff817d0fe9

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Call Trace:

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff812c958f>] dump_stack+0x63/0x84

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff811ce76d>] dump_header+0x5e/0x1d8

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff8109db14>] ? set_next_entity+0xa4/0x710

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff810b2d91>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff81163ba5>] oom_kill_process+0x205/0x3d0

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff81164201>] out_of_memory+0x431/0x480

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff811692ce>] __alloc_pages_nodemask+0x91e/0xa60

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff811ad0b8>] alloc_pages_current+0x88/0x120

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff811604a4>] __page_cache_alloc+0xb4/0xc0

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff811627e8>] filemap_fault+0x188/0x3e0

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffffa0122cb6>] ext4_filemap_fault+0x36/0x50 [ext4]

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff8118a24d>] __do_fault+0x3d/0x70

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff8118e687>] handle_mm_fault+0xf27/0x1870

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff810b2d91>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff8105ea33>] __do_page_fault+0x183/0x3f0

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff8105ecc2>] do_page_fault+0x22/0x30

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
[<ffffffff814e03d8>] page_fault+0x28/0x30

Re: Memory / resource leak in 0.10.1.1 release

Posted by Jon Yeargers <jo...@cedexis.com>.
FWIW: I went through and removed all the 'custom' serdes from my code and
replaced them with 'String' serdes. The memory leak problem went away.

The code is a bit more cumbersome now, since it's constantly flipping back and
forth between objects and JSON... but that seems to be what it takes to keep
it running.
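
Roughly, the shape of it is the sketch below. (Illustrative only; it assumes
Jackson for the JSON round-trip, and SumRecord here is just a stand-in POJO,
not my actual class.)

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

// Illustrative sketch: consume and produce plain Strings and do the JSON
// conversion by hand inside the topology, instead of plugging in a custom Serde.
public class StringSerdeWorkaround {

    // Stand-in for the real value POJO (fields assumed).
    public static class SumRecord {
        public String key;
        public int count;
    }

    // One shared mapper, reused for every record.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) {
        Serde<String> stringSerde = Serdes.String();
        KStreamBuilder kStreamBuilder = new KStreamBuilder();

        KStream<String, String> raw =
                kStreamBuilder.stream(stringSerde, stringSerde, "input-topic");

        // Deserialize by hand, work with the object, then write it back out as JSON.
        KStream<String, String> out = raw.mapValues(json -> {
            try {
                SumRecord record = MAPPER.readValue(json, SumRecord.class);
                // ... transform / accumulate the object here ...
                return MAPPER.writeValueAsString(record);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        out.to(stringSerde, stringSerde, "output-topic");
    }
}

This way the only serdes Streams itself ever touches are Serdes.String().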

On Thu, Dec 29, 2016 at 9:42 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Hello Jon,
>
> It is hard to tell, since I cannot see how is your Aggregate() function is
> implemented as well.
>
> Note that the deserializer of transactionSerde is used in both `aggregate`
> and `KstreamBuilder.stream`, while the serializer of transactionSerde is
> only used in `aggregate`, so if you suspect the transactionSerde is the
> root cause, to narrow it down you can leave the topology as
>
>
> KStream<String,SumRecord> transactionKStream =  kStreamBuilder.stream(
> stringSerde,transactionSerde,TOPIC);
>
> transactionKStream.to(TOPIC-2);
>
> where TOPIC-2 should be pre-created.
>
> The above topology will also trigger both the serializer and deserializer
> of the transactionSerde, and if this topology also leads to memory leak,
> then it means it is not relevant to your aggregate function.
>
>
> Guozhang
>
>
> On Sun, Dec 25, 2016 at 4:15 AM, Jon Yeargers <jo...@cedexis.com>
> wrote:
>
> > I narrowed this problem down to this part of the topology (and yes, it's
> > 100% repro - for me):
> >
> > KStream<String,SumRecord> transactionKStream =
> >  kStreamBuilder.stream(stringSerde,transactionSerde,TOPIC);
> >
> > KTable<Windowed<String>, SumRecordCollector> ktAgg =
> > transactionKStream.groupByKey().aggregate(
> >         SumRecordCollector::new,
> >         new Aggregate(),
> >         TimeWindows.of(20 * 60 * 1000L),
> >         collectorSerde, "table_stream");
> >
> > Given that this is a pretty trivial, well-traveled piece of Kafka I can't
> > imagine it has a memory leak.
> >
> > So Im guessing that the serde I'm using is causing a problem somehow. The
> > 'transactionSerde' is just to get/set JSON into the 'SumRecord' object.
> > That Object is just a bunch of String and int fields so nothing
> interesting
> > there either.
> >
> > I'm attaching the two parts of the transactionSerde to see if anyone has
> > suggestions on how to find / fix this.
> >
> >
> >
> > On Thu, Dec 22, 2016 at 9:26 AM, Jon Yeargers <jo...@cedexis.com>
> > wrote:
> >
> >> Yes - that's the one. It's 100% reproducible (for me).
> >>
> >>
> >> On Thu, Dec 22, 2016 at 8:03 AM, Damian Guy <da...@gmail.com>
> wrote:
> >>
> >>> Hi Jon,
> >>>
> >>> Is this for the topology where you are doing something like:
> >>>
> >>> topology: kStream -> groupByKey.aggregate(minute) -> foreach
> >>>                              \-> groupByKey.aggregate(hour) -> foreach
> >>>
> >>> I'm trying to understand how i could reproduce your problem. I've not
> >>> seen
> >>> any such issues with 0.10.1.1, but then i'm not sure what you are
> doing.
> >>>
> >>> Thanks,
> >>> Damian
> >>>
> >>> On Thu, 22 Dec 2016 at 15:26 Jon Yeargers <jo...@cedexis.com>
> >>> wrote:
> >>>
> >>> > Im still hitting this leak with the released version of 0.10.1.1.
> >>> >
> >>> > Process mem % grows over the course of 10-20 minutes and eventually
> >>> the OS
> >>> > kills it.
> >>>
> >>
> >>
> >
>
>
> --
> -- Guozhang
>

Re: Memory / resource leak in 0.10.1.1 release

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Jon,

It is hard to tell, since I cannot see how your Aggregate() function is
implemented either.

Note that the deserializer of transactionSerde is used by both `aggregate`
and `KStreamBuilder.stream`, while its serializer is only used by
`aggregate`. So if you suspect transactionSerde is the root cause, you can
narrow it down by reducing the topology to


KStream<String, SumRecord> transactionKStream =
        kStreamBuilder.stream(stringSerde, transactionSerde, TOPIC);

transactionKStream.to(stringSerde, transactionSerde, "TOPIC-2");

where the "TOPIC-2" topic should be pre-created.

This reduced topology will also exercise both the serializer and the
deserializer of transactionSerde, so if it too leads to a memory leak, then
the leak is not related to your aggregate function.


Guozhang


On Sun, Dec 25, 2016 at 4:15 AM, Jon Yeargers <jo...@cedexis.com>
wrote:

> I narrowed this problem down to this part of the topology (and yes, it's
> 100% repro - for me):
>
> KStream<String,SumRecord> transactionKStream =
>  kStreamBuilder.stream(stringSerde,transactionSerde,TOPIC);
>
> KTable<Windowed<String>, SumRecordCollector> ktAgg =
> transactionKStream.groupByKey().aggregate(
>         SumRecordCollector::new,
>         new Aggregate(),
>         TimeWindows.of(20 * 60 * 1000L),
>         collectorSerde, "table_stream");
>
> Given that this is a pretty trivial, well-traveled piece of Kafka I can't
> imagine it has a memory leak.
>
> So Im guessing that the serde I'm using is causing a problem somehow. The
> 'transactionSerde' is just to get/set JSON into the 'SumRecord' object.
> That Object is just a bunch of String and int fields so nothing interesting
> there either.
>
> I'm attaching the two parts of the transactionSerde to see if anyone has
> suggestions on how to find / fix this.
>
>
>
> On Thu, Dec 22, 2016 at 9:26 AM, Jon Yeargers <jo...@cedexis.com>
> wrote:
>
>> Yes - that's the one. It's 100% reproducible (for me).
>>
>>
>> On Thu, Dec 22, 2016 at 8:03 AM, Damian Guy <da...@gmail.com> wrote:
>>
>>> Hi Jon,
>>>
>>> Is this for the topology where you are doing something like:
>>>
>>> topology: kStream -> groupByKey.aggregate(minute) -> foreach
>>>                              \-> groupByKey.aggregate(hour) -> foreach
>>>
>>> I'm trying to understand how i could reproduce your problem. I've not
>>> seen
>>> any such issues with 0.10.1.1, but then i'm not sure what you are doing.
>>>
>>> Thanks,
>>> Damian
>>>
>>> On Thu, 22 Dec 2016 at 15:26 Jon Yeargers <jo...@cedexis.com>
>>> wrote:
>>>
>>> > Im still hitting this leak with the released version of 0.10.1.1.
>>> >
>>> > Process mem % grows over the course of 10-20 minutes and eventually
>>> the OS
>>> > kills it.
>>> >
>>> > Messages like this appear in /var/log/messages:
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.793692] java invoked
>>> > oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.798383] java
>>> cpuset=/
>>> > mems_allowed=0
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.801079] CPU: 0 PID:
>>> 9550
>>> > Comm: java Tainted: G            E   4.4.19-29.55.amzn1.x86_64 #1
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Hardware
>>> name:
>>> > Xen HVM domU, BIOS 4.2.amazon 11/11/2016
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > 0000000000000000 ffff88071c517a70 ffffffff812c958f ffff88071c517c58
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > 0000000000000000 ffff88071c517b00 ffffffff811ce76d ffffffff8109db14
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > ffffffff810b2d91 0000000000000000 0000000000000010 ffffffff817d0fe9
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Call Trace:
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff812c958f>] dump_stack+0x63/0x84
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff811ce76d>] dump_header+0x5e/0x1d8
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff8109db14>] ? set_next_entity+0xa4/0x710
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff810b2d91>] ? __raw_callee_save___pv_queued_
>>> spin_unlock+0x11/0x20
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff81163ba5>] oom_kill_process+0x205/0x3d0
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff81164201>] out_of_memory+0x431/0x480
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff811692ce>] __alloc_pages_nodemask+0x91e/0xa60
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff811ad0b8>] alloc_pages_current+0x88/0x120
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff811604a4>] __page_cache_alloc+0xb4/0xc0
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff811627e8>] filemap_fault+0x188/0x3e0
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffffa0122cb6>] ext4_filemap_fault+0x36/0x50 [ext4]
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff8118a24d>] __do_fault+0x3d/0x70
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff8118e687>] handle_mm_fault+0xf27/0x1870
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff810b2d91>] ? __raw_callee_save___pv_queued_
>>> spin_unlock+0x11/0x20
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff8105ea33>] __do_page_fault+0x183/0x3f0
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff8105ecc2>] do_page_fault+0x22/0x30
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>>> > [<ffffffff814e03d8>] page_fault+0x28/0x30
>>> >
>>>
>>
>>
>


-- 
-- Guozhang

Re: Memory / resource leak in 0.10.1.1 release

Posted by Jon Yeargers <jo...@cedexis.com>.
I narrowed this problem down to this part of the topology (and yes, it's
100% reproducible, at least for me):

KStream<String, SumRecord> transactionKStream =
        kStreamBuilder.stream(stringSerde, transactionSerde, TOPIC);

KTable<Windowed<String>, SumRecordCollector> ktAgg =
        transactionKStream.groupByKey().aggregate(
                SumRecordCollector::new,
                new Aggregate(),
                TimeWindows.of(20 * 60 * 1000L),
                collectorSerde, "table_stream");

Given that this is a pretty trivial, well-traveled piece of Kafka, I can't
imagine it has a memory leak.

So I'm guessing that the serde I'm using is causing a problem somehow. The
'transactionSerde' just gets/sets JSON for the 'SumRecord' object. That
object is just a bunch of String and int fields, so nothing interesting
there either.

I'm attaching the two parts of the transactionSerde to see if anyone has
suggestions on how to find / fix this.
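
For reference, a serde like that typically looks roughly like the sketch below
(illustrative only, using Jackson; this is not the attached code, and the
SumRecord details are assumed):

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

import java.util.Map;

// Illustrative only: the general shape of a Jackson-based POJO serializer /
// deserializer pair, not the attached transactionSerde.
public class JsonPojoSerializer<T> implements Serializer<T> {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public byte[] serialize(String topic, T data) {
        try {
            return data == null ? null : mapper.writeValueAsBytes(data);
        } catch (Exception e) {
            throw new RuntimeException("JSON serialization failed", e);
        }
    }

    @Override
    public void close() { }
}

class JsonPojoDeserializer<T> implements Deserializer<T> {
    private final ObjectMapper mapper = new ObjectMapper();
    private final Class<T> type;

    JsonPojoDeserializer(Class<T> type) {
        this.type = type;
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public T deserialize(String topic, byte[] data) {
        try {
            return data == null ? null : mapper.readValue(data, type);
        } catch (Exception e) {
            throw new RuntimeException("JSON deserialization failed", e);
        }
    }

    @Override
    public void close() { }
}

The two halves can then be wrapped into a Serde with something like
Serdes.serdeFrom(new JsonPojoSerializer<SumRecord>(), new JsonPojoDeserializer<>(SumRecord.class)).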



On Thu, Dec 22, 2016 at 9:26 AM, Jon Yeargers <jo...@cedexis.com>
wrote:

> Yes - that's the one. It's 100% reproducible (for me).
>
>
> On Thu, Dec 22, 2016 at 8:03 AM, Damian Guy <da...@gmail.com> wrote:
>
>> Hi Jon,
>>
>> Is this for the topology where you are doing something like:
>>
>> topology: kStream -> groupByKey.aggregate(minute) -> foreach
>>                              \-> groupByKey.aggregate(hour) -> foreach
>>
>> I'm trying to understand how i could reproduce your problem. I've not seen
>> any such issues with 0.10.1.1, but then i'm not sure what you are doing.
>>
>> Thanks,
>> Damian
>>
>> On Thu, 22 Dec 2016 at 15:26 Jon Yeargers <jo...@cedexis.com>
>> wrote:
>>
>> > Im still hitting this leak with the released version of 0.10.1.1.
>> >
>> > Process mem % grows over the course of 10-20 minutes and eventually the
>> OS
>> > kills it.
>>
>
>

Re: Memory / resource leak in 0.10.1.1 release

Posted by Jon Yeargers <jo...@cedexis.com>.
Yes - that's the one. It's 100% reproducible (for me).


On Thu, Dec 22, 2016 at 8:03 AM, Damian Guy <da...@gmail.com> wrote:

> Hi Jon,
>
> Is this for the topology where you are doing something like:
>
> topology: kStream -> groupByKey.aggregate(minute) -> foreach
>                              \-> groupByKey.aggregate(hour) -> foreach
>
> I'm trying to understand how i could reproduce your problem. I've not seen
> any such issues with 0.10.1.1, but then i'm not sure what you are doing.
>
> Thanks,
> Damian
>
> On Thu, 22 Dec 2016 at 15:26 Jon Yeargers <jo...@cedexis.com>
> wrote:
>
> > Im still hitting this leak with the released version of 0.10.1.1.
> >
> > Process mem % grows over the course of 10-20 minutes and eventually the
> OS
> > kills it.
>

Re: Memory / resource leak in 0.10.1.1 release

Posted by Damian Guy <da...@gmail.com>.
Hi Jon,

Is this for the topology where you are doing something like:

topology: kStream -> groupByKey.aggregate(minute) -> foreach
                             \-> groupByKey.aggregate(hour) -> foreach
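
Roughly, in 0.10.1 DSL terms I mean something like the sketch below (String
values used purely for illustration):

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

// Sketch of the topology above: one source stream feeding two windowed
// aggregations, each followed by a foreach.
public class TwoWindowTopologySketch {
    public static void main(String[] args) {
        Serde<String> stringSerde = Serdes.String();
        KStreamBuilder builder = new KStreamBuilder();

        KStream<String, String> stream =
                builder.stream(stringSerde, stringSerde, "input-topic");

        KTable<Windowed<String>, String> perMinute = stream.groupByKey().aggregate(
                () -> "",
                (key, value, agg) -> agg + value,
                TimeWindows.of(60 * 1000L),
                stringSerde, "minute-agg");

        KTable<Windowed<String>, String> perHour = stream.groupByKey().aggregate(
                () -> "",
                (key, value, agg) -> agg + value,
                TimeWindows.of(60 * 60 * 1000L),
                stringSerde, "hour-agg");

        perMinute.toStream().foreach((k, v) -> System.out.println("minute " + k + " -> " + v));
        perHour.toStream().foreach((k, v) -> System.out.println("hour " + k + " -> " + v));
    }
}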

I'm trying to understand how I could reproduce your problem. I've not seen
any such issues with 0.10.1.1, but then I'm not sure what you are doing.

Thanks,
Damian

On Thu, 22 Dec 2016 at 15:26 Jon Yeargers <jo...@cedexis.com> wrote:

> Im still hitting this leak with the released version of 0.10.1.1.
>
> Process mem % grows over the course of 10-20 minutes and eventually the OS
> kills it.