Posted to dev@bigtop.apache.org by Olaf Flebbe <of...@oflebbe.de> on 2016/06/05 19:44:14 UTC

CI stalled again: Flink ?

Hi,

I am looking into why our CI stalled again.

I am seeing a lot of out-of-memory conditions on slave 06.

It seems to me that Flink is causing it. For instance:

https://ci.bigtop.apache.org/job/Bigtop-trunk-packages/BUILD_ENVIRONMENTS=ubuntu-14.04,COMPONENTS=flink,label=docker-slave/229/
https://ci.bigtop.apache.org/job/Bigtop-trunk-packages/227/BUILD_ENVIRONMENTS=fedora-20,COMPONENTS=flink,label=docker-slave/

Question to the Flink guys: how much memory should we reserve for a Flink compile process?

Olaf

Re: CI stalled again: Flink ?

Posted by Márton Balassi <ba...@gmail.com>.
No problem, Olaf. It is great that you could clarify the situation.


On Mon, Jun 6, 2016 at 8:31 PM, Olaf Flebbe <of...@oflebbe.de> wrote:

> Thanks for double checking. 2GB is ok. I found some stalled containers on
> our build machine.
>

Re: CI stalled again: Flink ?

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hi,

Thanks for double-checking. 2GB is ok. I found some stalled containers on our build machine.
Stopping them freed up memory.
The CI is currently unstable because of corrupt images; I will double-check our side once it is stable again.

Thanks!
    Olaf



Re: CI stalled again: Flink ?

Posted by Márton Balassi <ba...@gmail.com>.
I did a quick sanity check: building Flink 1.0.3 with the tests skipped
(same as during the Bigtop package build) consumed under 2.8 GB, kernel
included, peaking on flink-tests, the Scala projects and flink-yarn. I had
swap enabled as Robert suggests, though, and I might need to do a more
fine-grained test than just looking at the usage graph in Ganglia.
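
One possible finer-grained check than a usage graph is to ask the operating system for the peak resident set size of the build process. The sketch below is only an illustration under stated assumptions: the helper name `peak_child_rss_kib` is hypothetical, it relies on the Unix-only `resource` module, and it reports the largest single child process rather than the sum over all processes a build may fork, so it can underestimate what a multi-JVM Maven build needs in total.

```python
import resource
import subprocess
import sys
from typing import Sequence


def peak_child_rss_kib(command: Sequence[str]) -> int:
    """Run `command` to completion and return the peak RSS of child processes.

    The value is the maximum over all children this Python process has waited
    for so far (not only the last one). On Linux the unit is KiB; on macOS the
    same field is reported in bytes.
    """
    subprocess.run(list(command), check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Stand-in workload: a child Python process that allocates about 50 MiB.
    # A real measurement would pass the actual build command instead.
    workload = [sys.executable, "-c", "data = bytearray(50 * 1024 * 1024)"]
    kib = peak_child_rss_kib(workload)
    print(f"peak child RSS: {kib / 1024:.0f} MiB (assuming Linux units)")
```

Because the figure covers a single process, it is best read as a lower bound on what the whole build needs.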


Re: CI stalled again: Flink ?

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hi,

Right, they come from the kernel.

Our two build machines have 32 GB and 16 GB, respectively, and are running 8 and 4 jobs in parallel.

So we have 4 GB per job in the worst-case scenario.

Olaf
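
The worst-case figure above is simply total RAM divided by the number of parallel jobs. A minimal sketch of that arithmetic follows; the function name and the machine labels are made up for illustration, and the even split ignores whatever the operating system and other processes use, so the real per-job headroom is somewhat lower.

```python
def memory_per_job_gb(total_ram_gb: float, parallel_jobs: int) -> float:
    """Return the even share of RAM each parallel job gets, in GB."""
    if parallel_jobs <= 0:
        raise ValueError("parallel_jobs must be positive")
    return total_ram_gb / parallel_jobs


# Figures from the message above: (RAM in GB, parallel jobs) per build machine.
# The dictionary keys are illustrative labels, not real host names.
machines = {"32 GB machine": (32, 8), "16 GB machine": (16, 4)}

for name, (ram_gb, jobs) in machines.items():
    print(f"{name}: {memory_per_job_gb(ram_gb, jobs):.1f} GB per job")
```

Both machines come out at 4.0 GB per job, which matches the worst case stated in the message.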



Re: CI stalled again: Flink ?

Posted by Robert Metzger <rm...@apache.org>.
Hi,

I guess these are OOM errors from the kernel, not from the JVM, right?
I first thought of JVM OutOfMemoryErrors; we could have fixed those with
JVM memory parameters.
But for kernel OOM errors, we need more available memory on the build
slaves. How much memory is typically available on the machines? (Maybe
enabling swap space is a good temporary workaround?)

Regards,
Robert
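
The two kinds of OOM leave different traces: the JVM reports `java.lang.OutOfMemoryError` in the build output, while the kernel's OOM killer writes to the kernel log. The sketch below shows one way to tell them apart by scanning log lines. The function name is hypothetical, the matched substrings are assumptions about typical message wording (which varies between kernel versions), and the sample lines are illustrative, not taken from the Bigtop CI.

```python
def classify_oom_line(line: str) -> str:
    """Return 'jvm', 'kernel', or 'none' for a single log line."""
    lowered = line.lower()
    # JVM-level failure: the Java process itself throws this error.
    if "java.lang.outofmemoryerror" in lowered:
        return "jvm"
    # Kernel-level failure: typical wording of OOM-killer messages (assumed).
    if "oom-killer" in lowered or "out of memory:" in lowered:
        return "kernel"
    return "none"


# Illustrative sample lines only.
samples = [
    "java.lang.OutOfMemoryError: Java heap space",
    "Out of memory: Kill process 12345 (java) score 900 or sacrifice child",
    "[INFO] BUILD SUCCESS",
]

for sample in samples:
    print(f"{classify_oom_line(sample):>6}: {sample}")
```

A 'jvm' hit points at heap or JVM settings, whereas a 'kernel' hit points at the machine or container running out of memory as a whole.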



Re: CI stalled again: Flink ?

Posted by Márton Balassi <ba...@gmail.com>.
Hi Olaf,

I could not find the OOMs in the logs you linked. How can I reproduce
them?

The Flink team uses Travis for CI internally [1], and the machine there is
based on the pre-2015 standard, which has 7.5 GB of RAM [2], but I
regularly build Flink on virtual images with 4 GB of RAM myself. How much
memory do we give it currently?

[1] https://travis-ci.org/apache/flink/
[2] https://docs.travis-ci.com/user/ci-environment/#Virtualization-environments


Re: CI stalled again: Flink ?

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hi,

When the compile process was restarted, the older logfiles were removed. The error messages were typically OOM errors.

Olaf
