Posted to user@mesos.apache.org by Alfredo Carneiro <al...@simbioseventures.com> on 2016/03/17 01:12:48 UTC

Unstability on Mesos 0.27

Hello guys,

I am using Mesos 0.27 with different kinds of applications, such as
crawlers, databases, and websites. However, I have faced many crashes and I
couldn't find out what the matter is.

We have 14 machines with 8 GB of RAM and 4 CPUs each. Usually, we run about
40 instances of our crawler, which start stopping out of nowhere (but the
containers keep running). The day before yesterday we decided to test our
entire infrastructure and scaled our crawler up to 110 instances.
Unfortunately, today we faced a big crash that mainly affected our crawler
and our databases.

So, I am wondering if anyone else has the same problem, such as apps which
crash out of nowhere, or something else which could be related to
instability in Mesos.

-- 
Alfredo Miranda

Re: Unstability on Mesos 0.27

Posted by Artem Harutyunyan <ar...@mesosphere.io>.
Hi Guillermo,

We would really like to help you, and to understand what the issues are.
Could you please send us all the logs you have so we can inspect them and
figure out what happened?

Artem.

Re: Unstability on Mesos 0.27

Posted by Alfredo Carneiro <al...@simbioseventures.com>.
Sorry for the delay, guys.

Actually, the crashes aren't in Mesos (master and slave), or at least I
didn't find anything weird in the logs. The problem was in the tasks, which
couldn't start properly, and as I said, sometimes the crawlers just stop
working. I did what some people advised and updated to version 0.27.2;
since then I haven't seen any related problems. I will try it for a few
days and get back to you guys.

Thanks a lot for your help.

-- 
Alfredo Miranda

Re: Unstability on Mesos 0.27

Posted by Guillermo Rodriguez <gu...@spritekin.com>.
Hi,

I have reported all my problems already. That's why I installed 0.27.2, and
I'm waiting for 0.28 because I see many fixes there.

The latest crash I was unable to identify. There were no error logs at all.
I just know Marathon and Mesos decided to shut down at the same time. I
guess Marathon won't crash Mesos, but crashing Mesos will crash Marathon.

I will report if I see this again.

Luck!

Re: Unstability on Mesos 0.27

Posted by Jie Yu <yu...@gmail.com>.
Thanks for reporting! Can you be more specific about which component
crashes a lot? Is it the framework, the master, the agent, or the executor?
As Artem and Vinod mentioned, it would be really helpful if you could
provide the relevant logs (master, agent, or executor) so that we can
pinpoint the issue.

- Jie

Re: Unstability on Mesos 0.27

Posted by Guillermo Rodriguez <gu...@spritekin.com>.
Update to 0.27.2 or wait for 0.28.0.

I experienced many crashes as well with 0.27.1, due to crashes in the
frameworks bringing down the whole cluster (Swarm especially). There were
also problems with resource precision that crashed the servers, and crashes
when nodes disconnected.

I really found 0.27 very unstable.

Many of these problems were solved in 0.27.2, and my latest environment has
proven far more stable. It is still not fully stable, as the cluster crashed
yesterday due to a crash in Marathon, but it is much better overall and
quick to recover.

Luck!
Guimo

Re: Unstability on Mesos 0.27

Posted by Klaus Ma <kl...@gmail.com>.
If the Mesos daemon crashed, I'd suggest filing a JIRA with more detail,
e.g. reproduction steps and the master/agent logs.

----
Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform OpenSource Technology, STG, IBM GCG
+86-10-8245 4084 | klaus1982.cn@gmail.com | http://k82.me

Re: Unstability on Mesos 0.27

Posted by Vinod Kone <vi...@apache.org>.
Hey Gabriel,

Could you share more details on what the crashes are and what your setup is
(Docker containerizer?). Any logs (master, agent, application) that can
shed light would be useful for diagnosis.
