Posted to users@cloudstack.apache.org by Rohit Yadav <ro...@shapeblue.com> on 2016/05/03 17:18:15 UTC

Re: [ANNOUNCE] Open source distributed virtual machine scheduling platform

Nice feature :)

I did not look at the code, but I'm curious how you're powering off hosts. I think that with my out-of-band management PR you can use the oobm subsystem to perform power management operations on IPMI 2.0 enabled hosts.

Also curious how you implemented the heuristics and wrote tests (esp. integration ones). Some of us had a related discussion about such a feature, and we looked at this paper from the VMware DRS team: http://www.waldspurger.org/carl/papers/drs-vmtj-mar12.pdf

Regards,
Rohit Yadav

rohit.yadav@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue
On Apr 27 2016, at 2:29 am, Gabriel Beims Bräscher <ga...@gmail.com> wrote:

Hello CloudStack community members (@dev and @users),

This email is meant to announce the publication of a project on GitHub
that provides a distributed virtual machine scheduling platform that can
be easily integrated with Apache CloudStack (ACS). The project is
available at [1]; you can find a detailed explanation of the idea of the
project, its aspirations, basic concepts, installation and uninstallation
processes, and other information at [2]. Also, if you want to know more
about Autonomiccs and its creators, you can access link [3].

The code that was opened on GitHub is part of a bigger system whose goal
is to manage a cloud computing environment autonomously. All of that is
being developed and used in my Ph.D. thesis and the master’s theses of
some colleagues. The formalization of that component will be published at
the 12th IEEE World Congress on Services (SERVICES 2016) in San
Francisco, USA.

You can see the stats of our code at [4] and [5]. Right now we only have
~40% code test coverage. However, we intend to increase that value to
~60% by next week and ~90% by the end of June.

To give you a picture of what we are preparing for the future, we can
highlight the following goals for this year (you can find other
short-term goals at [6]):

   - Integrate our platform [1] with a multi-agent system (MAS) platform,
     in order to facilitate the development of agents. Currently, we are
     using Spring Integration to “emulate” an agent life cycle; that can
     become a problem as we add more agents and they start to communicate
     with each other. Therefore, we will integrate the platform in [1]
     with JADE [7];

   - Today, resource usage metrics are not properly gathered by ACS; in
     order to develop more accurate predictions, we need to store
     resource usage metrics. Also, those metrics have to be gathered in a
     distributed way without causing service degradation. For that and a
     few other reasons (you can send us an email so we can provide you
     with more details), we are developing an autonomic monitoring
     platform that will integrate with the system available in [1];

   - We also foresee the need to develop a better way to visualize the
     cloud environment, a way to detect hot spots (pods and hosts) with
     higher resource usage trends (VM trends). We see the need to replace
     the rustic table-based view of the environment with one better
     suited for humans (this is a surprise that we intend to present at
     the CCCBR).

We hope you like the software and that it meets your expectations. If it
does not meet all of your needs, let’s work together to improve it. If
you have any questions or suggestions, please send us an email; we will
reply as fast as we can. Also, criticism that can help us improve the
platform is very welcome.

[1] https://github.com/Autonomiccs/autonomiccs-platform

[2] https://github.com/Autonomiccs/autonomiccs-platform/wiki

[3] http://autonomiccs.com.br/

[4] http://jenkins.autonomiccs.com.br/

[5] http://sonar.autonomiccs.com.br/

[6] https://github.com/Autonomiccs/autonomiccs-platform#project-evolution

[7] http://jade.tilab.com/

Cheers, Gabriel.

Re: [ANNOUNCE] Open source distributed virtual machine scheduling platform

Posted by Rafael Weingärtner <ra...@gmail.com>.
You are welcome, Ilya.
We are very glad that there are people interested in testing/using it.

We have indeed seen Rohit’s PRs that introduce changes that will enable
ACS to activate and deactivate servers. I also think that using IPMI is a
very interesting approach. Sadly, we needed something like this last
year; that is why we created it. We even thought about opening a PR to
donate the code to ACS, but we did not have much time to do that.

Also, as I said in the previous email, we just developed what we needed,
and we tried to spend the least amount of energy possible doing it. We
have bigger problems to focus on, and we are not a huge team right now.

Of course, as soon as ACS has those changes incorporated, we will change
our code to use ACS’s functions.




On Tue, May 3, 2016 at 5:30 PM, ilya <il...@gmail.com> wrote:

> Rafael and Gabriel,
>
> Firstly, thanks for working on this initiative.
>
> We also realized current CloudStack allocation algorithms are rather
> limited, and Autonomiccs is very timely.
>
> The project looks very promising, and it's something I'd like to try
> out in my environments as it gains production-level stability; I have
> an internal CI lab with a few hundred nested KVM hypervisors to test on.
>
>
> Many of us in the community put a lot of effort into getting IPMI specs
> and support into CloudStack. We will be merging IPMI support in our
> environment shortly.
>
> In addition, as you mentioned earlier, WOL and OS-level shutdowns will
> work most of the time, but they aren't ideal when you have
> enterprise-grade hardware with IPMI support (which is becoming de facto
> even with whitebox servers).
>
> The CloudStack IPMI feature Rohit worked on is very extensive, to the
> point that you can switch the IPMI driver to use WOL or shutdown
> commands and abstract the operations entirely with shell scripts
> (Rohit, please keep me honest).
>
> With that said, please kindly consider integrating with the IPMI
> interface Rohit mentioned, or make the WOL/power-off mechanism
> pluggable.
>
> Thanks,
> ilya
>



-- 
Rafael Weingärtner

Re: [ANNOUNCE] Open source distributed virtual machine scheduling platform

Posted by ilya <il...@gmail.com>.
Rafael and Gabriel,

Firstly, thanks for working on this initiative.

We also realized current CloudStack allocation algorithms are rather
limited, and Autonomiccs is very timely.

The project looks very promising, and it's something I'd like to try out
in my environments as it gains production-level stability; I have an
internal CI lab with a few hundred nested KVM hypervisors to test on.


Many of us in the community put a lot of effort into getting IPMI specs
and support into CloudStack. We will be merging IPMI support in our
environment shortly.

In addition, as you mentioned earlier, WOL and OS-level shutdowns will
work most of the time, but they aren't ideal when you have
enterprise-grade hardware with IPMI support (which is becoming de facto
even with whitebox servers).

The CloudStack IPMI feature Rohit worked on is very extensive, to the
point that you can switch the IPMI driver to use WOL or shutdown
commands and abstract the operations entirely with shell scripts (Rohit,
please keep me honest).

With that said, please kindly consider integrating with the IPMI
interface Rohit mentioned, or make the WOL/power-off mechanism pluggable.

Thanks,
ilya


Re: [ANNOUNCE] Open source distributed virtual machine scheduling platform

Posted by Rafael Weingärtner <ra...@gmail.com>.
Hi Rohit, thanks ;)

I will answer your questions inline.


I did not look at the code, but I'm curious how you're powering off
hosts. I think that with my out-of-band management PR you can use the
oobm subsystem to perform power management operations on IPMI 2.0
enabled hosts.

A: when we developed the first version (around October 2015), Apache
CloudStack (ACS) did not have support for activating and deactivating
hosts, and it still does not; you are working on that for ShapeBlue,
right? If there had been something at that time, it would have been
great. Therefore, we had to develop something to allow us to power hosts
on/off (that was not our focus, but we needed it). So, we created the
simplest solution possible (just enough to meet our needs). Our cloud
computing environment is built from pretty outdated servers; half of
them do not have support for IPMI. Therefore, to shut down hosts, we use
the hypervisor’s API. We noticed that most hypervisors have a shutdown
command in their APIs; that is why we used it. We could not spend many
resources (time and energy) on developing that for every hypervisor ACS
supports, so we did it only for XenServer, to be used as a proof of
concept (POC); adding support for other hypervisors is a matter of
implementing an interface.
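
To illustrate that last point, here is a minimal sketch of what such a
pluggable power-off interface could look like (the names below are
hypothetical, not the actual Autonomiccs API):

    // Hypothetical contract: one implementation per hypervisor type.
    public interface HostPowerManager {
        /** Shuts the given host down through the hypervisor's own API. */
        void shutdownHost(String hostAddress) throws Exception;
    }

    // XenServer proof-of-concept stub; a real implementation would log
    // in to the XenServer management API and request the shutdown there.
    class XenServerPowerManager implements HostPowerManager {
        @Override
        public void shutdownHost(String hostAddress) throws Exception {
            // connect to the host at hostAddress and invoke its shutdown
        }
    }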

Even though we did the “shutdown” using the hypervisor API, it would be
nice to also have it through the IPMI interface; it is rare, but we have
seen servers hang during the shutdown process.

Then, to activate (start) servers, we used the Wake-on-LAN (WOL)
protocol. We found that to be the easiest way to activate servers on a
LAN (there are some requirements to do that, given that it uses layer 2
of the OSI model to send the commands). However, once again, our
environment did not help much. One of our servers did not support WOL,
but gladly it had IPMI support. Therefore, to start servers, we use IPMI
or WOL depending on a flag that we added to the “cloud.host” table.
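
To give an idea of the WOL side, here is a minimal, generic sketch of
the magic packet (6 bytes of 0xFF followed by the target MAC address
repeated 16 times, broadcast over UDP); it illustrates the protocol, not
our actual code:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    public class WakeOnLan {
        // Builds and broadcasts a standard WOL magic packet for the
        // given 6-byte MAC address.
        public static void wake(String broadcastIp, byte[] mac) throws Exception {
            byte[] payload = new byte[6 + 16 * mac.length];
            for (int i = 0; i < 6; i++) {
                payload[i] = (byte) 0xFF;
            }
            for (int i = 6; i < payload.length; i += mac.length) {
                System.arraycopy(mac, 0, payload, i, mac.length);
            }
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.setBroadcast(true);
                // Port 9 ("discard") is the conventional WOL target port.
                socket.send(new DatagramPacket(payload, payload.length,
                        InetAddress.getByName(broadcastIp), 9));
            }
        }
    }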


Did the explanation help? You are welcome to look at the code; we think
it is more or less clear and documented.

Also curious how you implemented the heuristics and wrote tests (esp.
integration ones). Some of us had a related discussion about such a
feature, and we looked at this paper from the VMware DRS team:
http://www.waldspurger.org/carl/papers/drs-vmtj-mar12.pdf

A: well, the heuristics are written in Java; we have an interface with a
set of methods that have to be implemented and that can be used by our
agents; we also have a set of base classes to support the development of
new heuristics. We have created only two simple heuristics, to be used
as a proof of concept of the whole architecture we have created. Our
first goal was to formalize and finish the whole architecture; after
that, we could work on more interesting things. Right now we are working
on techniques to mix (add) neural or Bayesian networks into our
heuristics; we intend to use those techniques to improve our VM mapping
algorithms or the ranking of hosts.
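
To make that concrete, here is a hypothetical sketch of the kind of
contract such a heuristic follows (the names are illustrative, not the
real Autonomiccs interface):

    import java.util.Comparator;
    import java.util.List;

    // Minimal host model, just enough for the example below.
    class Host {
        private final double cpuUsage;
        private final int runningVms;
        Host(double cpuUsage, int runningVms) {
            this.cpuUsage = cpuUsage;
            this.runningVms = runningVms;
        }
        double getCpuUsage() { return cpuUsage; }
        int getRunningVms() { return runningVms; }
    }

    // The agent calls these methods to decide VM placement and host
    // power management.
    interface ClusterHeuristic {
        List<Host> rankHosts(List<Host> hosts);
        boolean canPowerOffHost(Host host);
    }

    // Toy consolidation heuristic: pack VMs onto the busiest hosts first
    // so that empty hosts become candidates for power-off.
    class ConsolidationHeuristic implements ClusterHeuristic {
        @Override
        public List<Host> rankHosts(List<Host> hosts) {
            hosts.sort(Comparator.comparingDouble(Host::getCpuUsage).reversed());
            return hosts;
        }

        @Override
        public boolean canPowerOffHost(Host host) {
            return host.getRunningVms() == 0;
        }
    }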

We have not read the VMware’s paper (we have based our whole proposals
solely on academic work until now); I have just glanced at it, and it seems
interesting; though I would need much more time and a deeper reading to be
able to comment on it.

The testing is done in a test environment we have; we isolate and
control the variables of the environment and everything that can affect
the agents’ behavior; then, we test every functionality and the agent
behavior. The process of testing for the first release was very manual.
However, now that we know the whole framework works, we are covering it
with test cases (unit and integration); then, testing a heuristic will
be a matter of writing test cases for it.

Even with test cases, for every experiment we do or release that is
closed, we execute a thorough batch of tests to check that everything is
working; sadly, those tests are executed manually today.

I can say that the fun is going to start now. I find it much more
interesting to create methods/heuristics to manage the environment than to
create the structure that uses the heuristics.

Do you have any other questions?




-- 
Rafael Weingärtner
