Posted to dev@airavata.apache.org by Apoorv Palkar <ap...@aol.com> on 2017/07/24 17:14:44 UTC

Monitoring System Diagram

K Dev,




I have attached an architecture diagram for the monitoring system. Currently, the main challenge we are facing is that GFAC is heavily tied to the monitoring system via task execution. The ultimate goal is to separate this from the current GFAC. I understand Marlon doesn't want me looking at the code too much to avoid bias, so I have only glanced at a few specifics/couple of lines to get an idea of how monitoring is currently implemented in Airavata.


Previously, I had been working on parsing the particular job notification emails: PBS, Slurm, UGE, etc. Over the weekend, I ran some performance metric tests on the current parsing code as Shameera suggested. The current code is quite balanced in terms of large-scale processing: it is able to parse the emails quickly while still maintaining a high degree of simplicity. I improved on a couple of lines without using regex, but that version of the code proved to be highly unmaintainable. As Shameera/Marlon pointed out, these emails change relatively frequently as servers/machines are upgraded or replaced, so it is important for this code to stay highly maintainable.
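To make that trade-off concrete, here is a minimal sketch of the regex style of parsing I'm talking about, assuming a PBS-like notification format (subject "PBS JOB <id>", body containing "Begun execution" or "Exit_status=N"). The format, class, and field names are just illustrative, not the parsers Airavata actually ships:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: extracts job id and state from a PBS-style
// notification email. The exact subject/body format varies by site,
// so the patterns below are assumptions, not Airavata's shipped ones.
public class PbsEmailParser {

    // e.g. "PBS JOB 123456.cluster.example.edu"
    private static final Pattern SUBJECT =
            Pattern.compile("PBS JOB\\s+(\\S+)");

    // e.g. "Exit_status=0"
    private static final Pattern EXIT_STATUS =
            Pattern.compile("Exit_status=(-?\\d+)");

    public static JobStatus parse(String subject, String body) {
        Matcher m = SUBJECT.matcher(subject);
        if (!m.find()) {
            return null; // not a PBS notification we recognize
        }
        String jobId = m.group(1);

        if (body.contains("Begun execution")) {
            return new JobStatus(jobId, "ACTIVE");
        }
        Matcher exit = EXIT_STATUS.matcher(body);
        if (exit.find()) {
            String state = "0".equals(exit.group(1)) ? "COMPLETE" : "FAILED";
            return new JobStatus(jobId, state);
        }
        return new JobStatus(jobId, "UNKNOWN");
    }

    // Simple value holder for the parsed result.
    public static class JobStatus {
        final String jobId;
        final String state;
        JobStatus(String jobId, String state) {
            this.jobId = jobId;
            this.state = state;
        }
    }
}

The point is that each scheduler's quirks live in a couple of declarative patterns, which is what keeps the code maintainable when a machine's email format changes.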


In addition to this, I have been working with Supun to develop a new architecture for the email monitoring. At first, there was a debate on whether to use ZooKeeper and/or Redis for the global state. I did some research to identify the pros and cons of each technology. As Suresh/Gourav pointed out, Airavata already uses ZooKeeper, and ZooKeeper would add less overhead than bringing in a separate store such as Redis. A big problem with this approach is the complexity of the code we will have to write. In the scenario of multiple GFACs, a global ZooKeeper makes sense; however, the problem comes when a job is cancelled. This can cause edge cases where, say, GFAC A accidentally processes GFAC B's emails. Therefore, at a low level, we would have to design a careful locking scheme for who may access which data, and that can prove to be a hassle.
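For reference, here is a rough sketch of the per-job locking idea using ZooKeeper through Apache Curator's InterProcessMutex recipe. The znode layout (/monitoring/locks/<jobId>) and the surrounding flow are assumptions for illustration, not an agreed design:

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Illustrative sketch: each GFAC instance tries to take a per-job lock
// before acting on an email, so GFAC A never processes GFAC B's jobs.
public class JobEmailClaim {

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        String jobId = "123456.cluster";
        InterProcessMutex lock =
                new InterProcessMutex(client, "/monitoring/locks/" + jobId);

        // Only the GFAC that acquires the lock handles this job's email.
        if (lock.acquire(5, TimeUnit.SECONDS)) {
            try {
                // parse the email and update job state here
                System.out.println("This GFAC owns job " + jobId);
            } finally {
                lock.release();
            }
        } else {
            // another GFAC already claimed it; skip the email
            System.out.println("Job " + jobId + " claimed elsewhere");
        }
        client.close();
    }
}

Since Curator's lock recipe is built on ephemeral znodes, a crashed GFAC releases its claims automatically once its session expires, but the bookkeeping around cancellation would still have to be written carefully.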




Another potential solution is to implement a work queue, similar to our job submission flow in Airavata. The work queue delegates the work of reading/parsing emails across multiple GFACs, which could avoid the locking/threading hazards above. If a GFAC fails, there needs to be a mechanism in place to re-handle the emails that GFAC had been given. We still have to decide on the right approach before the code can be written. I've also been working on the Thrift/RabbitMQ scenario, where data is parsed, serialized, and then sent over the network; I will upload that code by today/tomorrow.
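As a sketch of the work-queue option, here is roughly what a GFAC-side consumer could look like with the RabbitMQ Java client. The queue name, prefetch setting, and error handling are assumptions; the real payload would be the Thrift-serialized status rather than the raw text printed here:

import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

// Illustrative sketch: a GFAC worker pulls raw emails from a shared
// RabbitMQ work queue. Messages are acked only after parsing succeeds,
// so if this GFAC dies mid-way the broker redelivers its emails to
// another worker.
public class EmailWorker {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // durable queue shared by all GFAC workers
        channel.queueDeclare("job.emails", true, false, false, null);
        // hand each worker one unacked email at a time
        channel.basicQos(1);

        DeliverCallback onEmail = (consumerTag, delivery) -> {
            String rawEmail = new String(delivery.getBody(), StandardCharsets.UTF_8);
            try {
                // parse/serialize the status update here (e.g. via Thrift)
                System.out.println("Parsed email of length " + rawEmail.length());
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            } catch (Exception e) {
                // requeue so another GFAC can retry this email
                channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, true);
            }
        };

        channel.basicConsume("job.emails", false, onEmail, consumerTag -> { });
    }
}

Because messages are acknowledged only after successful parsing, the broker redelivers anything an unresponsive GFAC was holding, which addresses the failure case mentioned above.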




SHOUT OUT @Marcus !

Re: Monitoring System Diagram

Posted by Apoorv Palkar <ap...@aol.com>.
Yes Gourav,


Currently, I'm working on making the monitoring aspect of Airavata as independent as possible. I'm approaching the problem as if it doesn't matter whether the current architecture or Helix is being used. After coming up with a good design and implementation, we can see how to connect the pieces. From there we can probably split the DAG into two parts instead of having one DAG that includes the monitoring system.
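As a sketch of what "independent" could mean in code, the monitor would only expose a small, orchestrator-agnostic contract along these lines, and both the current GFAC flow and a Helix-driven flow would simply register a listener. These interfaces are hypothetical, not existing Airavata types:

// Hypothetical contract for a decoupled monitor.
public interface JobMonitor {
    void monitor(String jobId);
    void stopMonitoring(String jobId);
    void registerListener(JobStatusListener listener);
}

// Invoked whenever a notification email yields a new job state;
// the GFAC or Helix side decides what to do with the event.
interface JobStatusListener {
    void onJobStateChange(String jobId, String newState);
}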


- A Palkar.




shout out Marcus --



-----Original Message-----
From: Shenoy, Gourav Ganesh <go...@indiana.edu>
To: dev <de...@airavata.apache.org>
Sent: Tue, Jul 25, 2017 11:04 am
Subject: Re: Monitoring System Diagram



Apoorv,
 
Good work with the architecture. But I think the "request to monitor via helix mechanism" and "output to helix orchestration" are very crucial pieces here – which need to be detailed. The rest of it is kind of trivial, but what is important is how this system blends into the Helix design. Do you think you can add more details to the above statements?
 
Once again, you are heading in the right direction – good job!
 
Thanks and Regards,
Gourav Shenoy
 



Re: Monitoring System Diagram

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.
Apoorv,

Good work with the architecture. But I think the "request to monitor via helix mechanism" and "output to helix orchestration" are very crucial pieces here – which need to be detailed. The rest of it is kind of trivial, but what is important is how this system blends into the Helix design. Do you think you can add more details to the above statements?

Once again, you are heading in the right direction – good job!

Thanks and Regards,
Gourav Shenoy
