Posted to dev@airavata.apache.org by DImuthu Upeksha <di...@gmail.com> on 2017/10/05 06:40:10 UTC

Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Hi All,

Over the last few days, I have been going through the requirements and
design of the current Airavata setup, and I have identified the following
areas as the key focus areas for the technology evaluation phase.

Microservices deployment platform (container management system)

Possible candidates: Google Kubernetes, Apache Mesos, Apache Helix
As most of the operational units of Airavata are expected to move to a
microservices-based deployment pattern, having a unified deployment
platform to manage those microservices will make DevOps operations easier
and faster. On the other hand, although writing and maintaining a single
microservice is fairly straightforward, running multiple microservices and
monitoring and maintaining their lifecycles manually in a production
environment is a tiresome and complex operation. Using such a deployment
platform, we can easily automate many of the pain points mentioned above.

Scalability

We need a solution that can scale easily depending on the load on
different parts of the system. For example, the workers in the post
processing pipeline should be able to scale up and down depending on the
events coming into the message queue.
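The scaling rule above can be sketched as a simple function from queue backlog to desired worker count. This is only an illustrative sketch; the names (Autoscaler, messagesPerWorker) are hypothetical and not existing Airavata APIs:

```java
// Sketch: derive a desired worker count from the message-queue backlog.
class Autoscaler {
    // Scale workers proportionally to the backlog, clamped to [min, max].
    public static int desiredReplicas(int queuedMessages, int messagesPerWorker,
                                      int min, int max) {
        int needed = (int) Math.ceil((double) queuedMessages / messagesPerWorker);
        return Math.max(min, Math.min(max, needed));
    }

    public static void main(String[] args) {
        // 250 queued events, each worker drains ~50 -> 5 workers
        System.out.println(desiredReplicas(250, 50, 1, 10)); // 5
        // An empty queue still keeps the minimum worker count running
        System.out.println(desiredReplicas(0, 50, 1, 10));   // 1
    }
}
```

A container management system would then be asked to reconcile the running worker count toward this target.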

Availability

We need the solution to be deployable in multiple geographically distant
data centers. When evaluating container management systems, we should
consider this as a primary requirement. However, one thing I am not sure
about is the availability mode that Airavata normally expects. Is it
active-active or active-passive?

Service discovery

Once we move to a microservice-based deployment pattern, there could be
scenarios where we need service discovery. For example, if we scale up the
API Server to handle increased load, we might have to put a load balancer
between the client and the API Server instances. In that case, service
discovery is essential to supply the load balancer with the healthy API
Server endpoints currently running in the system.
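A minimal sketch of the registry such a load balancer could consult, assuming a simple per-endpoint health flag (all names here are hypothetical, not an existing Airavata component):

```java
import java.util.*;
import java.util.stream.Collectors;

// Service registry sketch: API Server instances register themselves,
// health checks flip a flag, and the load balancer asks only for the
// endpoints that are currently healthy.
class ServiceRegistry {
    private final Map<String, Boolean> endpoints = new LinkedHashMap<>();

    public void register(String endpoint)      { endpoints.put(endpoint, true); }
    public void markUnhealthy(String endpoint) { endpoints.put(endpoint, false); }

    // The load balancer routes only to endpoints that passed health checks.
    public List<String> healthyEndpoints() {
        return endpoints.entrySet().stream()
                .filter(Map.Entry::getValue)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```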

Cluster coordination

Although microservices are supposed to be stateless in most cases, we
might have scenarios where some state must be fed to particular
microservices. For example, if we implement a microservice that performs
the Orchestrator's role, there could be issues if we keep multiple
instances of it in several data centers to increase availability.
According to my understanding, there should be only one Orchestrator
running at a time, as it is the one that makes decisions about the job
execution process. So, if we are going to keep multiple instances of it
running in the system, there should be some sort of leader election
within the Orchestrator quorum.
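The leader election idea can be illustrated with a ZooKeeper-style scheme, where each Orchestrator instance registers under a monotonically increasing sequence number and the lowest live number leads. This is an in-memory sketch of the concept only, not a real coordination service:

```java
import java.util.TreeMap;

// Sketch of sequence-number leader election among Orchestrator instances.
class LeaderElection {
    private final TreeMap<Long, String> participants = new TreeMap<>();
    private long nextSeq = 0;

    // Each instance joins and receives a monotonically increasing number.
    public synchronized long join(String instanceId) {
        long seq = nextSeq++;
        participants.put(seq, instanceId);
        return seq;
    }

    // A failed or departing instance releases its slot.
    public synchronized void leave(long seq) { participants.remove(seq); }

    // The instance holding the smallest live sequence number is the leader.
    public synchronized String leader() {
        return participants.isEmpty() ? null : participants.firstEntry().getValue();
    }
}
```

If the leader fails, its entry is removed and the next-lowest instance takes over automatically, so only one Orchestrator makes decisions at a time.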

Common messaging medium between microservices

This might be out of scope, but I thought of sharing it with the team to
give a general idea. The idea was raised in a HipChat discussion with
Marlon and Gourav. Using a common messaging medium might enable
microservices to communicate in a decoupled manner, which will increase
the scalability of the system. For example, there is a reference
architecture built on a Kafka-based messaging medium that we could adopt
[1], [2]. However, I noticed in one paper that Kafka was previously
rejected because writing clients was onerous. Please share your views on
this, as I'm not familiar with the existing AMQP-based fan-out model and
its pain points.
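To illustrate the decoupling, here is an in-memory stand-in for a Kafka-like topic: publishers address a topic name and never reference the subscribing services directly. This is only a sketch of the pattern, not the Kafka client API:

```java
import java.util.*;
import java.util.function.Consumer;

// Minimal topic-based fan-out: every subscriber of a topic receives each
// published message; publisher and subscribers stay fully decoupled.
class MessageBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // Deliver the message to all handlers registered for the topic.
    public void publish(String topic, String message) {
        subscribers.getOrDefault(topic, Collections.emptyList())
                   .forEach(h -> h.accept(message));
    }
}
```

With a real broker such as Kafka, topics additionally give durability and replayable offsets, which the AMQP fan-out model handles differently.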

Those are the main areas I have identified while going through Airavata's
current implementation and the requirements stated in some of the research
papers. Please let me know whether my understanding of the above items is
correct; suggestions are always welcome :)

[1] https://medium.com/@ulymarins/an-introduction-to-apache-kafka-and-microservices-communication-bf0a0966d63
[2] https://www.slideshare.net/ConfluentInc/microservices-in-the-apache-kafka-ecosystem

References

Marru, S., Gunathilake, L., Herath, C., Tangchaisin, P., Pierce, M.,
Mattmann, C., Singh, R., Gunarathne, T., Chinthaka, E., Gardler, R. and
Slominski, A., 2011, November. Apache Airavata: a framework for distributed
applications and computational workflows. In Proceedings of the 2011 ACM
workshop on Gateway computing environments (pp. 21-28). ACM.

Nakandala, S., Pamidighantam, S., Yodage, S., Doshi, N., Abeysinghe, E.,
Kankanamalage, C.P., Marru, S. and Pierce, M., 2016, July. Anatomy of the
SEAGrid Science Gateway. In Proceedings of the XSEDE16 Conference on
Diversity, Big Data, and Science at Scale (p. 40). ACM.

Pierce, Marlon E., Suresh Marru, Lahiru Gunathilake, Don Kushan Wijeratne,
Raminder Singh, Chathuri Wimalasena, Shameera Ratnayaka, and Sudhakar
Pamidighantam. "Apache Airavata: design and directions of a science gateway
framework." Concurrency and Computation: Practice and Experience 27, no. 16
(2015): 4282-4291.

Pierce, Marlon, Suresh Marru, Borries Demeler, Raminderjeet Singh, and Gary
Gorbet. "The Apache Airavata application programming interface: overview
and evaluation with the UltraScan science gateway." In Proceedings of the
9th Gateway Computing Environments Workshop, pp. 25-29. IEEE Press, 2014.

Marru, Suresh, Marlon Pierce, Sudhakar Pamidighantam, and Chathuri
Wimalasena. "Apache Airavata as a laboratory: architecture and case study
for component-based gateway middleware." In Proceedings of the 1st
Workshop on The Science of Cyberinfrastructure: Research, Experience,
Applications and Models, pp. 19-26. ACM, 2015.

Thanks
Dimuthu

Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by DImuthu Upeksha <di...@gmail.com>.
Hi Supun,

I came up with a design and a PoC to verify the above approach. Search for
the subject "Async Agents to handle long running jobs"

Thanks
Dimuthu

On Mon, Oct 9, 2017 at 9:43 AM, Supun Nakandala <su...@gmail.com>
wrote:

> +1 for the idea.
>
> On Sun, Oct 8, 2017 at 2:52 AM, DImuthu Upeksha <
> dimuthu.upeksha2@gmail.com> wrote:
>
>> Hi Supun,
>>
>> I also believe that letting the orchestrator determine which worker runs a
>> particular job is complex to implement and will make the orchestrator code
>> hard to maintain in the long run. I partially agree with embedding a worker
>> inside the firewall-protected resource, but I guess we can improve it
>> further to keep workers homogeneous and stateless. Have a look at the
>> following figure.
>>
>>
>> In the above design, we keep all the workers outside and keep a daemon
>> inside the protected resource to communicate securely with the workers. The
>> problem then is how to keep the workers homogeneous, as this still just
>> adds another layer to the solution stated above. The trick is that we
>> decouple the communication between worker and resource: communication with
>> any resource is done through a well-defined API. Speaking in Java:
>>
>> public interface CommunicationInterface {
>>       public String sshToResource(String resourceIp, String command);
>>       public void transferDataTo(String resourceIp, String target,
>> InputStream in);
>>       public void transferDataFrom(String resourceIp, String target,
>> OutputStream out);
>> }
>> The implementation of this API might change according to the resource. We
>> keep a separate Catalog that serves the libraries containing the
>> implementation specific to each resource. For example, if Worker 1 needs to
>> talk to Resource 1, which sits behind a firewall with the Airavata
>> communication agent placed inside, it will query the Catalog for Resource 1
>> and fetch the library that implements CommunicationInterface to talk
>> securely with the Airavata Agent. If it wants to talk to Resource 2,
>> another library with default implementations will be fetched from the
>> Catalog. Once those SDKs are fetched, they are loaded into the JVM at
>> runtime using a class loader, and communication is done afterwards.
>>
>> We can improve this by caching libraries inside workers and reusing them
>> as much as possible, to limit the number of Catalog queries from workers.
>>
>> The advantage of this is that we can add resources with different security
>> levels without changing the Worker implementations. The only thing we have
>> to do is come up with an agent and a library to talk with that agent, then
>> add them to the Catalog, and the rest will be taken care of by the
>> framework. This model is analogous to the SQL drivers that we use in Java
>> to connect to databases.
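The JDBC-driver analogy above can be sketched with plain reflection: the Catalog maps a resource to an implementation class name that is loaded at runtime. All names here (DefaultCommunicator, "resource-2") are purely illustrative; in the real design the mapping and jars would live in the Catalog service, and a dedicated ClassLoader would load the fetched jar:

```java
import java.util.Map;

// Reduced version of the interface from the email above.
interface CommunicationInterface {
    String sshToResource(String resourceIp, String command);
}

// A default implementation for resources with no special security setup.
class DefaultCommunicator implements CommunicationInterface {
    public String sshToResource(String resourceIp, String command) {
        // Placeholder for a real SSH invocation.
        return "ssh " + resourceIp + " -> " + command;
    }
}

class CommunicatorCatalog {
    // Catalog lookup: resource id -> implementation class name.
    private static final Map<String, String> IMPLS =
            Map.of("resource-2", "DefaultCommunicator");

    public static CommunicationInterface forResource(String resourceId) {
        try {
            // Here the class is already on the classpath, so Class.forName
            // stands in for a jar-backed ClassLoader.
            return (CommunicationInterface) Class.forName(IMPLS.get(resourceId))
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No communicator for " + resourceId, e);
        }
    }
}
```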
>>
>> Please note that I came up with this design based on the limited knowledge
>> I have of Airavata Workers and Resources. There will be a lot of corner
>> cases that I have not identified. Your views and ideas are highly
>> appreciated.
>>
>> Thanks
>> Dimuthu
>>
>> On Sun, Oct 8, 2017 at 10:51 AM, Supun Nakandala <
>> supun.nakandala@gmail.com> wrote:
>>
>>> Hi Dimuthu,
>>>
>>> Thank you for the very good summary. I think you have covered almost
>>> everything.
>>>
>>> I would also like to mention one other futuristic requirement that I
>>> think will be important in this discussion.
>>>
>>> In my opinion, going forward, Airavata will face the requirement of
>>> working with firewall-protected resources. In such cases, workers residing
>>> outside will not be able to communicate with the protected resources. What
>>> we initially thought was to deploy a special type of worker that is placed
>>> inside the firewall-protected network and coordinates with the Airavata
>>> orchestrator to execute actions. One such tool, used by ServiceNow in
>>> enterprise settings, is the MidServer (
>>> http://wiki.servicenow.com/index.php?title=MID_Server#gsc.tab=0). The
>>> downside of this approach is that it breaks our assumption of all workers
>>> being homogeneous and therefore requires the orchestrator to be worker
>>> aware. Perhaps, instead of workers picking work, we can design it so that
>>> the orchestrator grants work to the corresponding worker. But this adds a
>>> lot of complexity on the orchestrator's side.
>>>
>>>
>>>
>>> On Oct 5, 2017 10:47 AM, "DImuthu Upeksha" <di...@gmail.com>
>>> wrote:
>>>
>>>> Hi Gaurav,
>>>>
>>>> Thanks a lot for the detailed description of DC/OS and how it can be
>>>> utilized in Airavata. It seems like an interesting project, and I'll add
>>>> it to the list of technologies to be evaluated.
>>>>
>>>> When selecting a technology, in addition to the features it provides, we
>>>> might have to take into consideration non-functional aspects such as
>>>> community participation (committers, commits, and forks), the number of
>>>> customers running it in production environments, the maturity of the
>>>> project, and the complexity it brings into the overall system. So I'll
>>>> first go through the resources (documentation and source) to grasp the
>>>> concepts of DC/OS, and hopefully I can work with you to dig deeper into
>>>> it.
>>>>
>>>> Thanks
>>>> Dimuthu
>>>>
>>>> On Thu, Oct 5, 2017 at 8:50 PM, Shenoy, Gourav Ganesh <
>>>> goshenoy@indiana.edu> wrote:
>>>>
>>>>> Sorry, missed the attachment in my previous email.
>>>>>
>>>>>
>>>>>
>>>>> PS: DC/OS is just a recommendation for performing containerized
>>>>> deployment and application management for Airavata. I would be happy to
>>>>> consider alternative frameworks such as Kubernetes.
>>>>>
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Gourav Shenoy
>>>>>
>>>>>
>>>>>
>>>>> *From: *"Shenoy, Gourav Ganesh" <go...@indiana.edu>
>>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Date: *Thursday, October 5, 2017 at 11:16 AM
>>>>>
>>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Subject: *Re: Linked Container Services for Apache Airavata
>>>>> Components - Phase 1 - Requirement identification
>>>>>
>>>>>
>>>>>
>>>>> Hi Dimuthu,
>>>>>
>>>>>
>>>>>
>>>>> Very good summary! I am not sure if you have come across it, but DC/OS
>>>>> (DataCenter Operating System) is a container orchestration platform based
>>>>> on Apache Mesos. The beauty of DC/OS is the ease and simplicity of
>>>>> development/deployment, yet it is extremely powerful along most of the
>>>>> parameters – multi-datacenter, multi-cloud, scalability, high
>>>>> availability, fault tolerance, load balancing – and, more importantly,
>>>>> the community support is fantastic.
>>>>>
>>>>>
>>>>>
>>>>> DC/OS has an exhaustive service catalog; it's more like a PaaS for
>>>>> containers (not just restricted to containers, though) – you can run
>>>>> services like Spark, Kafka, RabbitMQ, etc. out of the box with a
>>>>> single-click install. And Apache Mesos as the underlying resource manager
>>>>> makes it seamless to deploy applications across different datacenters.
>>>>> There is a concept of SERVICE vs JOB – a service is considered long
>>>>> running and DC/OS will make sure it keeps running (if a service fails, it
>>>>> spins up a new one), whereas jobs are one-time executors. This comes in
>>>>> handy for using DC/OS as a target runtime for Airavata.
>>>>>
>>>>>
>>>>>
>>>>> We used DC/OS for our class project to run the distributed task
>>>>> execution prototype we built (which uses RabbitMQ messaging). Here's a
>>>>> link to the blog I have explaining the process:
>>>>> https://gouravshenoy.github.io/apache-airavata/spring17/2017/04/20/final-report.html
>>>>> I have also attached a PDF paper we wrote as part of the class explaining
>>>>> the task execution process and *one solution* using rabbitmq messaging.
>>>>>
>>>>>
>>>>>
>>>>> I had also started on containerizing Airavata and a unified build +
>>>>> deployment mechanism with CI/CD on DC/OS. Unfortunately, I couldn't
>>>>> complete it due to time constraints, but I would be more than happy to
>>>>> work with you on this. Let me know and we can coordinate.
>>>>>
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Gourav Shenoy
>>>>>
>>>>>
>>>>>
>>>>> *From: *DImuthu Upeksha <di...@gmail.com>
>>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Date: *Thursday, October 5, 2017 at 9:52 AM
>>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Subject: *Re: Linked Container Services for Apache Airavata
>>>>> Components - Phase 1 - Requirement identification
>>>>>
>>>>>
>>>>>
>>>>> Hi Marlon,
>>>>>
>>>>>
>>>>>
>>>>> Thanks for the input. I got your idea of the availability mode and will
>>>>> keep it in mind while designing the PoC. CI/CD is one I had missed;
>>>>> thanks for pointing it out.
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dimuthu
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 5, 2017 at 7:04 PM, Pierce, Marlon <ma...@iu.edu>
>>>>> wrote:
>>>>>
>>>>> Thanks, Dimuthu, this is a good summary. Others may comment about
>>>>> Kafka, stateful versus stateless parts of Airavata, etc. You may also
>>>>> find some of this discussion in the mailing list archives.
>>>>>
>>>>>
>>>>>
>>>>> Active-active vs. active-passive is a good question, and we have
>>>>> typically thought of this in terms of individual Airavata components rather
>>>>> than the whole system.  Some components can be active-active (like a
>>>>> stateless application manager), while others (like the orchestrator example
>>>>> you give below) are stateful and may be better as active-passive.
>>>>>
>>>>>
>>>>>
>>>>> There is also the issue of system updates and continuous deployments,
>>>>> which could be added to your list.
>>>>>
>>>>>
>>>>>
>>>>> Marlon

Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by "Pierce, Marlon" <ma...@iu.edu>.
Thanks, Dimuthu, this is a very nice analysis.  I’d like to get others’ feedback on this.

 

Marlon

 

 

From: "dimuthu.upeksha2@gmail.com" <di...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, October 16, 2017 at 1:47 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

 

Hi All, 

 

Thanks for all the valuable feedback you have provided so far. Please find attached a document that contains the evaluation of Kubernetes, DC/OS, and Helix as candidate platforms for deploying Airavata microservices. Use the Google doc [1] to provide your suggestions and comments.

 

Summary of the document:

 

Considering all the facts, I believe that Kubernetes is more suitable for our use cases.

 

Advantages of Kubernetes over DC/OS

 

1. DC/OS uses the Marathon framework to perform container orchestration. Marathon is deployed on top of the Mesos framework, so from an architectural point of view there are two framework levels, one installed on top of the other. The reason is that Mesos is a generic framework for deploying any application, while Marathon adds the value by providing container orchestration. Although this is a good design in principle, it requires more resources and involves a lot of complexity when it comes to a production deployment. On the other hand, Kubernetes was built from scratch to support container orchestration, so it can do the same work DC/OS performs with fewer resources and less complexity.

 

2. Kubernetes has fewer components than DC/OS and is comparatively lighter than a DC/OS framework deployment. Fewer components make the framework easier to maintain and monitor.

 

3. When it comes to high-availability deployments, DC/OS has more components (Mesos masters, Marathon masters) to keep available than Kubernetes. This makes the production deployment and management process complex and tiresome.

 

4. The capability to deploy non-container-based applications on the platform is not one of our requirements, so that feature of DC/OS will rarely benefit us.

 

5. Kubernetes has a huge community and has had a lot of open-source exposure since its inception. DC/OS is mainly managed by Mesosphere, even though it was open sourced in 2016. Most feature designs and issue discussions are well documented in the Kubernetes GitHub repository, which makes framework issues really easy to track and solve, whereas in DC/OS that ecosystem is still at a very early stage.

 

6. There are a lot of vendors working on Kubernetes, and there is currently a significant number of tools developed around Kubernetes for deploying, monitoring, and managing Kubernetes clusters, whereas DC/OS does not see that amount of traction.

 

Advantages of Kubernetes over Helix

 

1. Containers (Docker) are the proper way to deploy microservices due to their platform-agnostic packaging model, resource limitation capability, and ease of distribution.

 

2. Helix doesn't support container deployment out of the box, whereas Kubernetes itself is a container orchestration framework.

 

3. Helix has fewer components than Kubernetes; however, considering the features Kubernetes provides over Helix, the additional components are justifiable.

 

4. In Helix, microservice application logic is tightly coupled with the Helix participant code (more precisely, with the State Model). This poses several issues in production deployments:

The update process becomes very complex, as we have to restart participant nodes for each update; this also affects other services, which is not acceptable under any circumstance.
We cannot limit resources per microservice, as all of them run inside the same JVM of the participant node.
 

5. Kubernetes, in contrast, has clearly defined boundaries between application logic and the runtime framework. Application logic is bundled as a Docker image and runs as separate processes, which makes updates and resource limitation very easy.

 

6. Kubernetes comes with service discovery and load balancing out of the box, whereas Helix doesn't provide such features by default.

 

7. Kubernetes has a well-defined and scalable node affinity API, but in Helix we would have to write custom rebalancers to achieve the same, and that is not scalable either.

 

8. It is very complex to come up with a proper CI/CD pipeline for Helix, as application code is tightly coupled to the framework. Kubernetes has a straightforward way to integrate CI/CD pipelines to test and deploy microservices.

 

9. Kubernetes has a comprehensive role-based access control (RBAC) model for authorizing resources, while Helix doesn't have such a model.

 

[1] https://docs.google.com/document/d/17Hfu-qFFRZHWfLCtf3esXZTcQy3jPKF7OfetTWJntpY/edit?usp=sharing

 

Thanks

Dimuthu

 

On Mon, Oct 9, 2017 at 9:43 AM, Supun Nakandala <su...@gmail.com> wrote:

+1 for the idea.

 

On Sun, Oct 8, 2017 at 2:52 AM, DImuthu Upeksha <di...@gmail.com> wrote:

Hi Supun, 

 

My belief also letting orchestrator to determine the worker to run particular job is complex to implement and will make the maintainability of orchestrator code quite hard in long run. I'm also in partially agreement with embedding a worker inside the firewall protected resource but I guess we can improve it further to make homogenous and stateless. Have a look at following figure

 


In above design we keep all the workers outside and keep a daemon inside the protected resource to securely communicate with workers. Then the problem is how do we make the worker homogenous as this is still just adding another layer to the solution stated above. Trick is, we decouple the communication between worker and resource. Communication to any resource is being done through a well defined API. Speaking in java

 

public interface CommunicationInterface {

      public String sshToResource(String resourceIp, String command);

      public void transferDataTo(String resourceIp, String target, InputStream in);

      public void transferDataFrom(String resourceIp, String target, OutputStream out);

}

​

Implementation of this API might change according to the resource. We keep a separate Catalog that will cater the libraries that have the implementation specific to each resource. For example, if Worker 1 needs to talk to Resource 1 which acts behind a firewall and the Airavata communication agent is placed inside, it will query the Catalog for the Resource 1 and fetch the library that implemented CommunicationInterface to talk securely with Airavata Agent. If it wants to talk to Resource 2, another library will be fetched from Catalog that has default implementations. Once those SDKs are fetched, they are loaded into the JVM at runtime using a class loader and communication will be done afterwards.

 

We can improve this by caching libraries inside workers and reusing them as much as possible to limit number of queries to Catalog from workers.

 

Advantage of this is, we can add resources with different security levels without changing the Worker implementations. Only thing we have to do is to come up with an agent and a library to talk with agent. Then add them to Catalog and rest will be taken cared by the framework. This model is analogous to the sql drivers that we use in java to connect to databases.

 

Please note that I came up with this design based on the limited knowledge I have in Airavata Workers and Resources. There will be lot of corner cases that I have not identified. Your views and ideas are highly appreciated.

 

Thanks

Dimuthu

 

On Sun, Oct 8, 2017 at 10:51 AM, Supun Nakandala <su...@gmail.com> wrote:

Hi Dimuthu, 

 

Thank you for the very good summary. I think you have covered almost all the things.

 

I would also like to mention one other futuristic requirements that I think will be important in this discussion.

 

In my opinion going forward, Airavata will get the requirement of working with firewall protected resources. In such cases, workers which are residing outside will not be able to communicate with the protected resources. What we initially thought was to deploy a special type of worker which will be placed inside the firewall-protected network and will coordinate with Airavata orchestrator to execute actions. One such tool which is used by ServiceNow in enterprise settings is the MidServer (http://wiki.servicenow.com/index.php?title=MID_Server#gsc.tab=0). The downside of this approach is that it breaks our assumption of all workers being homogenous and therefore require orchestrator to be worker aware. Perhaps, instead of workers picking work we can design such that orchestrator will grant work to the corresponding work. But this incorporates a lot of complexity on the orchestrator's side. 

 

 

 

On Oct 5, 2017 10:47 AM, "DImuthu Upeksha" <di...@gmail.com> wrote:

Hi Gaurav, 

 

Thanks a lot for the detailed description about DC/OS and how it can be utilized in Airavata. Seems like it is an interesting project and I'll add it to the technology list that are to be evaluated. 

 

When selecting a technology, in addition to the features it provides, we have to take into consideration non-functional aspects like community participation (committers, commits and forks), the number of customers running it in production environments, the maturity of the project, and the complexity it brings into the total system. So I'll first try to go through the resources (documentation and source) to grasp the concepts of DC/OS, and hopefully I can work with you to dig deeper into DC/OS.

 

Thanks

Dimuthu

 

On Thu, Oct 5, 2017 at 8:50 PM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:

Sorry, missed the attachment in my previous email.

 

PS: DC/OS is just a recommendation for performing containerized deployment and application management for Airavata. I would be happy to consider alternative frameworks such as Kubernetes.

 

Thanks and Regards,

Gourav Shenoy

 

From: "Shenoy, Gourav Ganesh" <go...@indiana.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, October 5, 2017 at 11:16 AM


To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

 

Hi Dimuthu,

 

Very good summary! I am not sure if you have come across it, but DC/OS (DataCenter Operating System) is a container orchestration platform based on Apache Mesos. The beauty of DC/OS is the ease and simplicity of development/deployment, yet it is extremely powerful across most parameters – multi-datacenter, multi-cloud, scalability, high availability, fault tolerance, load balancing – and, more importantly, the community support is fantastic.

 

DC/OS has an exhaustive service catalog; it’s more like a PaaS for containers (not just restricted to containers, though) – you can run services like Spark, Kafka, RabbitMQ, etc. out of the box with a single-click install. And Apache Mesos as the underlying resource manager makes it seamless to deploy applications across different datacenters. There is a concept of SERVICE vs JOB – a service is considered long-running and DC/OS will make sure it keeps running (if a service fails, it spins up a new one), whereas jobs are one-time executions. This comes in handy for using DC/OS as a target runtime for Airavata.

 

We used DC/OS for our class project to run the distributed task execution prototype we built (which uses RabbitMQ messaging). Here’s a link to the blog I have explaining the process: https://gouravshenoy.github.io/apache-airavata/spring17/2017/04/20/final-report.html . I have also attached a PDF paper we wrote as part of the class explaining the task execution process and one solution using rabbitmq messaging.

 

I had also started on the work of containerizing Airavata and a unified build + deployment mechanism with CI/CD on DC/OS. Unfortunately, I couldn’t complete it due to time constraints, but I would be more than happy to work with you on this. Let me know and we can coordinate. 

 

Thanks and Regards,

Gourav Shenoy

 

From: DImuthu Upeksha <di...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, October 5, 2017 at 9:52 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

 

Hi Marlon, 

 

Thanks for the input. I got your idea of the availability mode and will keep it in mind while designing the PoC. CI/CD is one thing I had missed; thanks for pointing it out.

 

Thanks

Dimuthu

 

On Thu, Oct 5, 2017 at 7:04 PM, Pierce, Marlon <ma...@iu.edu> wrote:

Thanks, Dimuthu, this is a good summary. Others may comment about Kafka, stateful versus stateless parts of Airavata, etc. You may also find some of this discussion in the mailing list archives.

 

Active-active vs. active-passive is a good question, and we have typically thought of this in terms of individual Airavata components rather than the whole system. Some components can be active-active (like a stateless application manager), while others (like the orchestrator example you give below) are stateful and may be better as active-passive.

 

There is also the issue of system updates and continuous deployments, which could be added to your list.

 

Marlon

 

 

From: "dimuthu.upeksha2@gmail.com" <di...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, October 5, 2017 at 2:40 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

 

Hi All,

 

Over the last few days, I have been going through the requirements and design of the current setup of Airavata, and I identified the following areas as the key focus areas for the technology evaluation phase.

 

Microservices deployment platform (container management system) 

 

Possible candidates: Google Kubernetes, Apache Mesos, Apache Helix 

As most of the operational units of Airavata are supposed to move into a microservices-based deployment pattern, having a unified deployment platform to manage those microservices will make DevOps operations easier and faster. On the other hand, although writing and maintaining a single microservice is fairly straightforward, running multiple microservices and monitoring and maintaining their lifecycles manually in a production environment is a tiresome and complex operation. Using such a deployment platform, we can easily automate many of the pain points mentioned above. 

 

Scalability

 

We need a solution that can easily scale depending on the load conditions of several parts of the system. For example, the workers in the post-processing pipeline should be able to scale up and down depending on the events coming into the message queue. 
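To sketch what such a scaling rule might compute (the numbers and names here are made up for illustration; Kubernetes' Horizontal Pod Autoscaler applies a similar ratio-based rule to whatever metric it observes):

```java
// Toy scaling policy for post-processing workers: aim for one worker per
// batch of queued events, clamped to [1, maxWorkers]. The batch size of 100
// is an arbitrary illustrative target, not a recommendation.
public class ScalePolicyDemo {
    static int desiredWorkers(int queuedEvents, int eventsPerWorker, int maxWorkers) {
        int target = (queuedEvents + eventsPerWorker - 1) / eventsPerWorker; // ceiling division
        return Math.max(1, Math.min(target, maxWorkers));
    }

    public static void main(String[] args) {
        System.out.println(desiredWorkers(0, 100, 10));    // idle: keep one worker alive
        System.out.println(desiredWorkers(350, 100, 10));  // burst: scale out
        System.out.println(desiredWorkers(5000, 100, 10)); // capped at maxWorkers
    }
}
```

A container platform would then be told to reconcile the running worker count to this target, instead of operators scaling by hand.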

 

Availability

 

We need the solution to support deployment across multiple geographically distant data centers. When evaluating container management systems, we should consider this as a primary requirement. However, one thing I am not sure about is the availability mode that Airavata normally expects. Is it active-active or active-passive? 

 

Service discovery

 

Once we move to a microservice-based deployment pattern, there could be scenarios where we want service discovery for several use cases. For example, if we are going to scale up the API Server to handle an increased load, we might have to put a load balancer between the client and the API Server instances. In that case, service discovery is essential to instruct the load balancer with the healthy API Server endpoints currently running in the system.
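The register/lookup cycle described above can be sketched in-process; this is only a toy illustration (service name and endpoints are invented), and in practice the registry would be kube-dns, etcd, Consul, or similar, with health checks driving the deregistration:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy service registry: instances register themselves, and a load balancer
// asks for the currently healthy endpoints of a service.
public class RegistryDemo {
    static final Map<String, Set<String>> registry = new ConcurrentHashMap<>();

    static void register(String service, String endpoint) {
        registry.computeIfAbsent(service, k -> ConcurrentHashMap.newKeySet()).add(endpoint);
    }

    // Called when a health check fails or an instance shuts down.
    static void deregister(String service, String endpoint) {
        Set<String> endpoints = registry.get(service);
        if (endpoints != null) endpoints.remove(endpoint);
    }

    public static void main(String[] args) {
        register("api-server", "10.0.0.1:8930");
        register("api-server", "10.0.0.2:8930");
        deregister("api-server", "10.0.0.1:8930"); // instance failed
        System.out.println(registry.get("api-server")); // what the LB routes to
    }
}
```

The point is that the load balancer never holds a static endpoint list; it always asks the registry, so scaling the API Server up or down needs no reconfiguration.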

 

Cluster coordination

 

Although microservices are supposed to be stateless in most cases, we might have scenarios where we need to feed some state to particular microservices. For example, if we are going to implement a microservice that performs the Orchestrator's role, there could be issues if we keep multiple instances of it in several data centers to increase availability. According to my understanding, there should be only one Orchestrator running at a time, as it is the one that takes decisions in the job execution process. So, if we are going to keep multiple instances of it running in the system, there should be some sort of leader election within the Orchestrator quorum.
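The single-active-Orchestrator constraint boils down to leader election. The sketch below is a toy in-process illustration of the idea only; a real deployment would run the election against a coordination service such as ZooKeeper (e.g. Apache Curator's LeaderLatch recipe) or etcd leases:

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy leader election among Orchestrator instances: each candidate
// volunteers, and only the first successful compare-and-set becomes leader.
// In production the atomic slot is replaced by an ephemeral node or lease
// in ZooKeeper/etcd, so the slot frees itself if the leader dies.
public class LeaderElectionDemo {
    static final AtomicReference<String> leader = new AtomicReference<>(null);

    static boolean tryBecomeLeader(String instanceId) {
        return leader.compareAndSet(null, instanceId);
    }

    public static void main(String[] args) {
        boolean first = tryBecomeLeader("orchestrator-1");
        boolean second = tryBecomeLeader("orchestrator-2"); // loses: a leader exists
        System.out.println(first + " " + second + " leader=" + leader.get());
    }
}
```

Standby instances keep watching the slot and take over only when the current leader releases it, which gives active-passive availability for the stateful component.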

 

Common messaging medium between microservices

 

This might be out of scope, but I thought of sharing it with the team to give a general idea. The idea was raised in the HipChat discussion with Marlon and Gourav. Using a common messaging medium might enable microservices to communicate in a decoupled manner, which will increase the scalability of the system. For example, there is a reference architecture that we can utilize with a Kafka-based messaging medium [1], [2]. However, I noticed in one paper that Kafka was previously rejected because writing clients was onerous. Please share your views on this, as I'm not familiar with the existing AMQP-based fan-out model and its pain points. 
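The decoupling a messaging medium buys can be sketched in-process (topic name and handlers below are invented for illustration); with Kafka or AMQP the topic map is replaced by broker-managed topics and consumer groups, giving the same fan-out across processes and data centers:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy topic-based fan-out: publishers know only topic names, never consumers.
public class TopicBusDemo {
    static final Map<String, List<Consumer<String>>> topics = new HashMap<>();
    static final List<String> received = new ArrayList<>();

    static void subscribe(String topic, Consumer<String> handler) {
        topics.computeIfAbsent(topic, k -> new ArrayList<>()).add(handler);
    }

    static void publish(String topic, String message) {
        // The publisher needs no knowledge of who consumes the message.
        topics.getOrDefault(topic, List.of()).forEach(h -> h.accept(message));
    }

    public static void main(String[] args) {
        subscribe("experiment.launch", m -> received.add("orchestrator:" + m));
        subscribe("experiment.launch", m -> received.add("monitor:" + m));
        publish("experiment.launch", "exp-42");
        System.out.println(received);
    }
}
```

New consumers (say, a provenance service) can subscribe later without any change to the publisher, which is exactly the property that makes the system easier to scale and extend.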

 

Those are the main areas I have identified while going through Airavata's current implementation and the requirements stated in some of the research papers. Please let me know whether my understanding of the above items is correct; suggestions are always welcome :)

 

[1] https://medium.com/@ulymarins/an-introduction-to-apache-kafka-and-microservices-communication-bf0a0966d63

[2] https://www.slideshare.net/ConfluentInc/microservices-in-the-apache-kafka-ecosystem

 

References

 

Marru, S., Gunathilake, L., Herath, C., Tangchaisin, P., Pierce, M., Mattmann, C., Singh, R., Gunarathne, T., Chinthaka, E., Gardler, R. and Slominski, A., 2011, November. Apache Airavata: a framework for distributed applications and computational workflows. In Proceedings of the 2011 ACM workshop on Gateway computing environments (pp. 21-28). ACM.

 

Nakandala, S., Pamidighantam, S., Yodage, S., Doshi, N., Abeysinghe, E., Kankanamalage, C.P., Marru, S. and Pierce, M., 2016, July. Anatomy of the SEAGrid Science Gateway. In Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale (p. 40). ACM.

 

Pierce, Marlon E., Suresh Marru, Lahiru Gunathilake, Don Kushan Wijeratne, Raminder Singh, Chathuri Wimalasena, Shameera Ratnayaka, and Sudhakar Pamidighantam. "Apache Airavata: design and directions of a science gateway framework." Concurrency and Computation: Practice and Experience 27, no. 16 (2015): 4282-4291.

 

Pierce, Marlon, Suresh Marru, Borries Demeler, Raminderjeet Singh, and Gary Gorbet. "The Apache Airavata application programming interface: overview and evaluation with the UltraScan science gateway." In Proceedings of the 9th Gateway Computing Environments Workshop, pp. 25-29. IEEE Press, 2014.

 

Marru, Suresh, Marlon Pierce, Sudhakar Pamidighantam, and Chathuri Wimalasena. "Apache Airavata as a laboratory: architecture and case study for component- based gateway middleware." In Proceedings of the 1st Workshop on The Science of Cyberinfrastructure: Research, Experience, Applications and Models, pp. 19-26. ACM, 2015.

 

Thanks

Dimuthu

 

 

 

 

 


Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by DImuthu Upeksha <di...@gmail.com>.
Hi Gaurav,

I'm really thankful for your feedback, and I believe it will help us get a
clear idea about the overall context and select the best-matching product
for our use case. Please find my responses inline with your comments.

Really sorry if the response became lengthy, as there are some points that I
missed including in the report.

On Thu, Nov 2, 2017 at 9:10 AM, Shenoy, Gourav Ganesh <go...@indiana.edu>
wrote:

> Hi Dimuthu,
>
>
>
> Sorry for catching up late on your emails and thanks again for summarizing
> your findings. I believe this is really helpful for anyone who wishes to
> understand the intricacies involved in re-designing the architecture and at
> the same time understanding the technologies. I have a few comments though;
> please find my feedback inline in blue.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *DImuthu Upeksha <di...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Monday, October 16, 2017 at 1:48 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: Linked Container Services for Apache Airavata Components -
> Phase 1 - Requirement identification
>
>
>
> Hi All,
>
>
>
> Thanks for all the valuable feedback that you have provided so far and
> please find attached document that contains the evaluation of Kubernetes,
> DC/OS and Helix as candidate platforms to deploy Airavata microservices.
> Use the google doc [1] to provide your suggestions and comments.
>
>
>
> Summary of the document:
>
>
>
> *Considering all the facts, I believe that Kubernetes is more suitable for
> our use cases.*
>
>
>
> *Advantages of Kubernetes over DC/OS*
>
>
>
> 1. DC/OS uses the Marathon framework to perform container orchestration.
> Marathon is deployed on top of the Mesos framework, so from an
> architectural point of view, there are two framework levels, one
> installed on top of the other. The reason is that Mesos is a generic
> framework for deploying any application, and Marathon is the one that adds
> value by providing container orchestration. Although this is a good
> design, it requires more resources and involves a lot of
> complexities when it comes to a production deployment. On the other
> hand, Kubernetes has been built from scratch to support container
> orchestration, so it can do the same work that DC/OS performs with fewer
> resources and less complexity.
>
>
>
> *DC/OS is an operating system which uses Mesos as the underlying resource
> manager (to add abstraction if your infrastructure spans multiple clouds OR
> if you have an on-premise infra), whereas Marathon is “generally” the
> scheduler used by DC/OS to orchestrate applications/services as containers
> over the infrastructure managed by Mesos. I say “generally”, because DC/OS
> also allows you to run one-off jobs (services are long running, jobs are
> one-time run), and uses Aurora as the scheduler for it. The reason I am
> reiterating this is because of the following:*
>
>    1. *You mentioned DC/OS using more resources, which I believe might
>    not be accurate (I might be wrong). DC/OS has evolved out of an ecosystem
>    which comprises resource management and application management.*
>
>
What I meant by resource utilization is the resources that the framework
components use, not the Docker containers (actually the pods) that are
deployed inside each framework. I'm really sorry for being unable to share
the statistics I observed, which might give a clearer idea about this. I
guess we are both in agreement that DC/OS provides some additional features
compared to Kubernetes, including non-container-based service deployment
using the features of Mesos. To provide those features, DC/OS needs extra
utilities other than Marathon; precisely speaking, Marathon is also a
service deployed on Mesos. If you go to the dashboard of DC/OS, you should
be able to see around 27 components working together to compose a 3-node
DC/OS cluster. There might be overlapping components, so please correct me
if I am wrong. I'm still not using Marathon-LB
<https://dcos.io/docs/1.7/usage/service-discovery/marathon-lb/usage/> to
load-balance containers, and it will also add some additional components to
the framework.


Look at the component diagram provided by DC/OS
https://docs.mesosphere.com/1.10/overview/architecture/components/



Whereas in Kubernetes, you might want to set up the following components.

In master nodes:

Kube API Server*
Kube Controller Manager*
Kube Scheduler*
Kube DNS* (this is a pod deployed in Kubernetes)

In minion nodes:

Kubelet*
Kube Proxy*
Supervisord (if you want to keep the k8s agent monitored for availability)
Fluentd (if you want cluster logging)

* mandatory components

Both require a cluster coordinator (ZooKeeper) or a distributed key-value
store (etcd) to keep the status of the cluster, but I believe both have the
same complexity on that point.

To make my point clearer: the master of a DC/OS installation, with Marathon
installed but no containers deployed, consumed 1.3 GB of the machine's
total memory.


A Kubernetes master with 8 pods + DNS pods deployed consumed around 500 MB
of memory. I used the Ubuntu 16.04 minimal distribution for both
installations.



I understand that the above pods are deployed on a minion machine and that
the resources they consume do not add up to the above values. But it takes
some memory on the master to keep track of them.

By providing the above information, I'm not trying to emphasize that
Kubernetes is better than DC/OS; rather, I want to show that DC/OS solves a
bigger problem than Kubernetes and needs more resources and components to
achieve that.


>
>    2. *The complexity involved in getting a working “production grade”
>    Kubernetes cluster setup is far more complicated than getting DC/OS
>    bootstrapped.*
>
I think what you are referring to as DC/OS bootstrap is this:
https://dcos.io/docs/1.10/installing/custom/advanced/. However, there are
tools (kubeadm, kops, kubespray) with which we can easily set up Kubernetes
on a cluster, and they have improved dramatically compared to last year.
But the point is, getting a production-ready cluster is not enough. If you
are the one maintaining the cluster, you should be aware of each and every
part of the platform, as issues in a distributed system are inevitable.
That's why I tried to analyze the components of both platforms.


>    2. *It really doesn’t matter “what scheduler” we use to orchestrate
>    the containers, because in the end the battle is between Kubernetes and
>    Marathon, and not Kubernetes vs DC/OS. In fact, DC/OS (as an operating
>    system) supports the use of multiple orchestration schedulers (like running
>    multiple browsers over a Mac/Windows machine) – so as a devops engineer,
>    DC/OS allows you to decide whether to use Marathon or Kubernetes. You will
>    be pleasantly surprised to know that Kubernetes is a first-class package on
>    DC/OS.*
>
>
Partially agreed. DC/OS is just the arena to play in, and Marathon is the
player here. And I'm aware that we can install Kubernetes on DC/OS. To be
frank, my belief is that deploying Kubernetes on DC/OS and making DC/OS
open source is a marketing move DC/OS has made in order to get more
traction. I see only a slight possibility of a Marathon user easily
migrating a production deployment to Kubernetes (or the other way around).
If we really want a Kubernetes setup, I suggest deploying a bare-metal
setup using the above-mentioned tools. It will be straightforward, robust,
and less resource-consuming as well.


>
>
>
> 2. Kubernetes has fewer components compared to DC/OS and comparatively
> lighter than DC/OS framework deployment. Lesser components makes it easy to
> maintain and monitor the framework.
>
>
>
> *As I mentioned in my comments above, I do not think this might be
> accurate. Yes, maintainability is a very important aspect but both these
> frameworks are proven to be easily maintainable.*
>

Answered above

>
>
> 3. When it comes to high availability deployments, DC/OS has more
> components (Mesos masters, Marathon masters) to make available than
> Kubernetes. This makes the production deployment and management process
> complex and tiresome.
>
>
>
> *Same as points (1, 2) above, the real battle is between Kubernetes and
> Marathon. Mesos is just the resource layer which allows you to easily
> manage applications across heterogeneous/hybrid infrastructure.*
>

Answered above, with some additions. Although this is between Kubernetes
and Marathon, we have to look at the bigger picture. We have to find
answers to the following issues if we are using Marathon, as it is not a
standalone application.

1. Where do we deploy Marathon?
2. If it is on top of DC/OS, then the above comments should be considered.
3. If it is on top of a vanilla Mesos cluster, the problem may become more
complex in production deployments (correct me if I'm wrong), and it will
lose all the advantages that DC/OS brings.


>
> 4. Having the capability of deploying non container based applications in
> the platform is not one of our requirements. So that feature of DC/OS will
> rarely be beneficial for us.
>
>
>
> *Partially agreed. Yes, the discussion is towards containerizing Airavata
> components. But knowing that DC/OS also supports running distributed
> services is beneficial. *
>
>
>

This is something we have to discuss in more detail. If we can come up with
a set of distributed services that we want to deploy in our case, it will
be easier to talk about this point.

5. Kubernetes has a huge community and a lot of exposure to opensource
> arena throughout the inception. DC/OS is mainly managed by Mesosphere even
> though it has been made opensource in 2016. Most of the feature designs and
> issue discussions are well documented in Kubernetes github repository which
> makes it really easy to track and solve when come to an issue of the
> framework, where as in DC/OS that ecosystem is still at the very early age.
>
>
>
> *I agree. Kubernetes has a larger adoption and good community support.*
>
>
>
> 6. There are lot of vendors working on Kubernetes and currently there are
> significant amount of tools developed around Kubernetes to deploy, monitor
> and manage Kubernetes clusters whereas in DC/OS we can not see that amount
> of traction.
>
>
>
> *Yes, this is a very good point. Just deploying Airavata over DC/OS or
> Kubernetes is not the goal. Eventually having a streamlined CI/CD process
> is extremely essential. Yes, as Kubernetes has a wider adoption there are a
> lot of tools available for CI/CD over Kubernetes. Although I haven’t
> followed up or tried it myself, but looks like DC/OS has fairly good
> support for elastic CI/CD pipelines. Mesosphere ecosystem boasts about it
> here. <https://mesosphere.com/solutions/developer/>*
>

True. I have mentioned it in the report (page 20)


>
> *NOTE: I am not advocating DC/OS over Kubernetes. I just wanted to clarify
> some of the subtle differences between the two. Most people confuse between
> them as being competing technologies (me included), but this blog
> <https://mesosphere.com/blog/kubernetes-and-the-dcos/> throws some light
> over this topic.*
>

:) I would suggest we try both technologies ourselves and come up with a
better evaluation. This article is published by Mesosphere, and they
definitely point out their edge over Kubernetes. All the points that they
have emphasized are correct, but my worry is whether they cover 100% of the
evaluation. This
<https://platform9.com/blog/kubernetes-vs-mesos-marathon/> covers the
evaluation to some extent, but it will also be biased toward Kubernetes
because the publishers of the article are commercial Kubernetes vendors.


>
>
> *Advantages of Kubernetes over Helix*
>
>
>
> *While the overall differences between Kubernetes and Helix makes total
> sense, however since we are considering only the “deployment” aspect here
> (containers being the protagonist in our story), Helix is completely out of
> the equation. We are only considering Helix as the distributed task
> execution framework for managing workloads “within” Airavata. The idea was
> to leverage Helix’s task execution APIs to define custom task executors,
> and orchestrate the DAGs defined using Helix nomenclature.*
>
>
>
> *Although Helix provides cluster management capabilities (and honestly
> that was the sole purpose behind the team at LinkedIn building Helix), we
> are not interested in using Helix for managing Airavata microservices.
> Rather, we need to identify the best way to build our micro-services around
> Helix’s task execution framework. I am not saying this is the ideal way to
> solve our problem, but certainly is one of the powerful candidates out
> there.*
>

True. Helix's task execution framework has rich capabilities, but I
couldn't properly map its functionality to the use cases I was given.
However, if we can come up with a design for how to incorporate it with
Airavata microservices, it would be a really interesting topic to talk about.

>
>
> 1. Containers (Docker) are the proper way to deploy microservices due to
> their platform-agnostic packaging model, resource limitation capability,
> and ease of distribution.
>
>
>
> 2. Helix doesn’t support container deployment out of the box, whereas
> Kubernetes itself is a container orchestration framework.
>
>
>
> 3. Helix has fewer components compared to Kubernetes; however, the
> features Kubernetes provides over Helix justify it having more
> components.
>
>
>
> 4. In Helix, the microservice application logic is tightly coupled with the
> Helix participant code (more precisely, to the State Model). This causes
> several issues in production deployments:
>
>    - The update process becomes very complex, as we have to restart participant
>    nodes for each update. And it will affect other services as well, which is
>    not acceptable under any circumstances.
>    - We cannot limit resources for each microservice, as all are run
>    inside the same JVM of the Participant node.
>
>
>
> 5. Kubernetes, in contrast, has clearly defined boundaries between
> application logic and the runtime framework. Application logic is bundled
> as a Docker image and run as separate processes, which makes the
> update process and resource limitation very easy.
>
>
>
> 6. Kubernetes comes with service discovery and load balancing out of
> the box, whereas Helix doesn’t provide such features by default.
>
>
>
> 7. Kubernetes has a well-defined and scalable node affinity API, but in
> Helix we have to write custom rebalancers to achieve it, and that is not
> scalable either.
>
>
>
> 8. It is very complex to come up with a proper CI/CD pipeline for Helix, as
> the application code is tightly coupled to the framework. Kubernetes has a
> straightforward way to integrate CI/CD pipelines to test and deploy
> microservices.
>
>
>
> 9. Kubernetes has a comprehensive role-based access control (RBAC) model to
> authorize resources, while Helix doesn’t have such a model.
>
>
>
> [1] https://docs.google.com/document/d/17Hfu-qFFRZHWfLCtf3esXZTc
> Qy3jPKF7OfetTWJntpY/edit?usp=sharing
>
>
>
> Thanks
>
> Dimuthu
>
>
>
> On Mon, Oct 9, 2017 at 9:43 AM, Supun Nakandala <su...@gmail.com>
> wrote:
>
> +1 for the idea.
>
>
>
> On Sun, Oct 8, 2017 at 2:52 AM, DImuthu Upeksha <
> dimuthu.upeksha2@gmail.com> wrote:
>
> Hi Supun,
>
>
>
> My belief also is that letting the orchestrator determine the worker to run a
> particular job is complex to implement and will make the maintainability of
> the orchestrator code quite hard in the long run. I'm also in partial agreement
> with embedding a worker inside the firewall-protected resource, but I guess
> we can improve it further to make it homogeneous and stateless. Have a look
> at the following figure:
>
>
>
>
> In the above design we keep all the workers outside and keep a daemon inside
> the protected resource to securely communicate with the workers. Then the
> problem is how we make the workers homogeneous, as this is still just
> adding another layer to the solution stated above. The trick is that we decouple
> the communication between worker and resource. Communication with any
> resource is done through a well-defined API. Speaking in Java:
>
>
>
> public interface CommunicationInterface {
>
>       public String sshToResource(String resourceIp, String command);
>
>       public void transferDataTo(String resourceIp, String target,
> InputStream in);
>
>       public void transferDataFrom(String resourceIp, String target,
> OutputStream out);
>
> }
>
>
> The implementation of this API might change according to the resource. We keep
> a separate Catalog that will serve the libraries containing the
> implementation specific to each resource. For example, if Worker 1 needs to
> talk to Resource 1, which sits behind a firewall with the Airavata
> communication agent placed inside, it will query the Catalog for
> Resource 1 and fetch the library that implements CommunicationInterface
> to talk securely with the Airavata Agent. If it wants to talk to Resource 2,
> another library will be fetched from the Catalog that has default
> implementations. Once those SDKs are fetched, they are loaded into the JVM
> at runtime using a class loader, and communication is done afterwards.
>
>
>
> We can improve this by caching libraries inside workers and reusing them
> as much as possible to limit the number of queries to the Catalog from workers.
>
>
>
> The advantage of this is that we can add resources with different security levels
> without changing the Worker implementations. The only thing we have to do is
> come up with an agent and a library to talk to the agent, then add them to the
> Catalog, and the rest will be taken care of by the framework. This model is
> analogous to the SQL drivers that we use in Java to connect to databases.
>
>
>
> Please note that I came up with this design based on the limited knowledge
> I have of Airavata Workers and Resources. There will be a lot of corner cases
> that I have not identified. Your views and ideas are highly appreciated.
>
>
>
> Thanks
>
> Dimuthu
>
>
>
> On Sun, Oct 8, 2017 at 10:51 AM, Supun Nakandala <
> supun.nakandala@gmail.com> wrote:
>
> Hi Dimuthu,
>
>
>
> Thank you for the very good summary. I think you have covered almost all
> the things.
>
>
>
> I would also like to mention one other futuristic requirements that I
> think will be important in this discussion.
>
>
>
> In my opinion going forward, Airavata will get the requirement of working
> with firewall protected resources. In such cases, workers which are
> residing outside will not be able to communicate with the protected
> resources. What we initially thought was to deploy a special type of worker
> which will be placed inside the firewall-protected network and will
> coordinate with Airavata orchestrator to execute actions. One such tool
> which is used by ServiceNow in enterprise settings is the MidServer (
> http://wiki.servicenow.com/index.php?title=MID_Server#gsc.tab=0). The
> downside of this approach is that it breaks our assumption of all workers
> being homogenous and therefore require orchestrator to be worker aware.
> Perhaps, instead of workers picking work we can design such that
> orchestrator will grant work to the corresponding work. But this
> incorporates a lot of complexity on the orchestrator's side.
>
>
>
>
>
>
>
> On Oct 5, 2017 10:47 AM, "DImuthu Upeksha" <di...@gmail.com>
> wrote:
>
> Hi Gaurav,
>
>
>
> Thanks a lot for the detailed description about DC/OS and how it can be
> utilized in Airavata. Seems like it is an interesting project and I'll add
> it to the technology list that are to be evaluated.
>
>
>
> When selecting a technology, in addition to the features it provides, we
> might have to take some non-functional features like the community
> participation (committers, commits and forks), number of customers  who
> are  running it  in production environments, maturity of the project and
> the complexity it brings in to the total system into the consideration. So
> I'll first try to go through the resources (documentation and source) and
> try to grab concepts of DC/OS and hopefully I can work with you to dig
> deeper to understand more about DC/OS
>
>
>
> Thanks
>
> Dimuthu
>
>
>
> On Thu, Oct 5, 2017 at 8:50 PM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
> Sorry, missed the attachment in my previous email.
>
>
>
> PS: DC/OS is just a recommendation for performing containerized deployment
> and application management for Airavata. I would be happy to consider
> alternative frameworks such as Kubernetes.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Shenoy, Gourav Ganesh" <go...@indiana.edu>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Thursday, October 5, 2017 at 11:16 AM
>
>
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: Linked Container Services for Apache Airavata Components -
> Phase 1 - Requirement identification
>
>
>
> Hi Dimuthu,
>
>
>
> Very good summary! I am not sure if you have come across it, but DC/OS
> (DataCenter Operating System) is a container orchestration platform based
> on Apache Mesos. The beauty of DC/OS is the ease and simplicity of
> development/deployment, yet it is extremely powerful on most of the
> parameters – multi-datacenter, multi-cloud, scalability, high availability,
> fault tolerance, load balancing – and, more importantly, the community
> support is fantastic.
>
>
>
> DC/OS has an exhaustive service catalog; it’s more like a PaaS for
> containers (not just restricted to containers though) – you can run
> services like Spark, Kafka, RabbitMQ, etc. out of the box with a
> single-click install. And Apache Mesos as the underlying resource manager
> makes it seamless to deploy applications across different datacenters.
> There is a concept of SERVICE vs JOB – a service is considered long
> running and DC/OS will make sure it keeps running (if a service fails, it
> spins up a new one), whereas jobs are one-time executions. This comes in
> handy for using DC/OS as a target runtime for Airavata.
>
>
>
> We used DC/OS for our class project to run the distributed task execution
> prototype we built (which uses RabbitMQ messaging). Here’s a link to the
> blog I have explaining the process: https://gouravshenoy.github.io
> /apache-airavata/spring17/2017/04/20/final-report.html . I have also
> attached a PDF paper we wrote as part of the class explaining the task
> execution process and *one solution* using rabbitmq messaging.
>
>
>
> I had also started on the work of containerizing Airavata and a unified
> build + deployment mechanism with CI/CD on DC/OS. Unfortunately, I couldn’t
> complete it due to time constraints, but I would be more than happy to work
> with you on this. Let me know and we can coordinate.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *DImuthu Upeksha <di...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Thursday, October 5, 2017 at 9:52 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: Linked Container Services for Apache Airavata Components -
> Phase 1 - Requirement identification
>
>
>
> Hi Marlon,
>
>
>
> Thanks for the input. I got your idea of the availability mode and will
> keep it in mind while designing the PoC. CI/CD is the one I had missed;
> thanks for pointing it out.
>
>
>
> Thanks
>
> Dimuthu
>
>
>
> On Thu, Oct 5, 2017 at 7:04 PM, Pierce, Marlon <ma...@iu.edu> wrote:
>
> Thanks, Dimuthu, this is a good summary. Others may comment about Kafka,
> stateful versus stateless parts of Airavata, etc.  You may also find some
> of this discussion on the mailing list archives.
>
>
>
> Active-active vs. active-passive is a good question, and we have typically
> thought of this in terms of individual Airavata components rather than the
> whole system.  Some components can be active-active (like a stateless
> application manager), while others (like the orchestrator example you give
> below) are stateful and may be better as active-passive.
>
>
>
> There is also the issue of system updates and continuous deployments,
> which could be added to your list.
>
>
>
> Marlon
>
>
>
>
>
> *From: *"dimuthu.upeksha2@gmail.com" <di...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Thursday, October 5, 2017 at 2:40 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Linked Container Services for Apache Airavata Components -
> Phase 1 - Requirement identification
>
>
>
> Hi All,
>
>
>
> Within the last few days, I have been going through the requirements and
> design of the current setup of Airavata, and I identified the following
> areas as the key focus areas in the technology evaluation phase.
>
>
>
> Microservices deployment platform (container management system)
>
>
>
> Possible candidates: Google Kubernetes, Apache Mesos, Apache Helix
>
> As most of the operational units of Airavata are supposed to be moving
> into a microservices-based deployment pattern, having a unified deployment
> platform to manage those microservices will make DevOps operations
> easier and faster. On the other hand, although writing and maintaining a
> single microservice is fairly straightforward, running multiple
> microservices and monitoring and maintaining their lifecycles manually in
> a production environment is a tiresome and complex operation to perform.
> Using such a deployment platform, we can easily automate many of the pain
> points mentioned above.
>
>
>
> Scalability
>
>
>
> We need a solution that can easily scale depending on the load
> conditions of several parts of the system. For example, the workers in the
> post-processing pipeline should be able to scale up and down depending on
> the events coming into the message queue.
>
>
>
> Availability
>
>
>
> We need the solution to support deployment across multiple geographically
> distant data centers. When evaluating container management systems, we
> should consider this as a primary requirement. However, one thing that I
> am not sure about is the availability mode that Airavata normally expects.
> Is it active-active mode or active-passive mode?
>
>
>
> Service discovery
>
>
>
> Once we move to a microservice-based deployment pattern, there could be
> scenarios where we want service discovery for several use cases. For
> example, if we are going to scale up the API Server to handle an increased
> load, we might have to put a load balancer between the clients and the API
> Server instances. In that case, service discovery is essential to supply
> the load balancer with the healthy API Server endpoints currently
> running in the system.
>
>
>
> Cluster coordination
>
>
>
> Although microservices are supposed to be stateless in most cases, we
> might have scenarios where particular microservices need some state. For
> example, if we are going to implement a microservice that performs the
> Orchestrator's role, there could be issues if we keep multiple instances of
> it in several data centers to increase availability. According to my
> understanding, there should be only one Orchestrator running at a time, as
> it is the one that makes the decisions in the job execution process. So,
> if we are going to keep multiple instances of it running in the system,
> there should be some sort of leader election within the Orchestrator
> quorum.
>
>
>
> Common messaging medium between microservices
>
>
>
> This might be out of scope, but I thought of sharing it with the team to
> give a general idea. The idea was raised in the HipChat discussion with
> Marlon and Gourav. Using a common messaging medium might enable
> microservices to communicate in a decoupled manner, which will increase
> the scalability of the system. For example, there is a reference
> architecture that we can utilize with a Kafka-based messaging medium [1],
> [2]. However, I noticed in one paper that Kafka was previously rejected
> because writing clients was onerous. Please share your views on this, as
> I'm not familiar with the existing AMQP-based fan-out model and its pain
> points.
>
>
>
> Those are the main areas that I have identified while going through the
> current Airavata implementation and the requirements stated in some of the
> research papers. Please let me know whether my understanding of the above
> items is correct; suggestions are always welcome :)
>
>
>
> [1] https://medium.com/@ulymarins/an-introduction-to-apache-kafka-and-microservices-communication-bf0a0966d63
>
> [2] https://www.slideshare.net/ConfluentInc/microservices-in-the-apache-kafka-ecosystem
>
>
>
> References
>
>
>
> Marru, S., Gunathilake, L., Herath, C., Tangchaisin, P., Pierce, M.,
> Mattmann, C., Singh, R., Gunarathne, T., Chinthaka, E., Gardler, R. and
> Slominski, A., 2011, November. Apache airavata: a framework for distributed
> applications and computational workflows. In Proceedings of the 2011 ACM
> workshop on Gateway computing environments (pp. 21-28). ACM.
>
>
>
> Nakandala, S., Pamidighantam, S., Yodage, S., Doshi, N., Abeysinghe, E.,
> Kankanamalage, C.P., Marru, S. and Pierce, M., 2016, July. Anatomy of the
> SEAGrid Science Gateway. In Proceedings of the XSEDE16 Conference on
> Diversity, Big Data, and Science at Scale (p. 40). ACM.
>
>
>
> Pierce, Marlon E., Suresh Marru, Lahiru Gunathilake, Don Kushan Wijeratne,
> Raminder Singh, Chathuri Wimalasena, Shameera Ratnayaka, and Sudhakar
> Pamidighantam. "Apache Airavata: design and directions of a science gateway
> framework." Concurrency and Computation: Practice and Experience 27, no. 16
> (2015): 4282-4291.
>
>
>
> Pierce, Marlon, Suresh Marru, Borries Demeler, Raminderjeet Singh, and
> Gary Gorbet. "The apache airavata application programming interface:
> overview and evaluation with the UltraScan science gateway." In Proceedings
> of the 9th Gateway Computing Environments Workshop, pp. 25-29. IEEE Press,
> 2014.
>
>
>
> Marru, Suresh, Marlon Pierce, Sudhakar Pamidighantam, and Chathuri
> Wimalasena. "Apache Airavata as a laboratory: architecture and case study
> for component- based gateway middleware." In Proceedings of the 1st
> Workshop on The Science of Cyberinfrastructure: Research, Experience,
> Applications and Models, pp. 19-26. ACM, 2015.
>
>
>
> Thanks
>
> Dimuthu
>
>
>
>
>
>
>
>
>
>
>

Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.
Hi Dimuthu,

Sorry for catching up late on your emails, and thanks again for summarizing your findings. I believe this is really helpful for anyone who wishes to understand the intricacies involved in re-designing the architecture while also getting to know the technologies. I have a few comments though; please find my feedback inline in blue.

Thanks and Regards,
Gourav Shenoy

From: DImuthu Upeksha <di...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, October 16, 2017 at 1:48 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Hi All,

Thanks for all the valuable feedback that you have provided so far, and please find attached the document that contains the evaluation of Kubernetes, DC/OS, and Helix as candidate platforms to deploy Airavata microservices. Use the Google doc [1] to provide your suggestions and comments.

Summary of the document:

Considering all the facts, I believe that Kubernetes is more suitable for our use cases.

Advantages of Kubernetes over DC/OS

1. DC/OS uses the Marathon framework to perform container orchestration. Marathon has to be deployed on top of the Mesos framework, so from an architectural point of view there are two framework levels, one installed on top of the other. The reason for this is that Mesos is a generic framework for deploying any application, and Marathon is the one which adds the value by providing container orchestration. Although this is a good design, it requires more resources and involves a lot of complexity when it comes to a production deployment. On the other hand, Kubernetes has been built from scratch to support container orchestration, so it can do the same work that DC/OS performs with fewer resources and less complexity.

DC/OS is an operating system which uses Mesos as the underlying resource manager (to add abstraction if your infrastructure spans multiple clouds OR if you have on-premise infrastructure), whereas Marathon is “generally” the scheduler used by DC/OS to orchestrate applications/services as containers over the infrastructure managed by Mesos. I say “generally” because DC/OS also allows you to run one-off jobs (services are long running, jobs are one-time runs) and uses Aurora as the scheduler for those. The reason I am reiterating this is the following:

  1.  You mentioned DC/OS using more resources, which I believe might not be accurate (I might be wrong). DC/OS has evolved out of an ecosystem which comprises resource management and application management.
  2.  Getting a working “production grade” Kubernetes cluster set up is far more complicated than getting DC/OS bootstrapped.
  3.  It really doesn’t matter “what scheduler” we use to orchestrate the containers, because in the end the battle is between Kubernetes and Marathon, not Kubernetes vs DC/OS. In fact, DC/OS (as an operating system) supports the use of multiple orchestration schedulers (like running multiple browsers on a Mac/Windows machine) – so as a devops engineer, DC/OS allows you to decide whether to use Marathon or Kubernetes. You will be pleasantly surprised to know that Kubernetes is a first-class package on DC/OS.

2. Kubernetes has fewer components compared to DC/OS and is comparatively lighter than a DC/OS framework deployment. Fewer components make the framework easier to maintain and monitor.

As I mentioned in my comments above, I do not think this might be accurate. Yes, maintainability is a very important aspect but both these frameworks are proven to be easily maintainable.

3. When it comes to high availability deployments, DC/OS has more components (Mesos masters, Marathon masters) to keep available than Kubernetes. This makes the production deployment and management process complex and tiresome.

Same as points (1, 2) above, the real battle is between Kubernetes and Marathon. Mesos is just the resource layer which allows you to easily manage applications across heterogeneous/hybrid infrastructure.

4. The capability to deploy non-container-based applications on the platform is not one of our requirements, so that feature of DC/OS will rarely be beneficial for us.

Partially agreed. Yes, the discussion is towards containerizing Airavata components. But knowing that DC/OS also supports running distributed services is beneficial.

5. Kubernetes has a huge community and has had a lot of open-source exposure since its inception. DC/OS is mainly managed by Mesosphere, even though it was open-sourced in 2016. Most of the feature designs and issue discussions are well documented in the Kubernetes GitHub repository, which makes issues with the framework really easy to track and solve, whereas in DC/OS that ecosystem is still at a very early stage.

I agree. Kubernetes has a larger adoption and good community support.

6. There are a lot of vendors working on Kubernetes, and there is currently a significant number of tools developed around Kubernetes to deploy, monitor, and manage Kubernetes clusters, whereas DC/OS does not see that amount of traction.

Yes, this is a very good point. Just deploying Airavata over DC/OS or Kubernetes is not the goal; eventually having a streamlined CI/CD process is extremely essential. As Kubernetes has wider adoption, there are a lot of tools available for CI/CD over Kubernetes. Although I haven’t followed up or tried it myself, it looks like DC/OS has fairly good support for elastic CI/CD pipelines; the Mesosphere ecosystem boasts about it here<https://mesosphere.com/solutions/developer/>.

NOTE: I am not advocating DC/OS over Kubernetes. I just wanted to clarify some of the subtle differences between the two. Most people (me included) mistake them for competing technologies, but this blog<https://mesosphere.com/blog/kubernetes-and-the-dcos/> sheds some light on the topic.

Advantages of Kubernetes over Helix

While the overall differences between Kubernetes and Helix make total sense, since we are considering only the “deployment” aspect here (containers being the protagonist in our story), Helix is completely out of the equation. We are only considering Helix as the distributed task execution framework for managing workloads “within” Airavata. The idea was to leverage Helix’s task execution APIs to define custom task executors, and to orchestrate the DAGs defined using Helix nomenclature.

Although Helix provides cluster management capabilities (and honestly that was the sole purpose behind the team at LinkedIn building Helix), we are not interested in using Helix for managing Airavata microservices. Rather, we need to identify the best way to build our micro-services around Helix’s task execution framework. I am not saying this is the ideal way to solve our problem, but certainly is one of the powerful candidates out there.

1. Containers (Docker) are the proper way to deploy microservices due to their platform-agnostic packaging model, resource limitation capability, and ease of distribution.

2. Helix doesn’t support container deployment out of the box, whereas Kubernetes itself is a container orchestration framework.

3. Helix has fewer components than Kubernetes; however, the features Kubernetes provides over Helix justify its having more components.

4. In Helix, microservice application logic is tightly coupled with the Helix participant code (more precisely, with the State Model). This causes several issues in production deployments:

  *   The update process becomes very complex, as we have to restart participant nodes for each update, and this affects other services as well, which is not acceptable under any circumstances.
  *   We cannot limit resources for each microservice, as all of them run inside the same JVM of the Participant node.

5. Kubernetes, in contrast, has clearly defined boundaries between application logic and the runtime framework. Application logic is bundled as a Docker image, and services run as separate processes, which makes the update process and resource limitation very easy.

6. Kubernetes comes with service discovery and load balancing out of the box, whereas Helix doesn’t provide such features by default.

7. Kubernetes has a well-defined and scalable node affinity API, but in Helix we would have to write custom rebalancers to achieve the same, and that approach does not scale either.

8. It is very complex to come up with a proper CI/CD pipeline for Helix, as application code is tightly coupled to the framework. Kubernetes has a straightforward way to integrate CI/CD pipelines to test and deploy microservices.

9. Kubernetes has a comprehensive role-based access control (RBAC) model for authorizing access to resources, while Helix has no such model.
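To make points 5–7 above concrete, here is a minimal Kubernetes manifest sketch. This is only an illustration: the image name, labels, and port are placeholders, not actual Airavata artifacts. It shows the application bundled as a container image with per-container resource limits, plus a Service that provides built-in discovery and load balancing across healthy replicas.

```yaml
# Hypothetical API Server deployment; image and names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api-server
        image: example/airavata-api-server:latest   # placeholder image
        ports:
        - containerPort: 8930
        resources:
          limits:              # per-container resource limits (point 5)
            cpu: "500m"
            memory: 512Mi
---
# The Service gives discovery and load balancing for free (point 6).
apiVersion: v1
kind: Service
metadata:
  name: api-server
spec:
  selector:
    app: api-server
  ports:
  - port: 8930
    targetPort: 8930
```

Scaling the Deployment up or down then requires no change to clients; they keep addressing the Service name.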

[1] https://docs.google.com/document/d/17Hfu-qFFRZHWfLCtf3esXZTcQy3jPKF7OfetTWJntpY/edit?usp=sharing

Thanks
Dimuthu

On Mon, Oct 9, 2017 at 9:43 AM, Supun Nakandala <su...@gmail.com>> wrote:
+1 for the idea.

On Sun, Oct 8, 2017 at 2:52 AM, DImuthu Upeksha <di...@gmail.com>> wrote:
Hi Supun,

I also believe that letting the orchestrator determine the worker to run a particular job is complex to implement and will make the orchestrator code quite hard to maintain in the long run. I also partially agree with embedding a worker inside the firewall-protected resource, but I guess we can improve it further to keep the workers homogeneous and stateless. Have a look at the following figure:

[Inline image: proposed architecture with workers outside and an Airavata agent inside the protected resource]
In the above design, we keep all the workers outside and keep a daemon inside the protected resource to communicate securely with the workers. The problem, then, is how we make the workers homogeneous, as this is still just adding another layer to the solution stated above. The trick is that we decouple the communication between the worker and the resource: communication with any resource is done through a well-defined API. Speaking in Java:

import java.io.InputStream;
import java.io.OutputStream;

public interface CommunicationInterface {
      // Execute a command on the resource over SSH and return its output
      String sshToResource(String resourceIp, String command);
      // Upload the given stream to a target path on the resource
      void transferDataTo(String resourceIp, String target, InputStream in);
      // Download data from a target path on the resource into the given stream
      void transferDataFrom(String resourceIp, String target, OutputStream out);
}
The implementation of this API might change according to the resource. We keep a separate Catalog that serves the libraries containing the implementation specific to each resource. For example, if Worker 1 needs to talk to Resource 1, which sits behind a firewall with the Airavata communication agent placed inside, it will query the Catalog for Resource 1 and fetch the library that implements CommunicationInterface to talk securely with the Airavata agent. If it wants to talk to Resource 2, another library with the default implementation will be fetched from the Catalog. Once those SDKs are fetched, they are loaded into the JVM at runtime using a class loader, and communication happens through them afterwards.
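The dispatch idea above can be sketched in Java. This is only an illustration under stated assumptions: all class names are hypothetical, a trimmed single-method interface stands in for the full CommunicationInterface, and an in-memory map stands in for the real Catalog and the runtime jar class-loading step.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of catalog-based dispatch; all names are hypothetical.
public class CatalogDispatchSketch {

    // Trimmed stand-in for the CommunicationInterface described above.
    public interface CommunicationInterface {
        String sshToResource(String resourceIp, String command);
    }

    // Default implementation for resources reachable directly over SSH.
    static class DirectSshCommunicator implements CommunicationInterface {
        public String sshToResource(String resourceIp, String command) {
            return "direct-ssh:" + resourceIp + ":" + command;
        }
    }

    // Implementation that relays through an agent placed behind a firewall.
    static class AgentRelayCommunicator implements CommunicationInterface {
        public String sshToResource(String resourceIp, String command) {
            return "agent-relay:" + resourceIp + ":" + command;
        }
    }

    // In-memory stand-in for the Catalog: maps a resource ID to the
    // implementation class a worker should load for that resource.
    static final Map<String, Class<? extends CommunicationInterface>> CATALOG = new HashMap<>();
    static {
        CATALOG.put("resource-1", AgentRelayCommunicator.class); // behind a firewall
        CATALOG.put("resource-2", DirectSshCommunicator.class);  // directly reachable
    }

    // A worker asks the catalog which implementation to use and instantiates
    // it reflectively, as it would after class-loading a fetched jar.
    static CommunicationInterface resolve(String resourceId) throws Exception {
        return CATALOG.get(resourceId).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolve("resource-1").sshToResource("10.0.0.5", "hostname"));
        System.out.println(resolve("resource-2").sshToResource("10.0.0.6", "hostname"));
    }
}
```

The worker code stays identical for every resource; only the catalog entry (and the library behind it) changes.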

We can improve this by caching libraries inside workers and reusing them as much as possible, limiting the number of Catalog queries from workers.
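The caching idea could be as simple as a small LRU map inside each worker, keyed by resource ID. This is a hedged sketch, not Airavata code; the jar names are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache a worker could keep for fetched communication libraries,
// evicting the least-recently-used entry once capacity is exceeded.
public class LibraryCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LibraryCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // drop the least-recently-used entry
    }

    public static void main(String[] args) {
        LibraryCache<String, String> cache = new LibraryCache<>(2);
        cache.put("resource-1", "lib-agent-relay.jar");
        cache.put("resource-2", "lib-direct-ssh.jar");
        cache.get("resource-1");                  // touch resource-1 so it stays hot
        cache.put("resource-3", "lib-other.jar"); // evicts resource-2 (least recent)
        System.out.println(cache.keySet());       // prints [resource-1, resource-3]
    }
}
```

A worker would consult this cache first and fall back to a Catalog fetch only on a miss.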

The advantage of this is that we can add resources with different security levels without changing the worker implementations. The only thing we have to do is come up with an agent and a library to talk to that agent, add them to the Catalog, and the rest will be taken care of by the framework. This model is analogous to the SQL drivers that we use in Java to connect to databases.

Please note that I came up with this design based on the limited knowledge I have of Airavata workers and resources. There will be a lot of corner cases that I have not identified. Your views and ideas are highly appreciated.

Thanks
Dimuthu

On Sun, Oct 8, 2017 at 10:51 AM, Supun Nakandala <su...@gmail.com>> wrote:
Hi Dimuthu,

Thank you for the very good summary. I think you have covered almost all the things.

I would also like to mention one other future requirement that I think will be important in this discussion.

In my opinion, going forward Airavata will get the requirement of working with firewall-protected resources. In such cases, workers residing outside will not be able to communicate with the protected resources. What we initially thought was to deploy a special type of worker which would be placed inside the firewall-protected network and would coordinate with the Airavata orchestrator to execute actions. One such tool, used by ServiceNow in enterprise settings, is the MidServer (http://wiki.servicenow.com/index.php?title=MID_Server#gsc.tab=0). The downside of this approach is that it breaks our assumption of all workers being homogeneous and therefore requires the orchestrator to be worker-aware. Perhaps, instead of workers picking work, we can design it such that the orchestrator grants work to the corresponding worker. But this puts a lot of complexity on the orchestrator's side.




Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by DImuthu Upeksha <di...@gmail.com>.
Hi All,

Thanks for all the valuable feedback you have provided so far, and
please find the attached document containing the evaluation of Kubernetes,
DC/OS and Helix as candidate platforms for deploying Airavata
microservices. Use the Google doc [1] to provide your suggestions and
comments.

Summary of the document:

*Considering all the facts, I believe that Kubernetes is more suitable for
our use cases.*

*Advantages of Kubernetes over DC/OS*

1. DC/OS uses the Marathon framework to perform container orchestration.
Marathon must be deployed on top of the Mesos framework, so from an
architectural point of view there are two framework levels, one installed
on top of the other. The reason is that Mesos is a generic framework for
deploying any application, while Marathon adds the value by providing
container orchestration. Although this is a sound design, it requires more
resources and involves a lot of complexity in a production deployment. On
the other hand, Kubernetes was built from scratch to support container
orchestration, so it can do the same work that DC/OS performs with fewer
resources and less complexity.

2. Kubernetes has fewer components than DC/OS and its framework deployment
is comparatively lighter. Fewer components make the framework easier to
maintain and monitor.

3. When it comes to high availability deployments, DC/OS has more
components (Mesos masters, Marathon masters) to keep available than
Kubernetes. This makes the production deployment and management process
complex and tiresome.

4. The capability to deploy non-container-based applications on the
platform is not one of our requirements, so that feature of DC/OS will
rarely benefit us.

5. Kubernetes has a huge community and has had a lot of open source
exposure since its inception, whereas DC/OS is mainly managed by
Mesosphere even though it was open sourced in 2016. Most feature designs
and issue discussions are well documented in the Kubernetes GitHub
repository, which makes it really easy to track down and solve framework
issues, while the DC/OS ecosystem is still at a very early stage.

6. There are a lot of vendors working on Kubernetes, and there is
currently a significant number of tools developed around Kubernetes to
deploy, monitor and manage clusters, whereas DC/OS does not see that
amount of traction.

*Advantages of Kubernetes over Helix*

1. Containers (Docker) are the proper way to deploy microservices due to
their platform-agnostic packaging model, resource limitation capability
and ease of distribution.

2. Helix doesn’t support container deployment out of the box, whereas
Kubernetes itself is a container orchestration framework.

3. Helix has fewer components than Kubernetes; however, the features
Kubernetes provides over Helix justify the additional components.

4. In Helix, microservice application logic is tightly coupled with the
Helix participant code (more precisely, with the State Model). This causes
several issues in production deployments:

   - The update process becomes very complex, as we have to restart
   participant nodes for each update, and that affects other services as
   well, which is not acceptable under any circumstance.
   - We cannot limit resources per microservice, as all of them run
   inside the same JVM of the participant node.
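To illustrate the coupling in point 4, here is a rough, library-free Java sketch of the participant pattern; the class, method names and worker logic are hypothetical, with the callbacks modeled loosely on Helix's state-transition methods rather than taken from the real Helix API.

```java
// Library-free sketch: in this pattern the service's business logic is
// invoked from state-transition callbacks, so it lives inside the
// participant JVM, shares that JVM's resources with every other service
// hosted there, and any logic update means restarting the participant.
public class ParticipantSketch {

    private boolean workerRunning = false;

    // Modeled loosely on a Helix OFFLINE -> ONLINE transition callback.
    public void onBecomeOnlineFromOffline() {
        workerRunning = true;   // microservice logic starts here, in-process
    }

    public void onBecomeOfflineFromOnline() {
        workerRunning = false;  // and stops here, in the same JVM
    }

    public boolean isWorkerRunning() {
        return workerRunning;
    }

    public static void main(String[] args) {
        ParticipantSketch p = new ParticipantSketch();
        p.onBecomeOnlineFromOffline();
        System.out.println("worker running: " + p.isWorkerRunning());
    }
}
```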


5. Kubernetes, in contrast, has clearly defined boundaries between
application logic and the runtime framework. Application logic is bundled
as a Docker image and runs as a separate process, which makes the update
process and resource limitation very easy.

6. Kubernetes comes with service discovery and load balancing out of the
box, whereas Helix doesn’t provide such features by default.
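As a sketch of what point 6 buys us: a Kubernetes Service gets a stable in-cluster DNS name of the form service.namespace.svc.cluster.local, so clients never track individual pod IPs. The service name, namespace and port below are hypothetical, chosen only for illustration.

```java
public class ServiceDiscoverySketch {
    // Builds the stable in-cluster URL for a Kubernetes Service; the
    // cluster itself load-balances connections to that name across
    // healthy pods, so callers need no endpoint bookkeeping.
    static String serviceUrl(String service, String namespace, int port) {
        return "http://" + service + "." + namespace + ".svc.cluster.local:" + port;
    }

    public static void main(String[] args) {
        // A hypothetical Airavata API server Service.
        System.out.println(serviceUrl("api-server", "airavata", 8930));
    }
}
```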

7. Kubernetes has a well-defined and scalable node affinity API, whereas
in Helix we would have to write custom rebalancers to achieve the same,
and that approach does not scale.

8. It is very complex to come up with a proper CI/CD pipeline for Helix,
as application code is tightly coupled to the framework. Kubernetes has a
straightforward way to integrate CI/CD pipelines to test and deploy
microservices.

9. Kubernetes has a comprehensive role-based access control (RBAC) model
to authorize access to resources, while Helix doesn’t provide one.

[1]
https://docs.google.com/document/d/17Hfu-qFFRZHWfLCtf3esXZTcQy3jPKF7OfetTWJntpY/edit?usp=sharing

Thanks
Dimuthu

On Mon, Oct 9, 2017 at 9:43 AM, Supun Nakandala <su...@gmail.com>
wrote:

> +1 for the idea.
>
> On Sun, Oct 8, 2017 at 2:52 AM, DImuthu Upeksha <
> dimuthu.upeksha2@gmail.com> wrote:
>
>> Hi Supun,
>>
>> My belief also letting orchestrator to determine the worker to run
>> particular job is complex to implement and will make the maintainability of
>> orchestrator code quite hard in long run. I'm also in partially agreement
>> with embedding a worker inside the firewall protected resource but I guess
>> we can improve it further to make homogenous and stateless. Have a look
>> at following figure
>>
>>
>> In above design we keep all the workers outside and keep a daemon inside
>> the protected resource to securely communicate with workers. Then the
>> problem is how do we make the worker homogenous as this is still just
>> adding another layer to the solution stated above. Trick is, we decouple
>> the communication between worker and resource. Communication to any
>> resource is being done through a well defined API. Speaking in java
>>
>> public interface CommunicationInterface {
>>       public String sshToResource(String resourceIp, String command);
>>       public void transferDataTo(String resourceIp, String target,
>> InputStream in);
>>       public void transferDataFrom(String resourceIp, String target,
>> OutputStream out);
>> }
>>
>> Implementation of this API might change according to the resource. We
>> keep a separate Catalog that will cater the libraries that have the
>> implementation specific to each resource. For example, if Worker 1 needs to
>> talk to Resource 1 which acts behind a firewall and the Airavata
>> communication agent is placed inside, it will query the Catalog for the
>> Resource 1 and fetch the library that implemented CommunicationInterface
>> to talk securely with Airavata Agent. If it wants to talk to Resource 2,
>> another library will be fetched from Catalog that has default
>> implementations. Once those SDKs are fetched, they are loaded into the JVM
>> at runtime using a class loader and communication will be done afterwards.
>>
>> We can improve this by caching libraries inside workers and reusing them
>> as much as possible to limit number of queries to Catalog from workers.
>>
>> Advantage of this is, we can add resources with different security levels
>> without changing the Worker implementations. Only thing we have to do is to
>> come up with an agent and a library to talk with agent. Then add them to
>> Catalog and rest will be taken cared by the framework. This model is
>> analogous to the sql drivers that we use in java to connect to databases.
>>
>> Please note that I came up with this design based on the limited
>> knowledge I have in Airavata Workers and Resources. There will be lot of
>> corner cases that I have not identified. Your views and ideas are highly
>> appreciated.
>>
>> Thanks
>> Dimuthu
>>
>> On Sun, Oct 8, 2017 at 10:51 AM, Supun Nakandala <
>> supun.nakandala@gmail.com> wrote:
>>
>>> Hi Dimuthu,
>>>
>>> Thank you for the very good summary. I think you have covered almost all
>>> the things.
>>>
>>> I would also like to mention one other futuristic requirements that I
>>> think will be important in this discussion.
>>>
>>> In my opinion going forward, Airavata will get the requirement of
>>> working with firewall protected resources. In such cases, workers which are
>>> residing outside will not be able to communicate with the protected
>>> resources. What we initially thought was to deploy a special type of worker
>>> which will be placed inside the firewall-protected network and will
>>> coordinate with Airavata orchestrator to execute actions. One such tool
>>> which is used by ServiceNow in enterprise settings is the MidServer (
>>> http://wiki.servicenow.com/index.php?title=MID_Server#gsc.tab=0). The
>>> downside of this approach is that it breaks our assumption of all workers
>>> being homogenous and therefore require orchestrator to be worker aware.
>>> Perhaps, instead of workers picking work we can design such that
>>> orchestrator will grant work to the corresponding work. But this
>>> incorporates a lot of complexity on the orchestrator's side.
>>>
>>>
>>>
>>> On Oct 5, 2017 10:47 AM, "DImuthu Upeksha" <di...@gmail.com>
>>> wrote:
>>>
>>>> Hi Gaurav,
>>>>
>>>> Thanks a lot for the detailed description about DC/OS and how it can be
>>>> utilized in Airavata. Seems like it is an interesting project and I'll add
>>>> it to the technology list that are to be evaluated.
>>>>
>>>> When selecting a technology, in addition to the features it provides,
>>>> we might have to take some non-functional features like the community
>>>> participation (committers, commits and forks), number of customers  who
>>>> are  running it  in production environments, maturity of the project and
>>>> the complexity it brings in to the total system into the consideration. So
>>>> I'll first try to go through the resources (documentation and source) and
>>>> try to grab concepts of DC/OS and hopefully I can work with you to dig
>>>> deeper to understand more about DC/OS
>>>>
>>>> Thanks
>>>> Dimuthu
>>>>
>>>> On Thu, Oct 5, 2017 at 8:50 PM, Shenoy, Gourav Ganesh <
>>>> goshenoy@indiana.edu> wrote:
>>>>
>>>>> Sorry, missed the attachment in my previous email.
>>>>>
>>>>>
>>>>>
>>>>> PS: DC/OS is just a recommendation for performing containerized
>>>>> deployment and application management for Airavata. I would be happy to
>>>>> consider alternative frameworks such as Kubernetes.
>>>>>
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Gourav Shenoy
>>>>>
>>>>>
>>>>>
>>>>> *From: *"Shenoy, Gourav Ganesh" <go...@indiana.edu>
>>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Date: *Thursday, October 5, 2017 at 11:16 AM
>>>>>
>>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Subject: *Re: Linked Container Services for Apache Airavata
>>>>> Components - Phase 1 - Requirement identification
>>>>>
>>>>>
>>>>>
>>>>> Hi Dimuthu,
>>>>>
>>>>>
>>>>>
>>>>> Very good summary! I am not sure if you have, but DC/OS (DataCenter
>>>>> Operating System) is a container orchestration platform based on Apache
>>>>> Mesos. The beauty of DC/OS is the ease and simplicity of
>>>>> development/deployment; yet being extremely powerful in most of the
>>>>> parameters – multi-datacenter, multi-cloud, scalability, high availability,
>>>>> fault tolerance, load balancing, and more importantly the community support
>>>>> is fantastic.
>>>>>
>>>>>
>>>>>
>>>>> DC/OS has an exhaustive service catalog, it’s more like a PAAS for
>>>>> containers (not just restricted to containers though) – you can run
>>>>> services like Spark, Kafka, RabbitMQ, etc out of the box with a single
>>>>> click install. And Apache Mesos as the underlying resource manager makes it
>>>>> seamless to deploy applications across different datacenters. There is a
>>>>> concept of SERVICE vs JOB – service is considered long running and DC/OS
>>>>> will make sure it keeps it running (if a service fails, it spins up a new
>>>>> one), whereas jobs are one time executors. This comes handy for using DC/OS
>>>>> as a target runtime for Airavata.
>>>>>
>>>>>
>>>>>
>>>>> We used DC/OS for our class project to run the distributed task
>>>>> execution prototype we built (which uses RabbitMQ messaging). Here’s a link
>>>>> to the blog I have explaining the process:
>>>>> https://gouravshenoy.github.io/apache-airavata/spring17/2017
>>>>> /04/20/final-report.html . I have also attached a PDF paper we wrote
>>>>> as part of the class explaining the task execution process and *one
>>>>> solution* using rabbitmq messaging.
>>>>>
>>>>>
>>>>>
>>>>> I had also started with the work of containerizing Airavata and a
>>>>> unified build + deployment mechanism with CI CD on DC/OS. Unfortunately, I
>>>>> couldn’t complete it due to time constraints, but I would be more than
>>>>> happy to work with you on this. Let me know and we can coordinate.
>>>>>
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Gourav Shenoy
>>>>>
>>>>>
>>>>>
>>>>> *From: *DImuthu Upeksha <di...@gmail.com>
>>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Date: *Thursday, October 5, 2017 at 9:52 AM
>>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Subject: *Re: Linked Container Services for Apache Airavata
>>>>> Components - Phase 1 - Requirement identification
>>>>>
>>>>>
>>>>>
>>>>> Hi Marlon,
>>>>>
>>>>>
>>>>>
>>>>> Thanks for the input. I got your idea of availability mode and will
>>>>> keep in mind while designing the PoC. CI/CD is the one I have missed and
>>>>> thanks for pointing it out.
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dimuthu
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 5, 2017 at 7:04 PM, Pierce, Marlon <ma...@iu.edu>
>>>>> wrote:
>>>>>
>>>>> Thanks, Dimuthu, this is a good summary. Others may comment about
>>>>> Kafka, stateful versus stateless parts of Airavata, etc.  You may also find
>>>>> some of this discussion on the mailing list archives.
>>>>>
>>>>>
>>>>>
>>>>> Active-active vs. active-passive is a good question, and we have
>>>>> typically thought of this in terms of individual Airavata components rather
>>>>> than the whole system.  Some components can be active-active (like a
>>>>> stateless application manager), while others (like the orchestrator example
>>>>> you give below) are stafefull and may be better as active-passive.
>>>>>
>>>>>
>>>>>
>>>>> There is also the issue of system updates and continuous deployments,
>>>>> which could be added to your list.
>>>>>
>>>>>
>>>>>
>>>>> Marlon
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *From: *"dimuthu.upeksha2@gmail.com" <di...@gmail.com>
>>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Date: *Thursday, October 5, 2017 at 2:40 AM
>>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Subject: *Linked Container Services for Apache Airavata Components -
>>>>> Phase 1 - Requirement identification
>>>>>
>>>>>
>>>>>
>>>>> Hi All,
>>>>>
>>>>>
>>>>>
>>>>> Within last few days, I have been going through the requirements and
>>>>> design of current setup of Airavata and I identified following ares as the
>>>>> key focusing areas in the technology evaluation phase
>>>>>
>>>>>
>>>>>
>>>>> Micorservices deployment platform (container management system)
>>>>>
>>>>>
>>>>>
>>>>> Possible candidates: Google Kubernetes, Apache Mesos, Apache Helix
>>>>>
>>>>> As the most of the operational units of Airavata is supposed to be
>>>>> moving into microservices based deployment pattern, having a unified
>>>>> deployment platform to manage those microservices will make the DevOps
>>>>> operations easier and faster. From the other hand, although writing and
>>>>> maintaining a single micro service is a somewhat straightforward way,
>>>>> making multiple microservies running, monitoring and maintaining the
>>>>> lifecycles manually in a production environment is an tiresome and complex
>>>>> operation to perform. Using such a deployment platform, we can easily
>>>>> automate lots of pain points that I have mentioned earlier.
>>>>>
>>>>>
>>>>>
>>>>> Scalability
>>>>>
>>>>>
>>>>>
>>>>> We need a solution that can easily scalable depending on the load
>>>>> condition of several parts of the system. For example, the workers in the
>>>>> post processing pipeline should be able scaled up and down depending on the
>>>>> events come into the message queue.
>>>>>
>>>>>
>>>>>
>>>>> Availability
>>>>>
>>>>>
>>>>>
>>>>> We need to support solution to be deployed in multiple geographically
>>>>> distant data centers. When evaluating container management systems, we
>>>>> should consider this is as a primary requirement. However one thing that I
>>>>> am not sure is the availability mode that Airavata normally expect. Is it a
>>>>> active-active mode or active-passive mode?
>>>>>
>>>>>
>>>>>
>>>>> Service discovery
>>>>>
>>>>>
>>>>>
>>>>> Once we move in to microservice based deployment pattern, there could
>>>>> be scenarios where we want service discovery for several use cases. For
>>>>> example, if we are going to scale up API Server to handle an increased
>>>>> load, we might have to put a load balancer in between the client and API
>>>>> Server instances. In that case, service discovery is essential to instruct
>>>>> the load balancer with healthy API Server endpoints which are currently
>>>>> running in the system.
>>>>>
>>>>>
>>>>>
>>>>> Cluster coordination
>>>>>
>>>>>
>>>>>
>>>>> Although micorservices are supposed to be stateless in most of the
>>>>> cases, we might have scenarios to feed some state to particular
>>>>> micorservices. For example if we are going to implement a microservice that
>>>>> perform Orchestrator's role, there could be issues if we keep multiple
>>>>> instances of it in several data centers to increase the availability.
>>>>> According to my understanding, there should be only one Orchestrator being
>>>>> running at a time as it is the one who takes decisions of the job execution
>>>>> process. So, if we are going to keep multiple instances of it running in
>>>>> the system, there should be an some sort of a leader election in between
>>>>> Orchestrator quorum.
>>>>>
>>>>>
>>>>>
>>>>> Common messaging medium in between mocroservices
>>>>>
>>>>>
>>>>>
>>>>> This might be out of the scope but I thought of sharing with the team
>>>>> to have an general idea. Idea was raised at the hip chat discussion with
>>>>> Marlon and Gaourav. Using a common messaging medium might enable
>>>>> microservices to communicate with in a decoupled manner which will increase
>>>>> the scalability of the system. For example there is a reference
>>>>> architecture that we can utilize with kafka based messaging medium [1],
>>>>> [2]. However I noticed in one paper that Kafka was previously rejected as
>>>>> writing clients was onerous. Please share your views on this as I'm not
>>>>> familiar with the existing fan out model based on AMQP and  pain points of
>>>>> it.
>>>>>
>>>>>
>>>>>
>>>>> Those are the main areas that I have understood while going through
>>>>> Airavata current implementation and requirements stated in some of the
>>>>> research papers. Please let me know whether my understanding on above items
>>>>> are correct and suggestions are always welcome :)
>>>>>
>>>>>
>>>>>
>>>>> [1] https://medium.com/@ulymarins/an-introduction-to-apache-
>>>>> kafka-and-microservices-communication-bf0a0966d63
>>>>>
>>>>> [2] https://www.slideshare.net/ConfluentInc/microservices-in
>>>>> -the-apache-kafka-ecosystem
>>>>>
>>>>>
>>>>>
>>>>> References
>>>>>
>>>>>
>>>>>
>>>>> Marru, S., Gunathilake, L., Herath, C., Tangchaisin, P., Pierce, M.,
>>>>> Mattmann, C., Singh, R., Gunarathne, T., Chinthaka, E., Gardler, R. and
>>>>> Slominski, A., 2011, November. Apache airavata: a framework for distributed
>>>>> applications and computational workflows. In Proceedings of the 2011 ACM
>>>>> workshop on Gateway computing environments (pp. 21-28). ACM.
>>>>>
>>>>>
>>>>>
>>>>> Nakandala, S., Pamidighantam, S., Yodage, S., Doshi, N., Abeysinghe,
>>>>> E., Kankanamalage, C.P., Marru, S. and Pierce, M., 2016, July. Anatomy of
>>>>> the SEAGrid Science Gateway. In Proceedings of the XSEDE16 Conference on
>>>>> Diversity, Big Data, and Science at Scale (p. 40). ACM.
>>>>>
>>>>>
>>>>>
>>>>> Pierce, Marlon E., Suresh Marru, Lahiru Gunathilake, Don Kushan
>>>>> Wijeratne, Raminder Singh, Chathuri Wimalasena, Shameera Ratnayaka, and
>>>>> Sudhakar Pamidighantam. "Apache Airavata: design and directions of a
>>>>> science gateway framework." Concurrency and Computation: Practice and
>>>>> Experience 27, no. 16 (2015): 4282-4291.
>>>>>
>>>>>
>>>>>
>>>>> Pierce, Marlon, Suresh Marru, Borries Demeler, Raminderjeet Singh, and
>>>>> Gary Gorbet. "The apache airavata application programming interface:
>>>>> overview and evaluation with the UltraScan science gateway." In Proceedings
>>>>> of the 9th Gateway Computing Environments Workshop, pp. 25-29. IEEE Press,
>>>>> 2014.
>>>>>
>>>>>
>>>>>
>>>>> Marru, Suresh, Marlon Pierce, Sudhakar Pamidighantam, and Chathuri
>>>>> Wimalasena. "Apache Airavata as a laboratory: architecture and case study
>>>>> for component- based gateway middleware." In Proceedings of the 1st
>>>>> Workshop on The Science of Cyberinfrastructure: Research, Experience,
>>>>> Applications and Models, pp. 19-26. ACM, 2015.
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dimuthu
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>

Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by Supun Nakandala <su...@gmail.com>.
+1 for the idea.

On Sun, Oct 8, 2017 at 2:52 AM, DImuthu Upeksha <di...@gmail.com>
wrote:

> Hi Supun,
>
> My belief also letting orchestrator to determine the worker to run
> particular job is complex to implement and will make the maintainability of
> orchestrator code quite hard in long run. I'm also in partially agreement
> with embedding a worker inside the firewall protected resource but I guess
> we can improve it further to make homogenous and stateless. Have a look
> at following figure
>
>
> In above design we keep all the workers outside and keep a daemon inside
> the protected resource to securely communicate with workers. Then the
> problem is how do we make the worker homogenous as this is still just
> adding another layer to the solution stated above. Trick is, we decouple
> the communication between worker and resource. Communication to any
> resource is being done through a well defined API. Speaking in java
>
> public interface CommunicationInterface {
>       public String sshToResource(String resourceIp, String command);
>       public void transferDataTo(String resourceIp, String target,
> InputStream in);
>       public void transferDataFrom(String resourceIp, String target,
> OutputStream out);
> }
>
> Implementation of this API might change according to the resource. We keep
> a separate Catalog that will cater the libraries that have the
> implementation specific to each resource. For example, if Worker 1 needs to
> talk to Resource 1 which acts behind a firewall and the Airavata
> communication agent is placed inside, it will query the Catalog for the
> Resource 1 and fetch the library that implemented CommunicationInterface
> to talk securely with Airavata Agent. If it wants to talk to Resource 2,
> another library will be fetched from Catalog that has default
> implementations. Once those SDKs are fetched, they are loaded into the JVM
> at runtime using a class loader and communication will be done afterwards.
>
> We can improve this by caching libraries inside workers and reusing them
> as much as possible to limit number of queries to Catalog from workers.
>
> Advantage of this is, we can add resources with different security levels
> without changing the Worker implementations. Only thing we have to do is to
> come up with an agent and a library to talk with agent. Then add them to
> Catalog and rest will be taken cared by the framework. This model is
> analogous to the sql drivers that we use in java to connect to databases.
>
> Please note that I came up with this design based on the limited knowledge
> I have in Airavata Workers and Resources. There will be lot of corner cases
> that I have not identified. Your views and ideas are highly appreciated.
>
> Thanks
> Dimuthu
>
> On Sun, Oct 8, 2017 at 10:51 AM, Supun Nakandala <
> supun.nakandala@gmail.com> wrote:
>
>> Hi Dimuthu,
>>
>> Thank you for the very good summary. I think you have covered almost all
>> the things.
>>
>> I would also like to mention one other futuristic requirements that I
>> think will be important in this discussion.
>>
>> In my opinion going forward, Airavata will get the requirement of working
>> with firewall protected resources. In such cases, workers which are
>> residing outside will not be able to communicate with the protected
>> resources. What we initially thought was to deploy a special type of worker
>> which will be placed inside the firewall-protected network and will
>> coordinate with Airavata orchestrator to execute actions. One such tool
>> which is used by ServiceNow in enterprise settings is the MidServer (
>> http://wiki.servicenow.com/index.php?title=MID_Server#gsc.tab=0). The
>> downside of this approach is that it breaks our assumption of all workers
>> being homogenous and therefore require orchestrator to be worker aware.
>> Perhaps, instead of workers picking work we can design such that
>> orchestrator will grant work to the corresponding work. But this
>> incorporates a lot of complexity on the orchestrator's side.
>>
>>
>>
>> On Oct 5, 2017 10:47 AM, "DImuthu Upeksha" <di...@gmail.com>
>> wrote:
>>
>>> Hi Gaurav,
>>>
>>> Thanks a lot for the detailed description about DC/OS and how it can be
>>> utilized in Airavata. Seems like it is an interesting project and I'll add
>>> it to the technology list that are to be evaluated.
>>>
>>> When selecting a technology, in addition to the features it provides, we
>>> might have to take some non-functional features like the community
>>> participation (committers, commits and forks), number of customers  who
>>> are  running it  in production environments, maturity of the project and
>>> the complexity it brings in to the total system into the consideration. So
>>> I'll first try to go through the resources (documentation and source) and
>>> try to grab concepts of DC/OS and hopefully I can work with you to dig
>>> deeper to understand more about DC/OS
>>>
>>> Thanks
>>> Dimuthu
>>>
>>> On Thu, Oct 5, 2017 at 8:50 PM, Shenoy, Gourav Ganesh <
>>> goshenoy@indiana.edu> wrote:
>>>
>>>> Sorry, missed the attachment in my previous email.
>>>>
>>>>
>>>>
>>>> PS: DC/OS is just a recommendation for performing containerized
>>>> deployment and application management for Airavata. I would be happy to
>>>> consider alternative frameworks such as Kubernetes.
>>>>
>>>>
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Gourav Shenoy
>>>>
>>>>
>>>>
>>>> *From: *"Shenoy, Gourav Ganesh" <go...@indiana.edu>
>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Date: *Thursday, October 5, 2017 at 11:16 AM
>>>>
>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Subject: *Re: Linked Container Services for Apache Airavata
>>>> Components - Phase 1 - Requirement identification
>>>>
>>>>
>>>>
>>>> Hi Dimuthu,
>>>>
>>>>
>>>>
>>>> Very good summary! I am not sure if you have, but DC/OS (DataCenter
>>>> Operating System) is a container orchestration platform based on Apache
>>>> Mesos. The beauty of DC/OS is the ease and simplicity of
>>>> development/deployment; yet being extremely powerful in most of the
>>>> parameters – multi-datacenter, multi-cloud, scalability, high availability,
>>>> fault tolerance, load balancing, and more importantly the community support
>>>> is fantastic.
>>>>
>>>>
>>>>
>>>> DC/OS has an exhaustive service catalog, it’s more like a PAAS for
>>>> containers (not just restricted to containers though) – you can run
>>>> services like Spark, Kafka, RabbitMQ, etc out of the box with a single
>>>> click install. And Apache Mesos as the underlying resource manager makes it
>>>> seamless to deploy applications across different datacenters. There is a
>>>> concept of SERVICE vs JOB – service is considered long running and DC/OS
>>>> will make sure it keeps it running (if a service fails, it spins up a new
>>>> one), whereas jobs are one time executors. This comes handy for using DC/OS
>>>> as a target runtime for Airavata.
>>>>
>>>>
>>>>
>>>> We used DC/OS for our class project to run the distributed task
>>>> execution prototype we built (which uses RabbitMQ messaging). Here’s a link
>>>> to the blog I have explaining the process:
>>>> https://gouravshenoy.github.io/apache-airavata/spring17/2017
>>>> /04/20/final-report.html . I have also attached a PDF paper we wrote
>>>> as part of the class explaining the task execution process and *one
>>>> solution* using rabbitmq messaging.
>>>>
>>>>
>>>>
>>>> I had also started with the work of containerizing Airavata and a
>>>> unified build + deployment mechanism with CI CD on DC/OS. Unfortunately, I
>>>> couldn’t complete it due to time constraints, but I would be more than
>>>> happy to work with you on this. Let me know and we can coordinate.
>>>>
>>>>
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Gourav Shenoy
>>>>
>>>>
>>>>
>>>> *From: *DImuthu Upeksha <di...@gmail.com>
>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Date: *Thursday, October 5, 2017 at 9:52 AM
>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Subject: *Re: Linked Container Services for Apache Airavata
>>>> Components - Phase 1 - Requirement identification
>>>>
>>>>
>>>>
>>>> Hi Marlon,
>>>>
>>>>
>>>>
>>>> Thanks for the input. I got your idea of availability mode and will
>>>> keep in mind while designing the PoC. CI/CD is the one I have missed and
>>>> thanks for pointing it out.
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Dimuthu
>>>>
>>>>
>>>>
>>>> On Thu, Oct 5, 2017 at 7:04 PM, Pierce, Marlon <ma...@iu.edu> wrote:
>>>>
>>>> Thanks, Dimuthu, this is a good summary. Others may comment about
>>>> Kafka, stateful versus stateless parts of Airavata, etc.  You may also find
>>>> some of this discussion on the mailing list archives.
>>>>
>>>>
>>>>
>>>> Active-active vs. active-passive is a good question, and we have
>>>> typically thought of this in terms of individual Airavata components rather
>>>> than the whole system.  Some components can be active-active (like a
>>>> stateless application manager), while others (like the orchestrator example
>>>> you give below) are stafefull and may be better as active-passive.
>>>>
>>>>
>>>>
>>>> There is also the issue of system updates and continuous deployments,
>>>> which could be added to your list.
>>>>
>>>>
>>>>
>>>> Marlon
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From: *"dimuthu.upeksha2@gmail.com" <di...@gmail.com>
>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Date: *Thursday, October 5, 2017 at 2:40 AM
>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Subject: *Linked Container Services for Apache Airavata Components -
>>>> Phase 1 - Requirement identification
>>>>
>>>>
>>>>
>>>> Hi All,
>>>>
>>>>
>>>>
>>>> Within last few days, I have been going through the requirements and
>>>> design of current setup of Airavata and I identified following ares as the
>>>> key focusing areas in the technology evaluation phase
>>>>
>>>>
>>>>
>>>> Micorservices deployment platform (container management system)
>>>>
>>>>
>>>>
>>>> Possible candidates: Google Kubernetes, Apache Mesos, Apache Helix
>>>>
>>>> As most of the operational units of Airavata are expected to move to a
>>>> microservices-based deployment pattern, having a unified deployment
>>>> platform to manage those microservices will make DevOps operations easier
>>>> and faster. On the other hand, although writing and maintaining a single
>>>> microservice is fairly straightforward, running multiple microservices
>>>> and monitoring and maintaining their lifecycles manually in a production
>>>> environment is a tiresome and complex operation. Such a deployment
>>>> platform lets us easily automate many of the pain points mentioned above.
>>>>
>>>>
>>>>
>>>> Scalability
>>>>
>>>>
>>>>
>>>> We need a solution that can easily scale depending on the load on
>>>> different parts of the system. For example, the workers in the
>>>> post-processing pipeline should scale up and down depending on the events
>>>> coming into the message queue.
>>>>
>>>>
>>>>
>>>> Availability
>>>>
>>>>
>>>>
>>>> We need the solution to support deployment across multiple geographically
>>>> distant data centers. When evaluating container management systems, we
>>>> should treat this as a primary requirement. However, one thing I am not
>>>> sure about is the availability mode Airavata normally expects: is it
>>>> active-active or active-passive?
>>>>
>>>>
>>>>
>>>> Service discovery
>>>>
>>>>
>>>>
>>>> Once we move to a microservices-based deployment pattern, there will be
>>>> scenarios where we need service discovery. For example, if we scale up
>>>> the API Server to handle an increased load, we might have to put a load
>>>> balancer between the clients and the API Server instances. In that case,
>>>> service discovery is essential so the load balancer knows which healthy
>>>> API Server endpoints are currently running in the system.
>>>>
>>>>
>>>>
>>>> Cluster coordination
>>>>
>>>>
>>>>
>>>> Although microservices are supposed to be stateless in most cases, we
>>>> might have scenarios where particular microservices need to hold state.
>>>> For example, if we implement a microservice that performs the
>>>> Orchestrator's role, there could be issues if we keep multiple instances
>>>> of it in several data centers to increase availability. As I understand
>>>> it, only one Orchestrator should be running at a time, as it is the
>>>> component that makes the decisions in the job execution process. So if we
>>>> keep multiple instances of it running in the system, there should be some
>>>> form of leader election within the Orchestrator quorum.
>>>>
>>>>
>>>>
>>>> Common messaging medium between microservices
>>>>
>>>>
>>>>
>>>> This might be out of scope, but I thought of sharing it with the team to
>>>> give a general idea. The idea was raised in the HipChat discussion with
>>>> Marlon and Gourav. Using a common messaging medium might enable
>>>> microservices to communicate in a decoupled manner, which would increase
>>>> the scalability of the system. For example, there is a reference
>>>> architecture we could adopt based on a Kafka messaging medium [1], [2].
>>>> However, I noticed in one paper that Kafka was previously rejected
>>>> because writing clients was onerous. Please share your views on this, as
>>>> I'm not familiar with the existing AMQP-based fan-out model and its pain
>>>> points.
>>>>
>>>>
>>>>
>>>> Those are the main areas I have identified while going through Airavata's
>>>> current implementation and the requirements stated in some of the
>>>> research papers. Please let me know whether my understanding of the above
>>>> items is correct; suggestions are always welcome :)
>>>>
>>>>
>>>>
>>>> [1] https://medium.com/@ulymarins/an-introduction-to-apache-
>>>> kafka-and-microservices-communication-bf0a0966d63
>>>>
>>>> [2] https://www.slideshare.net/ConfluentInc/microservices-in
>>>> -the-apache-kafka-ecosystem
>>>>
>>>>
>>>>
>>>> References
>>>>
>>>>
>>>>
>>>> Marru, S., Gunathilake, L., Herath, C., Tangchaisin, P., Pierce, M.,
>>>> Mattmann, C., Singh, R., Gunarathne, T., Chinthaka, E., Gardler, R. and
>>>> Slominski, A., 2011, November. Apache Airavata: a framework for distributed
>>>> applications and computational workflows. In Proceedings of the 2011 ACM
>>>> workshop on Gateway computing environments (pp. 21-28). ACM.
>>>>
>>>>
>>>>
>>>> Nakandala, S., Pamidighantam, S., Yodage, S., Doshi, N., Abeysinghe,
>>>> E., Kankanamalage, C.P., Marru, S. and Pierce, M., 2016, July. Anatomy of
>>>> the SEAGrid Science Gateway. In Proceedings of the XSEDE16 Conference on
>>>> Diversity, Big Data, and Science at Scale (p. 40). ACM.
>>>>
>>>>
>>>>
>>>> Pierce, Marlon E., Suresh Marru, Lahiru Gunathilake, Don Kushan
>>>> Wijeratne, Raminder Singh, Chathuri Wimalasena, Shameera Ratnayaka, and
>>>> Sudhakar Pamidighantam. "Apache Airavata: design and directions of a
>>>> science gateway framework." Concurrency and Computation: Practice and
>>>> Experience 27, no. 16 (2015): 4282-4291.
>>>>
>>>>
>>>>
>>>> Pierce, Marlon, Suresh Marru, Borries Demeler, Raminderjeet Singh, and
>>>> Gary Gorbet. "The Apache Airavata application programming interface:
>>>> overview and evaluation with the UltraScan science gateway." In Proceedings
>>>> of the 9th Gateway Computing Environments Workshop, pp. 25-29. IEEE Press,
>>>> 2014.
>>>>
>>>>
>>>>
>>>> Marru, Suresh, Marlon Pierce, Sudhakar Pamidighantam, and Chathuri
>>>> Wimalasena. "Apache Airavata as a laboratory: architecture and case study
>>>> for component-based gateway middleware." In Proceedings of the 1st
>>>> Workshop on The Science of Cyberinfrastructure: Research, Experience,
>>>> Applications and Models, pp. 19-26. ACM, 2015.
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Dimuthu
>>>>
>>>>
>>>>
>>>
>>>
>

Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by DImuthu Upeksha <di...@gmail.com>.
Hi Supun,

I also believe that letting the orchestrator determine which worker runs a
particular job is complex to implement and will make the orchestrator code
hard to maintain in the long run. I also partially agree with embedding a
worker inside the firewall-protected resource, but I think we can improve
it further to keep the workers homogeneous and stateless. Have a look at
the following figure.


In the above design we keep all the workers outside and run a daemon inside
the protected resource that communicates securely with the workers. The
problem then is how to keep the workers homogeneous, as this is still just
adding another layer to the solution stated above. The trick is to decouple
the communication between worker and resource: communication with any
resource is done through a well-defined API. Speaking in Java:

import java.io.InputStream;
import java.io.OutputStream;

// Resource-agnostic communication contract. Each resource type ships its
// own implementation, which the worker fetches from the Catalog at runtime.
public interface CommunicationInterface {
    // Run a command on the resource and return its output.
    String sshToResource(String resourceIp, String command);
    // Upload data to the given target path on the resource.
    void transferDataTo(String resourceIp, String target, InputStream in);
    // Download data from the given target path on the resource.
    void transferDataFrom(String resourceIp, String target, OutputStream out);
}
The implementation of this API can vary per resource. We keep a separate
Catalog that serves the libraries containing the implementation specific to
each resource. For example, if Worker 1 needs to talk to Resource 1, which
sits behind a firewall with the Airavata communication agent placed inside,
it queries the Catalog for Resource 1 and fetches the library that
implements CommunicationInterface to talk securely with the Airavata agent.
If it wants to talk to Resource 2, another library with the default
implementation is fetched from the Catalog. Once these SDKs are fetched,
they are loaded into the JVM at runtime using a class loader, and
communication proceeds from there.
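As a very rough sketch of that lookup step (all class and method names here
are hypothetical, not actual Airavata APIs; an in-memory map stands in for
the real Catalog service, and the jar-fetching/URLClassLoader step is
replaced by classpath loading so the example stays runnable):

```java
import java.util.HashMap;
import java.util.Map;

public class WorkerCommunicationDemo {

    // Same contract as CommunicationInterface above, trimmed to one method.
    public interface CommunicationInterface {
        String sshToResource(String resourceIp, String command);
    }

    // Default implementation for directly reachable resources (stubbed).
    public static class DirectSsh implements CommunicationInterface {
        public String sshToResource(String resourceIp, String command) {
            return "direct:" + resourceIp + ":" + command;
        }
    }

    // Implementation that relays through an agent behind a firewall (stubbed).
    public static class AgentRelay implements CommunicationInterface {
        public String sshToResource(String resourceIp, String command) {
            return "agent:" + resourceIp + ":" + command;
        }
    }

    // Simulated Catalog: resource id -> implementation class name. In the
    // real design the lookup would also hand back a jar to be loaded with
    // a URLClassLoader before resolving the class.
    static final Map<String, String> CATALOG = new HashMap<>();
    static {
        CATALOG.put("resource-1", AgentRelay.class.getName());
        CATALOG.put("resource-2", DirectSsh.class.getName());
    }

    // Resolve and instantiate the implementation registered for a resource.
    public static CommunicationInterface resolve(String resourceId) throws Exception {
        Class<?> clazz = Class.forName(CATALOG.get(resourceId));
        return (CommunicationInterface) clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolve("resource-1").sshToResource("10.0.0.5", "uptime"));
        System.out.println(resolve("resource-2").sshToResource("10.0.0.6", "uptime"));
    }
}
```

The worker only ever sees the interface; swapping the class name in the
Catalog swaps the transport without touching worker code.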

We can improve this by caching libraries inside workers and reusing them as
much as possible, to limit the number of Catalog queries from workers.
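A minimal sketch of that caching, assuming a hypothetical fetchFromCatalog
call standing in for the real (remote) Catalog lookup:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Worker-side library cache: the first request for a resource triggers a
// Catalog fetch; later requests for the same resource hit the cache.
public class LibraryCache {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private int catalogQueries = 0; // counts remote fetches, for the demo

    // Stands in for fetching the implementation jar from the Catalog.
    private String fetchFromCatalog(String resourceId) {
        catalogQueries++;
        return "lib-for-" + resourceId;
    }

    // Return the library for a resource, fetching it at most once.
    public String libraryFor(String resourceId) {
        return cache.computeIfAbsent(resourceId, this::fetchFromCatalog);
    }

    public int catalogQueries() {
        return catalogQueries;
    }

    public static void main(String[] args) {
        LibraryCache c = new LibraryCache();
        c.libraryFor("resource-1");
        c.libraryFor("resource-1"); // served from cache, no Catalog query
        c.libraryFor("resource-2");
        System.out.println(c.catalogQueries()); // 2
    }
}
```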

The advantage of this is that we can add resources with different security
levels without changing the worker implementations. The only thing we have
to do is come up with an agent and a library for talking to that agent,
then add them to the Catalog; the rest is taken care of by the framework.
This model is analogous to the JDBC drivers we use in Java to connect to
databases.

Please note that I came up with this design based on my limited knowledge
of Airavata workers and resources. There will be a lot of corner cases that
I have not identified. Your views and ideas are highly appreciated.

Thanks
Dimuthu

On Sun, Oct 8, 2017 at 10:51 AM, Supun Nakandala <su...@gmail.com>
wrote:

> Hi Dimuthu,
>
> Thank you for the very good summary. I think you have covered almost all
> the important things.
>
> I would also like to mention one other future requirement that I think
> will be important in this discussion.
>
> In my opinion, going forward Airavata will need to work with
> firewall-protected resources. In such cases, workers residing outside will
> not be able to communicate with the protected resources. What we initially
> thought was to deploy a special type of worker that is placed inside the
> firewall-protected network and coordinates with the Airavata orchestrator
> to execute actions. One such tool used by ServiceNow in enterprise
> settings is the MID Server (
> http://wiki.servicenow.com/index.php?title=MID_Server#gsc.tab=0). The
> downside of this approach is that it breaks our assumption of all workers
> being homogeneous and therefore requires the orchestrator to be worker
> aware. Perhaps, instead of workers picking work, we could design it so
> that the orchestrator grants work to the corresponding worker. But this
> adds a lot of complexity on the orchestrator's side.
>
>
>
> On Oct 5, 2017 10:47 AM, "DImuthu Upeksha" <di...@gmail.com>
> wrote:
>
>> Hi Gaurav,
>>
>> Thanks a lot for the detailed description of DC/OS and how it can be
>> utilized in Airavata. It seems like an interesting project, and I'll add
>> it to the list of technologies to be evaluated.
>>
>> When selecting a technology, in addition to the features it provides, we
>> may have to take non-functional aspects into consideration, such as
>> community participation (committers, commits, and forks), the number of
>> users running it in production environments, the maturity of the project,
>> and the complexity it adds to the overall system. So I'll first go
>> through the resources (documentation and source) to grasp the concepts of
>> DC/OS, and hopefully I can work with you to dig deeper into it.
>>
>> Thanks
>> Dimuthu
>>
>> On Thu, Oct 5, 2017 at 8:50 PM, Shenoy, Gourav Ganesh <
>> goshenoy@indiana.edu> wrote:
>>
>>> Sorry, missed the attachment in my previous email.
>>>
>>>
>>>
>>> PS: DC/OS is just a recommendation for performing containerized
>>> deployment and application management for Airavata. I would be happy to
>>> consider alternative frameworks such as Kubernetes.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>>
>>>
>>> *From: *"Shenoy, Gourav Ganesh" <go...@indiana.edu>
>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>> *Date: *Thursday, October 5, 2017 at 11:16 AM
>>>
>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>> *Subject: *Re: Linked Container Services for Apache Airavata Components
>>> - Phase 1 - Requirement identification
>>>
>>>
>>>
>>> Hi Dimuthu,
>>>
>>>
>>>
>>> Very good summary! I am not sure if you have come across it, but DC/OS
>>> (DataCenter Operating System) is a container orchestration platform based
>>> on Apache Mesos. The beauty of DC/OS is the ease and simplicity of
>>> development and deployment, while being extremely powerful along most of
>>> the parameters – multi-datacenter, multi-cloud, scalability, high
>>> availability, fault tolerance, and load balancing – and, more
>>> importantly, the community support is fantastic.
>>>
>>>
>>>
>>> DC/OS has an exhaustive service catalog; it's more like a PaaS for
>>> containers (though not restricted to containers) – you can run services
>>> like Spark, Kafka, RabbitMQ, etc. out of the box with a single-click
>>> install. And Apache Mesos as the underlying resource manager makes it
>>> seamless to deploy applications across different datacenters. There is a
>>> concept of SERVICE vs. JOB – a service is considered long-running and
>>> DC/OS will make sure it keeps running (if a service fails, it spins up a
>>> new one), whereas jobs are one-time executions. This comes in handy for
>>> using DC/OS as a target runtime for Airavata.
>>>
>>>
>>>
>>> We used DC/OS for our class project to run the distributed task
>>> execution prototype we built (which uses RabbitMQ messaging). Here’s a link
>>> to the blog I have explaining the process:
>>> https://gouravshenoy.github.io/apache-airavata/spring17/2017
>>> /04/20/final-report.html . I have also attached a PDF paper we wrote as
>>> part of the class explaining the task execution process and *one
>>> solution* using rabbitmq messaging.
>>>
>>>
>>>
>>> I had also started on the work of containerizing Airavata and a unified
>>> build + deployment mechanism with CI/CD on DC/OS. Unfortunately, I
>>> couldn’t complete it due to time constraints, but I would be more than
>>> happy to work with you on this. Let me know and we can coordinate.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>>
>>>

Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by Supun Nakandala <su...@gmail.com>.
Hi Dimuthu,

Thank you for the very good summary. I think you have covered almost all
the important things.

I would also like to mention one other future requirement that I think will
be important in this discussion.

In my opinion, going forward Airavata will need to work with
firewall-protected resources. In such cases, workers residing outside will
not be able to communicate with the protected resources. What we initially
thought was to deploy a special type of worker that is placed inside the
firewall-protected network and coordinates with the Airavata orchestrator
to execute actions. One such tool used by ServiceNow in enterprise settings
is the MID Server (
http://wiki.servicenow.com/index.php?title=MID_Server#gsc.tab=0). The
downside of this approach is that it breaks our assumption of all workers
being homogeneous and therefore requires the orchestrator to be worker
aware. Perhaps, instead of workers picking work, we could design it so that
the orchestrator grants work to the corresponding worker. But this adds a
lot of complexity on the orchestrator's side.
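The worker-aware push model described above can be sketched as follows (all
names are illustrative, not Airavata APIs; the point is only that the
orchestrator must know each worker's reachability when it assigns work):

```java
import java.util.List;
import java.util.Set;

// Sketch of a worker-aware orchestrator: tasks targeting firewall-protected
// resources must be granted to the special worker deployed inside that
// network; everything else can go to any worker in the general pool.
public class WorkerAwareRouting {

    // Resources only reachable from inside their own network.
    static final Set<String> PROTECTED = Set.of("resource-1");

    // Push model: the orchestrator picks the worker, so it has to track
    // reachability per worker - the extra complexity discussed above.
    static String route(String resourceId, List<String> generalWorkers, String midWorker) {
        return PROTECTED.contains(resourceId) ? midWorker : generalWorkers.get(0);
    }

    public static void main(String[] args) {
        List<String> pool = List.of("worker-a", "worker-b");
        System.out.println(route("resource-1", pool, "mid-worker")); // mid-worker
        System.out.println(route("resource-2", pool, "mid-worker")); // worker-a
    }
}
```

In the workers-pick-work (pull) model this routing table disappears, which
is exactly why breaking worker homogeneity is costly.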



On Oct 5, 2017 10:47 AM, "DImuthu Upeksha" <di...@gmail.com>
wrote:

> Hi Gaurav,
>
> Thanks a lot for the detailed description about DC/OS and how it can be
> utilized in Airavata. Seems like it is an interesting project and I'll add
> it to the technology list that are to be evaluated.
>
> When selecting a technology, in addition to the features it provides, we
> might have to take some non-functional features like the community
> participation (committers, commits and forks), number of customers  who
> are  running it  in production environments, maturity of the project and
> the complexity it brings in to the total system into the consideration. So
> I'll first try to go through the resources (documentation and source) and
> try to grab concepts of DC/OS and hopefully I can work with you to dig
> deeper to understand more about DC/OS
>
> Thanks
> Dimuthu
>
> On Thu, Oct 5, 2017 at 8:50 PM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
>> Sorry, missed the attachment in my previous email.
>>
>>
>>
>> PS: DC/OS is just a recommendation for performing containerized
>> deployment and application management for Airavata. I would be happy to
>> consider alternative frameworks such as Kubernetes.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *"Shenoy, Gourav Ganesh" <go...@indiana.edu>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Thursday, October 5, 2017 at 11:16 AM
>>
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: Linked Container Services for Apache Airavata Components
>> - Phase 1 - Requirement identification
>>
>>
>>
>> Hi Dimuthu,
>>
>>
>>
>> Very good summary! I am not sure if you have, but DC/OS (DataCenter
>> Operating System) is a container orchestration platform based on Apache
>> Mesos. The beauty of DC/OS is the ease and simplicity of
>> development/deployment; yet being extremely powerful in most of the
>> parameters – multi-datacenter, multi-cloud, scalability, high availability,
>> fault tolerance, load balancing, and more importantly the community support
>> is fantastic.
>>
>>
>>
>> DC/OS has an exhaustive service catalog, it’s more like a PAAS for
>> containers (not just restricted to containers though) – you can run
>> services like Spark, Kafka, RabbitMQ, etc out of the box with a single
>> click install. And Apache Mesos as the underlying resource manager makes it
>> seamless to deploy applications across different datacenters. There is a
>> concept of SERVICE vs JOB – service is considered long running and DC/OS
>> will make sure it keeps it running (if a service fails, it spins up a new
>> one), whereas jobs are one time executors. This comes handy for using DC/OS
>> as a target runtime for Airavata.
>>
>>
>>
>> We used DC/OS for our class project to run the distributed task execution
>> prototype we built (which uses RabbitMQ messaging). Here’s a link to the
>> blog I have explaining the process: https://gouravshenoy.github.io
>> /apache-airavata/spring17/2017/04/20/final-report.html . I have also
>> attached a PDF paper we wrote as part of the class explaining the task
>> execution process and *one solution* using rabbitmq messaging.
>>
>>
>>
>> I had also started with the work of containerizing Airavata and a unified
>> build + deployment mechanism with CI CD on DC/OS. Unfortunately, I couldn’t
>> complete it due to time constraints, but I would be more than happy to work
>> with you on this. Let me know and we can coordinate.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *DImuthu Upeksha <di...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Thursday, October 5, 2017 at 9:52 AM
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: Linked Container Services for Apache Airavata Components
>> - Phase 1 - Requirement identification
>>
>>
>>
>> Hi Marlon,
>>
>>
>>
>> Thanks for the input. I got your idea of availability mode and will keep
>> in mind while designing the PoC. CI/CD is the one I have missed and thanks
>> for pointing it out.
>>
>>
>>
>> Thanks
>>
>> Dimuthu
>>
>>
>>
>> On Thu, Oct 5, 2017 at 7:04 PM, Pierce, Marlon <ma...@iu.edu> wrote:
>>
>> Thanks, Dimuthu, this is a good summary. Others may comment about Kafka,
>> stateful versus stateless parts of Airavata, etc.  You may also find some
>> of this discussion on the mailing list archives.
>>
>>
>>
>> Active-active vs. active-passive is a good question, and we have
>> typically thought of this in terms of individual Airavata components rather
>> than the whole system.  Some components can be active-active (like a
>> stateless application manager), while others (like the orchestrator example
>> you give below) are stafefull and may be better as active-passive.
>>
>>
>>
>> There is also the issue of system updates and continuous deployments,
>> which could be added to your list.
>>
>>
>>
>> Marlon
>>
>>
>>
>>
>>
>> *From: *"dimuthu.upeksha2@gmail.com" <di...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Thursday, October 5, 2017 at 2:40 AM
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Linked Container Services for Apache Airavata Components -
>> Phase 1 - Requirement identification
>>
>>
>>
>> Hi All,
>>
>>
>>
>> Within last few days, I have been going through the requirements and
>> design of current setup of Airavata and I identified following ares as the
>> key focusing areas in the technology evaluation phase
>>
>>
>>
>> Micorservices deployment platform (container management system)
>>
>>
>>
>> Possible candidates: Google Kubernetes, Apache Mesos, Apache Helix
>>
>> As the most of the operational units of Airavata is supposed to be moving
>> into microservices based deployment pattern, having a unified deployment
>> platform to manage those microservices will make the DevOps operations
>> easier and faster. From the other hand, although writing and maintaining a
>> single micro service is a somewhat straightforward way, making multiple
>> microservies running, monitoring and maintaining the lifecycles manually in
>> a production environment is an tiresome and complex operation to perform.
>> Using such a deployment platform, we can easily automate lots of pain
>> points that I have mentioned earlier.
>>
>>
>>
>> Scalability
>>
>>
>>
>> We need a solution that can easily scalable depending on the load
>> condition of several parts of the system. For example, the workers in the
>> post processing pipeline should be able scaled up and down depending on the
>> events come into the message queue.
>>
>>
>>
>> Availability
>>
>>
>>
>> We need to support solution to be deployed in multiple geographically
>> distant data centers. When evaluating container management systems, we
>> should consider this is as a primary requirement. However one thing that I
>> am not sure is the availability mode that Airavata normally expect. Is it a
>> active-active mode or active-passive mode?
>>
>>
>>
>> Service discovery
>>
>>
>>
>> Once we move in to microservice based deployment pattern, there could be
>> scenarios where we want service discovery for several use cases. For
>> example, if we are going to scale up API Server to handle an increased
>> load, we might have to put a load balancer in between the client and API
>> Server instances. In that case, service discovery is essential to instruct
>> the load balancer with healthy API Server endpoints which are currently
>> running in the system.
>>
>>
>>
>> Cluster coordination
>>
>>
>>
>> Although micorservices are supposed to be stateless in most of the cases,
>> we might have scenarios to feed some state to particular micorservices. For
>> example if we are going to implement a microservice that perform
>> Orchestrator's role, there could be issues if we keep multiple instances of
>> it in several data centers to increase the availability. According to my
>> understanding, there should be only one Orchestrator being running at a
>> time as it is the one who takes decisions of the job execution process. So,
>> if we are going to keep multiple instances of it running in the system,
>> there should be an some sort of a leader election in between Orchestrator
>> quorum.
>>
>>
>>
>> Common messaging medium in between mocroservices
>>
>>
>>
>> This might be out of the scope but I thought of sharing with the team to
>> have an general idea. Idea was raised at the hip chat discussion with
>> Marlon and Gaourav. Using a common messaging medium might enable
>> microservices to communicate with in a decoupled manner which will increase
>> the scalability of the system. For example there is a reference
>> architecture that we can utilize with kafka based messaging medium [1],
>> [2]. However I noticed in one paper that Kafka was previously rejected as
>> writing clients was onerous. Please share your views on this as I'm not
>> familiar with the existing fan out model based on AMQP and  pain points of
>> it.
>>
>>
>>
>> Those are the main areas that I have identified while going through the
>> current Airavata implementation and the requirements stated in some of the
>> research papers. Please let me know whether my understanding of the above
>> items is correct; suggestions are always welcome :)
>>
>>
>>
>> [1] https://medium.com/@ulymarins/an-introduction-to-apache-kafka-and-microservices-communication-bf0a0966d63
>>
>> [2] https://www.slideshare.net/ConfluentInc/microservices-in-the-apache-kafka-ecosystem
>>
>>
>>
>> References
>>
>>
>>
>> Marru, S., Gunathilake, L., Herath, C., Tangchaisin, P., Pierce, M.,
>> Mattmann, C., Singh, R., Gunarathne, T., Chinthaka, E., Gardler, R. and
>> Slominski, A., 2011, November. Apache airavata: a framework for distributed
>> applications and computational workflows. In Proceedings of the 2011 ACM
>> workshop on Gateway computing environments (pp. 21-28). ACM.
>>
>>
>>
>> Nakandala, S., Pamidighantam, S., Yodage, S., Doshi, N., Abeysinghe, E.,
>> Kankanamalage, C.P., Marru, S. and Pierce, M., 2016, July. Anatomy of the
>> SEAGrid Science Gateway. In Proceedings of the XSEDE16 Conference on
>> Diversity, Big Data, and Science at Scale (p. 40). ACM.
>>
>>
>>
>> Pierce, Marlon E., Suresh Marru, Lahiru Gunathilake, Don Kushan
>> Wijeratne, Raminder Singh, Chathuri Wimalasena, Shameera Ratnayaka, and
>> Sudhakar Pamidighantam. "Apache Airavata: design and directions of a
>> science gateway framework." Concurrency and Computation: Practice and
>> Experience 27, no. 16 (2015): 4282-4291.
>>
>>
>>
>> Pierce, Marlon, Suresh Marru, Borries Demeler, Raminderjeet Singh, and
>> Gary Gorbet. "The apache airavata application programming interface:
>> overview and evaluation with the UltraScan science gateway." In Proceedings
>> of the 9th Gateway Computing Environments Workshop, pp. 25-29. IEEE Press,
>> 2014.
>>
>>
>>
>> Marru, Suresh, Marlon Pierce, Sudhakar Pamidighantam, and Chathuri
>> Wimalasena. "Apache Airavata as a laboratory: architecture and case study
>> for component-based gateway middleware." In Proceedings of the 1st
>> Workshop on The Science of Cyberinfrastructure: Research, Experience,
>> Applications and Models, pp. 19-26. ACM, 2015.
>>
>>
>>
>> Thanks
>>
>> Dimuthu
>>
>>
>>
>
>

Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by DImuthu Upeksha <di...@gmail.com>.
Hi Gourav,

Thanks a lot for the detailed description of DC/OS and how it can be
utilized in Airavata. It seems like an interesting project, and I'll add it
to the list of technologies to be evaluated.

When selecting a technology, in addition to the features it provides, we
might have to take into consideration non-functional aspects such as
community participation (committers, commits, and forks), the number of
users running it in production environments, the maturity of the project,
and the complexity it brings into the overall system. So I'll first go
through the resources (documentation and source) to grasp the concepts of
DC/OS, and hopefully I can work with you to dig deeper into it.

Thanks
Dimuthu

On Thu, Oct 5, 2017 at 8:50 PM, Shenoy, Gourav Ganesh <go...@indiana.edu>
wrote:

> Sorry, missed the attachment in my previous email.
>
>
>
> PS: DC/OS is just a recommendation for performing containerized deployment
> and application management for Airavata. I would be happy to consider
> alternative frameworks such as Kubernetes.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Shenoy, Gourav Ganesh" <go...@indiana.edu>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Thursday, October 5, 2017 at 11:16 AM
>
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: Linked Container Services for Apache Airavata Components -
> Phase 1 - Requirement identification
>
>
>
> Hi Dimuthu,
>
>
>
> Very good summary! I am not sure if you have come across it, but DC/OS
> (DataCenter Operating System) is a container orchestration platform based
> on Apache Mesos. The beauty of DC/OS is the ease and simplicity of
> development/deployment, while being extremely powerful on most of the
> parameters – multi-datacenter, multi-cloud, scalability, high availability,
> fault tolerance, load balancing – and, more importantly, the community
> support is fantastic.
>
>
>
> DC/OS has an exhaustive service catalog; it’s more like a PaaS for
> containers (though not restricted to containers) – you can run services
> like Spark, Kafka, RabbitMQ, etc. out of the box with a single-click
> install. And Apache Mesos as the underlying resource manager makes it
> seamless to deploy applications across different datacenters. There is a
> concept of SERVICE vs JOB – a service is considered long-running, and
> DC/OS will make sure it keeps running (if a service fails, it spins up a
> new one), whereas jobs are one-time executions. This comes in handy for
> using DC/OS as a target runtime for Airavata.
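The SERVICE concept maps to a Marathon app definition (Marathon is DC/OS's default scheduler for long-running services). Below is a hedged Python sketch that builds such a definition; the field names follow Marathon's app-definition schema, but the image name, app id, and resource sizes are invented for illustration:

```python
import json

# Hypothetical Marathon app definition for an Airavata API server on DC/OS.
# "instances": 3 asks Marathon to keep three replicas alive; if one dies,
# Marathon spins up a replacement (the SERVICE semantics described above).
api_server_app = {
    "id": "/airavata/api-server",                      # made-up app path
    "container": {
        "type": "DOCKER",
        "docker": {"image": "example/airavata-api-server:latest"},
    },
    "cpus": 1.0,
    "mem": 2048,
    "instances": 3,
    "healthChecks": [{"protocol": "TCP", "portIndex": 0}],
}

# This JSON would be POSTed to Marathon's /v2/apps endpoint.
payload = json.dumps(api_server_app, indent=2)
```

A one-time JOB would instead be submitted to DC/OS's metronome-style job scheduler with a schedule or run-once trigger, which fits post-processing tasks better than a long-running service.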
>
>
>
> We used DC/OS for our class project to run the distributed task execution
> prototype we built (which uses RabbitMQ messaging). Here’s a link to the
> blog post where I explain the process:
> https://gouravshenoy.github.io/apache-airavata/spring17/2017/04/20/final-report.html
> I have also attached a PDF paper we wrote as part of the class explaining
> the task execution process and *one solution* using RabbitMQ messaging.
>
>
>
> I had also started on the work of containerizing Airavata and building a
> unified build + deployment mechanism with CI/CD on DC/OS. Unfortunately, I
> couldn’t complete it due to time constraints, but I would be more than
> happy to work with you on this. Let me know and we can coordinate.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *DImuthu Upeksha <di...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Thursday, October 5, 2017 at 9:52 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: Linked Container Services for Apache Airavata Components -
> Phase 1 - Requirement identification
>
>
>
> Hi Marlon,
>
>
>
> Thanks for the input. I got your point about the availability mode and
> will keep it in mind while designing the PoC. CI/CD is an area I had
> missed, and thanks for pointing it out.
>
>
>
> Thanks
>
> Dimuthu
>
>
>
> On Thu, Oct 5, 2017 at 7:04 PM, Pierce, Marlon <ma...@iu.edu> wrote:
>
> Thanks, Dimuthu, this is a good summary. Others may comment about Kafka,
> stateful versus stateless parts of Airavata, etc.  You may also find some
> of this discussion on the mailing list archives.
>
>
>
> Active-active vs. active-passive is a good question, and we have typically
> thought of this in terms of individual Airavata components rather than the
> whole system.  Some components can be active-active (like a stateless
> application manager), while others (like the orchestrator example you give
> below) are stateful and may be better as active-passive.
>
>
>
> There is also the issue of system updates and continuous deployments,
> which could be added to your list.
>
>
>
> Marlon
>
>
>
>
>



Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by DImuthu Upeksha <di...@gmail.com>.
Hi Marlon,

Thanks for the input. I got your point about the availability mode and will keep it in
mind while designing the PoC. CI/CD is one that I had missed; thanks for
pointing it out.

Thanks
Dimuthu


Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

Posted by "Pierce, Marlon" <ma...@iu.edu>.
Thanks, Dimuthu, this is a good summary. Others may comment about Kafka, stateful versus stateless parts of Airavata, etc.  You may also find some of this discussion on the mailing list archives.

 

Active-active vs. active-passive is a good question, and we have typically thought of this in terms of individual Airavata components rather than the whole system.  Some components can be active-active (like a stateless application manager), while others (like the orchestrator example you give below) are stateful and may be better as active-passive.

 

There is also the issue of system updates and continuous deployments, which could be added to your list.

 

Marlon

 

 

From: "dimuthu.upeksha2@gmail.com" <di...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, October 5, 2017 at 2:40 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification

 

Hi All,

 

Within the last few days, I have been going through the requirements and design of the current setup of Airavata, and I identified the following as the key focus areas in the technology evaluation phase.

 

Microservices deployment platform (container management system)

 

Possible candidates: Google Kubernetes, Apache Mesos, Apache Helix 

As most of the operational units of Airavata are expected to move into a microservices-based deployment pattern, having a unified deployment platform to manage those microservices will make DevOps operations easier and faster. On the other hand, although writing and maintaining a single microservice is fairly straightforward, keeping multiple microservices running, monitored, and lifecycle-managed manually in a production environment is a tiresome and complex operation. Using such a deployment platform, we can easily automate many of these pain points.
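
To make the lifecycle point concrete, here is a minimal sketch (purely illustrative; the service names are made up) of the reconciliation loop such a platform runs on our behalf, comparing the declared state against what is actually running:

```python
# Minimal sketch of the "reconciliation loop" a container platform automates:
# compare desired replica counts against running instances and compute the
# start/stop actions needed to close the gap. Service names are hypothetical.

def reconcile(desired: dict, running: dict) -> dict:
    """Return the actions needed to move `running` toward `desired`."""
    actions = {"start": {}, "stop": {}}
    for service, want in desired.items():
        have = running.get(service, 0)
        if have < want:
            actions["start"][service] = want - have
        elif have > want:
            actions["stop"][service] = have - want
    # Anything running that is no longer declared should be stopped.
    for service, have in running.items():
        if service not in desired:
            actions["stop"][service] = have
    return actions

desired = {"api-server": 3, "orchestrator": 1}
running = {"api-server": 1, "orchestrator": 2}
print(reconcile(desired, running))
# {'start': {'api-server': 2}, 'stop': {'orchestrator': 1}}
```

Kubernetes, Mesos, and Helix implement far more robust versions of this loop, adding health checks and automatic restarts on top.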

 

Scalability

 

We need a solution that can easily scale depending on the load on different parts of the system. For example, the workers in the post-processing pipeline should be able to scale up and down depending on the events coming into the message queue.
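
As a rough sketch of the scaling rule I have in mind (the numbers are illustrative, not Airavata configuration):

```python
import math

# Sketch of a queue-driven scaling rule: pick a worker count proportional
# to the queue backlog, clamped to configured bounds. All parameters here
# are made-up defaults for illustration.

def desired_workers(queue_length: int, events_per_worker: int = 100,
                    min_workers: int = 1, max_workers: int = 10) -> int:
    wanted = math.ceil(queue_length / events_per_worker)
    return max(min_workers, min(max_workers, wanted))

print(desired_workers(0))     # 1  (never below the floor)
print(desired_workers(450))   # 5  (ceil(450 / 100))
print(desired_workers(5000))  # 10 (clamped to the ceiling)
```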

 

Availability

 

We need the solution to support deployment across multiple geographically distant data centers. When evaluating container management systems, we should consider this a primary requirement. However, one thing I am not sure about is the availability mode that Airavata normally expects. Is it an active-active mode or an active-passive mode?

 

Service discovery

 

Once we move into a microservice-based deployment pattern, there could be scenarios where we need service discovery. For example, if we are going to scale up the API Server to handle increased load, we might have to put a load balancer between the clients and the API Server instances. In that case, service discovery is essential to supply the load balancer with the healthy API Server endpoints currently running in the system.
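
A minimal sketch of that idea, with an in-memory registry standing in for a real service-discovery component (the endpoint addresses are made up):

```python
import itertools

# Sketch of service discovery: a registry tracks API Server endpoints and
# their health; the load balancer only rotates over the healthy ones.

class Registry:
    def __init__(self):
        self.endpoints = {}  # address -> healthy?

    def register(self, address):
        self.endpoints[address] = True

    def mark_unhealthy(self, address):
        self.endpoints[address] = False

    def healthy(self):
        return [a for a, ok in self.endpoints.items() if ok]

registry = Registry()
for addr in ("10.0.0.1:8930", "10.0.0.2:8930", "10.0.0.3:8930"):
    registry.register(addr)
registry.mark_unhealthy("10.0.0.2:8930")  # health check failed

# Round-robin over whatever is currently healthy.
rr = itertools.cycle(registry.healthy())
print([next(rr) for _ in range(4)])
# ['10.0.0.1:8930', '10.0.0.3:8930', '10.0.0.1:8930', '10.0.0.3:8930']
```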

 

Cluster coordination

 

Although microservices are supposed to be stateless in most cases, we might have scenarios where particular microservices need some state. For example, if we implement a microservice that performs the Orchestrator's role, there could be issues if we keep multiple instances of it in several data centers to increase availability. To my understanding, there should be only one Orchestrator running at a time, as it is the component that makes decisions in the job execution process. So, if we are going to keep multiple instances of it running in the system, there should be some sort of leader election within the Orchestrator quorum.
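
A sketch of the leader-election idea, using an in-memory compare-and-set store as a stand-in for a real coordination service such as ZooKeeper (instance names are hypothetical):

```python
import threading

# Only the instance that wins the "election" acts as the Orchestrator;
# the others stay passive until the leader releases (or loses) the lock.

class ElectionStore:
    """In-memory stand-in for a coordination service (e.g. ZooKeeper)."""
    def __init__(self):
        self._lock = threading.Lock()
        self.leader = None

    def try_acquire(self, candidate: str) -> bool:
        with self._lock:
            if self.leader is None:
                self.leader = candidate
                return True
            return False

    def release(self, candidate: str):
        with self._lock:
            if self.leader == candidate:
                self.leader = None

store = ElectionStore()
print(store.try_acquire("orchestrator-1"))  # True  -> becomes leader
print(store.try_acquire("orchestrator-2"))  # False -> stays passive
store.release("orchestrator-1")             # leader fails or steps down
print(store.try_acquire("orchestrator-2"))  # True  -> takes over
```

In a real deployment the coordination service would also expire the leader's claim automatically (e.g. via an ephemeral node or session timeout) instead of relying on an explicit release.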

 

Common messaging medium between microservices

 

This might be out of scope, but I thought of sharing it with the team to get a general idea. The idea was raised in the HipChat discussion with Marlon and Gourav. Using a common messaging medium might enable microservices to communicate in a decoupled manner, which will increase the scalability of the system. For example, there is a reference architecture that we can utilize with a Kafka-based messaging medium [1], [2]. However, I noticed in one paper that Kafka was previously rejected because writing clients was onerous. Please share your views on this, as I'm not familiar with the existing AMQP-based fan-out model and its pain points.
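
To illustrate the decoupling (only the pattern; Kafka adds partitioning, persistence, and consumer groups on top), here is a minimal in-memory sketch where the publisher knows only a topic name. The topic and payload are hypothetical:

```python
from collections import defaultdict

# Sketch of a shared messaging medium: producers write to a named topic
# without knowing who consumes it, so services stay decoupled.

class Bus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

bus = Bus()
received = []
bus.subscribe("experiment.launched", lambda m: received.append(("monitor", m)))
bus.subscribe("experiment.launched", lambda m: received.append(("logger", m)))

# The publisher knows only the topic name, not the consumers.
bus.publish("experiment.launched", {"experimentId": "exp-123"})
print(received)
# [('monitor', {'experimentId': 'exp-123'}), ('logger', {'experimentId': 'exp-123'})]
```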

 

Those are the main areas that I have identified while going through Airavata's current implementation and the requirements stated in some of the research papers. Please let me know whether my understanding of the above items is correct; suggestions are always welcome :)

 

[1] https://medium.com/@ulymarins/an-introduction-to-apache-kafka-and-microservices-communication-bf0a0966d63

[2] https://www.slideshare.net/ConfluentInc/microservices-in-the-apache-kafka-ecosystem

 

References

 

Marru, S., Gunathilake, L., Herath, C., Tangchaisin, P., Pierce, M., Mattmann, C., Singh, R., Gunarathne, T., Chinthaka, E., Gardler, R. and Slominski, A., 2011, November. Apache Airavata: a framework for distributed applications and computational workflows. In Proceedings of the 2011 ACM workshop on Gateway computing environments (pp. 21-28). ACM.

 

Nakandala, S., Pamidighantam, S., Yodage, S., Doshi, N., Abeysinghe, E., Kankanamalage, C.P., Marru, S. and Pierce, M., 2016, July. Anatomy of the SEAGrid Science Gateway. In Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale (p. 40). ACM.

 

Pierce, Marlon E., Suresh Marru, Lahiru Gunathilake, Don Kushan Wijeratne, Raminder Singh, Chathuri Wimalasena, Shameera Ratnayaka, and Sudhakar Pamidighantam. "Apache Airavata: design and directions of a science gateway framework." Concurrency and Computation: Practice and Experience 27, no. 16 (2015): 4282-4291.

 

Pierce, Marlon, Suresh Marru, Borries Demeler, Raminderjeet Singh, and Gary Gorbet. "The Apache Airavata application programming interface: overview and evaluation with the UltraScan science gateway." In Proceedings of the 9th Gateway Computing Environments Workshop, pp. 25-29. IEEE Press, 2014.

 

Marru, Suresh, Marlon Pierce, Sudhakar Pamidighantam, and Chathuri Wimalasena. "Apache Airavata as a laboratory: architecture and case study for component-based gateway middleware." In Proceedings of the 1st Workshop on The Science of Cyberinfrastructure: Research, Experience, Applications and Models, pp. 19-26. ACM, 2015.

 

Thanks

Dimuthu