You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@airavata.apache.org by DImuthu Upeksha <di...@gmail.com> on 2017/10/30 14:45:38 UTC

Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Hi All,

Based on the analysis of Phase 1, within past two weeks I have been working
on implementing a task execution workflow following the microservices
deployment pattern and Kubernetes as the deployment platform.

Please find attached design document that explains the components and
messaging interactions between components. Based on that design, I have
implemented following components

1. Set of microservices to compose the workflow
2. A simple Web Console to  deploy and monitor workflows on the framework

I used Kakfa as the primary messaging medium to communicate among the
microservices due to its simplicity and powerful features like partitions
and consumer groups.

I have attached a user guide so that you can install and try this in your
local machine. And source code for each component can be found from [1]

Please share you ideas and suggestions.

Thanks
Dimuthu

[1]
https://github.com/DImuthuUpe/airavata/tree/master/sandbox/airavata-kubernetes
[2]
https://docs.google.com/document/d/1R1xrmuPldHiWVDn4xNVay9Vnxn9FODQZXtF55JxJpSY/edit?usp=sharing
[3]
https://docs.google.com/document/d/1A5eRIZiuUj4ShZVMS0NdAxjAxtOTZXculaYDCZ7IMQ8/edit?usp=sharing

Re: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Posted by DImuthu Upeksha <di...@gmail.com>.

To be clear, not implemented part that I have mentioned above is the
resuming part of the Task Scheduler. Up to that point including Task
Scheduler getting feedback form the executing tasks and Event Sinks
persisting the status of the task are already there.


On Fri, Nov 3, 2017 at 5:34 AM, DImuthu Upeksha <di...@gmail.com>
wrote:

> Hi Marlon,
>
> Then Task Executor drops the status message to Task-Event topic. Sample
> status message is like this
>
> <ProcessId>,<TaskId>,<Status>,<Reason>
>
>  This is a broadcasting message and all Task Schedulers receives this
> message and the corresponding Task Scheduler which executes the DAG that
> contains this Task will stop DAG execution. Others will safely ignore this
> message. However you might have an issue like, what if that Task Scheduler
> got killed in between? If you can look at the design diagram, there is an
> another type of microservice (Event-Sink) which is reading from the
> Task-Event topic. What they does is, they read each status event from the
> topic and persist them in the database through the API server. That helps
> us to continue the same same DAG in a new Task Scheduler. However in
> current version, this feature was not implemented.
>
>
> 
> 
>
> Thanks
> Dimuthu
>
> On Fri, Nov 3, 2017 at 12:46 AM, Pierce, Marlon <ma...@iu.edu> wrote:
>
>> Hi Dimuthu,
>>
>>
>>
>> Thanks for the clarification on the first issue.  The second issue is
>> more about how you handle external errors in your approach: the SSH
>> connection times out, for instance, and all the retries fail, so the Task
>> Executor needs to report back that the command failed.
>>
>>
>>
>> Marlon
>>
>>
>>
>>
>>
>> *From: *"dimuthu.upeksha2@gmail.com" <di...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Thursday, November 2, 2017 at 7:33 AM
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: Linked Container Services for Apache Airavata Components
>> - Phase 2 - Initial Prototype
>>
>>
>>
>> Hi Marlon,
>>
>>
>>
>> Thanks for the comments and please find the explanations for those
>> comments below.
>>
>>
>>
>> *Transactionality*
>>
>>
>>
>> Kafka has a rich acknowledgment mechanism. Actually there are several
>> acknowledgement levels. Not like in RabbitMQ where messages are removed
>> from the queue once read, Kafka keeps all the messages of the topic for a
>> given time. Until that time expires, consumers are free to go back and
>> forth on the topic to read previously read messages. However if we want to
>> keep track of the read messages by a particular consumer (actually a
>> consumer group) we can use kafka's Current Offset and Committed Offset to
>> achieve this [1]. A consumer can read more than one messages from a
>> partition of the topic. When it is done, Kafka keeps track of the pointer
>> (offset) where that particular consumer finally read from the partition.
>> That is the current offset. However it does not mean that messages are
>> properly consumed by the consumer. Once the messages are consumed, consumer
>> acknowledges that it has consumed x number of messages of the already read
>> messages. Then Kafka keeps track of the last pointer where that particular
>> consumer has acknowledged. This is the Committed Offset. Let's assume that
>> the particular Consumer goes down and a new Consumer from the same consumer
>> group starts to read from vacant partition. Then it will start to read from
>> the Committed offset of last consumer. So to achieve the behavior that you
>> have mentioned, it can be done by controlling the acknowledgement mechanism
>> of a Consumer. There are mainly 2 acknowledgement mechanisms in Kafka
>>
>>
>>
>> 1. Auto commit - Once a consumer reads a message or set of messages at
>> once, it will auto commit after a given time interval
>>
>> 2. Manual commit - Developer has the handle to decide whether to
>> acknowledge or not for a particular message
>>
>>
>>
>> From above two methods we have to use manual commit as we have to make
>> sure that messages are acknowledged once it was properly handled. Manual
>> commit also has several levels and modes of acknowledgements.
>>
>>
>>
>> 1. Per Message acknowledgement
>>
>> 2. Per Message set acknowledgement
>>
>>
>>
>> 1. Asynchronous Mode
>>
>> 2. Synchronous Mode
>>
>>
>>
>> Here we use per message synchronous acknowledgement as we want the
>> highest level of transactionality. In that case, once we read a message
>> from a Kafka topic, we parse it and do the necessary operations and finally
>> acknowledges once the message has been successfully handled. If something
>> went wrong (consumer failed) and read messages were not acknowledged, new
>> consumer will continue from last acknowledged position. I have implemented
>> PoC code that demonstrates above concept with a single producer and 3
>> consumers [2]. You can find the screen recording of the test from here [3].
>>
>>
>>
>> If  you need 100% transactionality form consumer side, it is also
>> possible with an external transactional scope like database operations.
>> However in our case it is not required as the operations that we are doing
>> inside the message handling operation are not reversible. Let's talk more
>> about this approach in future.
>>
>>
>>
>> *Slow SSH communication*
>>
>>
>>
>> If you are expecting to improve the communication mechanism instead of
>> SSH, one option is to use agents inside comupte resources and communicate
>> with them in a optimized messaging protocol. I have provided an approach in
>> the previous thread. But your concern is more about being the whole
>> pipeline blocked due to a slow task not releasing the message from the
>> topic, there is a solution for that. When we are publishing messages to a
>> Kafka topic, they are distributed among the partitions of the topic.
>> Consumers are reading from those partitions. If a consumer becomes slow,
>> only the message read from that partitions will become slow. Other
>> partitions will work without any issue. So we can have a higher number of
>> partitions in a topic to handle that. Further if we want to go further like
>> grouping slow tasks to a one set and allowing low latency tasks to an other
>> set by using a custom partitioner [5]
>>
>>
>>
>> *Registry*
>>
>>
>>
>> It is already there [6]. Actually in this implementation, API server acts
>> as the registry itself. It stores experiment objects and current status of
>> the cluster. However As Gaurav has pointed out in the next email, that
>> would be better if this is separated into multiple parts. What do you think?
>>
>>
>>
>> [1] https://www.youtube.com/watch?v=kZT8v2_b2XE&index=15&lis
>> t=PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON
>>
>> [2] https://github.com/DImuthuUpe/kafka-transactionality
>>
>> [3] https://www.youtube.com/watch?v=j6bOVLUlyf4&feature=youtu.be
>>
>> [4] https://www.youtube.com/watch?v=AshMNCxSp3c&list=PLkz1SC
>> f5iB4enAR00Z46JwY9GGkaS2NON&index=17
>>
>> [5] https://www.youtube.com/watch?v=pMDAcNRkWkE&index=10&lis
>> t=PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON
>>
>> [6] https://github.com/apache/airavata-sandbox/tree/master/a
>> iravata-kubernetes/modules/microservices/api-server/src/main
>> /java/org/apache/airavata/k8s/api/server
>>
>>
>>
>> On Thu, Nov 2, 2017 at 2:52 AM, Pierce, Marlon <ma...@iu.edu> wrote:
>>
>> Hi Dimuthu,
>>
>>
>>
>> Thanks for sending this very thoughtful document. A couple of comments:
>>
>>
>>
>> * Use of Kafka instead of RabbitMQ is interesting. Can you say more about
>> how this approach can handle Kafka client failures?  For RabbitMQ, for
>> example, there is the simple “Work Queue” approach in which the broker
>> pushes a task to a worker. The task remains in queue until the worker sends
>> an acknowledgement that the job has been handled, not just received.
>> “Handled” may mean for example that the job has been submitted to an
>> external batch scheduler over SSH, which may require some retries, etc.
>> If the worker crashes before the job has been submitted, then the broker
>> can resend the message to another worker.   I’m wondering how your
>> Kafka-based solution would handle the same issue.
>>
>>
>>
>> * A simpler but more common failure is communicating with external
>> resources. A task executor may need to SSH to a remote resource, which can
>> fail (the resource is slow to communicate, usually). How do you handle this
>> case?
>>
>>
>>
>> * Your design focuses on Airavata’s experiment execution handling.
>> Airavata’s registry is another important component: this is where
>> experiment objects get persistently stored. The registry stores metadata
>> about both “live” experiments that are currently executing as well as
>> archived experiments that have completed.
>>
>>
>>
>> How would you extend your architecture to include the registry?
>>
>>
>>
>> Marlon
>>
>>
>>
>>
>>
>> *From: *"dimuthu.upeksha2@gmail.com" <di...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Monday, October 30, 2017 at 10:45 AM
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Linked Container Services for Apache Airavata Components -
>> Phase 2 - Initial Prototype
>>
>>
>>
>> Hi All,
>>
>>
>>
>> Based on the analysis of Phase 1, within past two weeks I have been
>> working on implementing a task execution workflow following the
>> microservices deployment pattern and Kubernetes as the deployment platform.
>>
>>
>>
>> Please find attached design document that explains the components and
>> messaging interactions between components. Based on that design, I have
>> implemented following components
>>
>>
>>
>> 1. Set of microservices to compose the workflow
>>
>> 2. A simple Web Console to  deploy and monitor workflows on the framework
>>
>>
>>
>> I used Kakfa as the primary messaging medium to communicate among the
>> microservices due to its simplicity and powerful features like partitions
>> and consumer groups.
>>
>>
>>
>> I have attached a user guide so that you can install and try this in your
>> local machine. And source code for each component can be found from [1]
>>
>>
>>
>> Please share you ideas and suggestions.
>>
>>
>>
>> Thanks
>>
>> Dimuthu
>>
>>
>>
>> [1] https://github.com/DImuthuUpe/airavata/tree/master/
>> sandbox/airavata-kubernetes
>>
>> [2] https://docs.google.com/document/d/1R1xrmuPldHiWVDn4xNVa
>> y9Vnxn9FODQZXtF55JxJpSY/edit?usp=sharing
>>
>> [3] https://docs.google.com/document/d/1A5eRIZiuUj4ShZVMS0Nd
>> AxjAxtOTZXculaYDCZ7IMQ8/edit?usp=sharing
>>
>>
>>
>
>

Re: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Posted by DImuthu Upeksha <di...@gmail.com>.

Hi Marlon,

Then Task Executor drops the status message to Task-Event topic. Sample
status message is like this

<ProcessId>,<TaskId>,<Status>,<Reason>

 This is a broadcasting message and all Task Schedulers receives this
message and the corresponding Task Scheduler which executes the DAG that
contains this Task will stop DAG execution. Others will safely ignore this
message. However you might have an issue like, what if that Task Scheduler
got killed in between? If you can look at the design diagram, there is an
another type of microservice (Event-Sink) which is reading from the
Task-Event topic. What they does is, they read each status event from the
topic and persist them in the database through the API server. That helps
us to continue the same same DAG in a new Task Scheduler. However in
current version, this feature was not implemented.





Thanks
Dimuthu

On Fri, Nov 3, 2017 at 12:46 AM, Pierce, Marlon <ma...@iu.edu> wrote:

> Hi Dimuthu,
>
>
>
> Thanks for the clarification on the first issue.  The second issue is more
> about how you handle external errors in your approach: the SSH connection
> times out, for instance, and all the retries fail, so the Task Executor
> needs to report back that the command failed.
>
>
>
> Marlon
>
>
>
>
>
> *From: *"dimuthu.upeksha2@gmail.com" <di...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Thursday, November 2, 2017 at 7:33 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: Linked Container Services for Apache Airavata Components -
> Phase 2 - Initial Prototype
>
>
>
> Hi Marlon,
>
>
>
> Thanks for the comments and please find the explanations for those
> comments below.
>
>
>
> *Transactionality*
>
>
>
> Kafka has a rich acknowledgment mechanism. Actually there are several
> acknowledgement levels. Not like in RabbitMQ where messages are removed
> from the queue once read, Kafka keeps all the messages of the topic for a
> given time. Until that time expires, consumers are free to go back and
> forth on the topic to read previously read messages. However if we want to
> keep track of the read messages by a particular consumer (actually a
> consumer group) we can use kafka's Current Offset and Committed Offset to
> achieve this [1]. A consumer can read more than one messages from a
> partition of the topic. When it is done, Kafka keeps track of the pointer
> (offset) where that particular consumer finally read from the partition.
> That is the current offset. However it does not mean that messages are
> properly consumed by the consumer. Once the messages are consumed, consumer
> acknowledges that it has consumed x number of messages of the already read
> messages. Then Kafka keeps track of the last pointer where that particular
> consumer has acknowledged. This is the Committed Offset. Let's assume that
> the particular Consumer goes down and a new Consumer from the same consumer
> group starts to read from vacant partition. Then it will start to read from
> the Committed offset of last consumer. So to achieve the behavior that you
> have mentioned, it can be done by controlling the acknowledgement mechanism
> of a Consumer. There are mainly 2 acknowledgement mechanisms in Kafka
>
>
>
> 1. Auto commit - Once a consumer reads a message or set of messages at
> once, it will auto commit after a given time interval
>
> 2. Manual commit - Developer has the handle to decide whether to
> acknowledge or not for a particular message
>
>
>
> From above two methods we have to use manual commit as we have to make
> sure that messages are acknowledged once it was properly handled. Manual
> commit also has several levels and modes of acknowledgements.
>
>
>
> 1. Per Message acknowledgement
>
> 2. Per Message set acknowledgement
>
>
>
> 1. Asynchronous Mode
>
> 2. Synchronous Mode
>
>
>
> Here we use per message synchronous acknowledgement as we want the highest
> level of transactionality. In that case, once we read a message from a
> Kafka topic, we parse it and do the necessary operations and finally
> acknowledges once the message has been successfully handled. If something
> went wrong (consumer failed) and read messages were not acknowledged, new
> consumer will continue from last acknowledged position. I have implemented
> PoC code that demonstrates above concept with a single producer and 3
> consumers [2]. You can find the screen recording of the test from here [3].
>
>
>
> If  you need 100% transactionality form consumer side, it is also possible
> with an external transactional scope like database operations. However in
> our case it is not required as the operations that we are doing inside the
> message handling operation are not reversible. Let's talk more about this
> approach in future.
>
>
>
> *Slow SSH communication*
>
>
>
> If you are expecting to improve the communication mechanism instead of
> SSH, one option is to use agents inside comupte resources and communicate
> with them in a optimized messaging protocol. I have provided an approach in
> the previous thread. But your concern is more about being the whole
> pipeline blocked due to a slow task not releasing the message from the
> topic, there is a solution for that. When we are publishing messages to a
> Kafka topic, they are distributed among the partitions of the topic.
> Consumers are reading from those partitions. If a consumer becomes slow,
> only the message read from that partitions will become slow. Other
> partitions will work without any issue. So we can have a higher number of
> partitions in a topic to handle that. Further if we want to go further like
> grouping slow tasks to a one set and allowing low latency tasks to an other
> set by using a custom partitioner [5]
>
>
>
> *Registry*
>
>
>
> It is already there [6]. Actually in this implementation, API server acts
> as the registry itself. It stores experiment objects and current status of
> the cluster. However As Gaurav has pointed out in the next email, that
> would be better if this is separated into multiple parts. What do you think?
>
>
>
> [1] https://www.youtube.com/watch?v=kZT8v2_b2XE&index=15&list=
> PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON
>
> [2] https://github.com/DImuthuUpe/kafka-transactionality
>
> [3] https://www.youtube.com/watch?v=j6bOVLUlyf4&feature=youtu.be
>
> [4] https://www.youtube.com/watch?v=AshMNCxSp3c&list=
> PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON&index=17
>
> [5] https://www.youtube.com/watch?v=pMDAcNRkWkE&index=10&list=
> PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON
>
> [6] https://github.com/apache/airavata-sandbox/tree/master/
> airavata-kubernetes/modules/microservices/api-server/src/
> main/java/org/apache/airavata/k8s/api/server
>
>
>
> On Thu, Nov 2, 2017 at 2:52 AM, Pierce, Marlon <ma...@iu.edu> wrote:
>
> Hi Dimuthu,
>
>
>
> Thanks for sending this very thoughtful document. A couple of comments:
>
>
>
> * Use of Kafka instead of RabbitMQ is interesting. Can you say more about
> how this approach can handle Kafka client failures?  For RabbitMQ, for
> example, there is the simple “Work Queue” approach in which the broker
> pushes a task to a worker. The task remains in queue until the worker sends
> an acknowledgement that the job has been handled, not just received.
> “Handled” may mean for example that the job has been submitted to an
> external batch scheduler over SSH, which may require some retries, etc.
> If the worker crashes before the job has been submitted, then the broker
> can resend the message to another worker.   I’m wondering how your
> Kafka-based solution would handle the same issue.
>
>
>
> * A simpler but more common failure is communicating with external
> resources. A task executor may need to SSH to a remote resource, which can
> fail (the resource is slow to communicate, usually). How do you handle this
> case?
>
>
>
> * Your design focuses on Airavata’s experiment execution handling.
> Airavata’s registry is another important component: this is where
> experiment objects get persistently stored. The registry stores metadata
> about both “live” experiments that are currently executing as well as
> archived experiments that have completed.
>
>
>
> How would you extend your architecture to include the registry?
>
>
>
> Marlon
>
>
>
>
>
> *From: *"dimuthu.upeksha2@gmail.com" <di...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Monday, October 30, 2017 at 10:45 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Linked Container Services for Apache Airavata Components -
> Phase 2 - Initial Prototype
>
>
>
> Hi All,
>
>
>
> Based on the analysis of Phase 1, within past two weeks I have been
> working on implementing a task execution workflow following the
> microservices deployment pattern and Kubernetes as the deployment platform.
>
>
>
> Please find attached design document that explains the components and
> messaging interactions between components. Based on that design, I have
> implemented following components
>
>
>
> 1. Set of microservices to compose the workflow
>
> 2. A simple Web Console to  deploy and monitor workflows on the framework
>
>
>
> I used Kakfa as the primary messaging medium to communicate among the
> microservices due to its simplicity and powerful features like partitions
> and consumer groups.
>
>
>
> I have attached a user guide so that you can install and try this in your
> local machine. And source code for each component can be found from [1]
>
>
>
> Please share you ideas and suggestions.
>
>
>
> Thanks
>
> Dimuthu
>
>
>
> [1] https://github.com/DImuthuUpe/airavata/tree/master/sandbox/airavata-
> kubernetes
>
> [2] https://docs.google.com/document/d/1R1xrmuPldHiWVDn4xNVay9Vnxn9FO
> DQZXtF55JxJpSY/edit?usp=sharing
>
> [3] https://docs.google.com/document/d/1A5eRIZiuUj4ShZVMS0NdAxjAxtOTZ
> XculaYDCZ7IMQ8/edit?usp=sharing
>
>
>

Re: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Posted by "Pierce, Marlon" <ma...@iu.edu>.

Hi Dimuthu,

 

Thanks for the clarification on the first issue.  The second issue is more about how you handle external errors in your approach: the SSH connection times out, for instance, and all the retries fail, so the Task Executor needs to report back that the command failed.

 

Marlon

 

 

From: "dimuthu.upeksha2@gmail.com" <di...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, November 2, 2017 at 7:33 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

 

Hi Marlon, 

 

Thanks for the comments and please find the explanations for those comments below.

 

Transactionality 

 

Kafka has a rich acknowledgment mechanism. Actually there are several acknowledgement levels. Not like in RabbitMQ where messages are removed from the queue once read, Kafka keeps all the messages of the topic for a given time. Until that time expires, consumers are free to go back and forth on the topic to read previously read messages. However if we want to keep track of the read messages by a particular consumer (actually a consumer group) we can use kafka's Current Offset and Committed Offset to achieve this [1]. A consumer can read more than one messages from a partition of the topic. When it is done, Kafka keeps track of the pointer (offset) where that particular consumer finally read from the partition. That is the current offset. However it does not mean that messages are properly consumed by the consumer. Once the messages are consumed, consumer acknowledges that it has consumed x number of messages of the already read messages. Then Kafka keeps track of the last pointer where that particular consumer has acknowledged. This is the Committed Offset. Let's assume that the particular Consumer goes down and a new Consumer from the same consumer group starts to read from vacant partition. Then it will start to read from the Committed offset of last consumer. So to achieve the behavior that you have mentioned, it can be done by controlling the acknowledgement mechanism of a Consumer. There are mainly 2 acknowledgement mechanisms in Kafka

 

1. Auto commit - Once a consumer reads a message or set of messages at once, it will auto commit after a given time interval

2. Manual commit - Developer has the handle to decide whether to acknowledge or not for a particular message

 

From above two methods we have to use manual commit as we have to make sure that messages are acknowledged once it was properly handled. Manual commit also has several levels and modes of acknowledgements.

 

1. Per Message acknowledgement

2. Per Message set acknowledgement

 

1. Asynchronous Mode

2. Synchronous Mode

 

Here we use per message synchronous acknowledgement as we want the highest level of transactionality. In that case, once we read a message from a Kafka topic, we parse it and do the necessary operations and finally acknowledges once the message has been successfully handled. If something went wrong (consumer failed) and read messages were not acknowledged, new consumer will continue from last acknowledged position. I have implemented PoC code that demonstrates above concept with a single producer and 3 consumers [2]. You can find the screen recording of the test from here [3].

 

If  you need 100% transactionality form consumer side, it is also possible with an external transactional scope like database operations. However in our case it is not required as the operations that we are doing inside the message handling operation are not reversible. Let's talk more about this approach in future.

 

Slow SSH communication

 

If you are expecting to improve the communication mechanism instead of SSH, one option is to use agents inside comupte resources and communicate with them in a optimized messaging protocol. I have provided an approach in the previous thread. But your concern is more about being the whole pipeline blocked due to a slow task not releasing the message from the topic, there is a solution for that. When we are publishing messages to a Kafka topic, they are distributed among the partitions of the topic. Consumers are reading from those partitions. If a consumer becomes slow, only the message read from that partitions will become slow. Other partitions will work without any issue. So we can have a higher number of partitions in a topic to handle that. Further if we want to go further like grouping slow tasks to a one set and allowing low latency tasks to an other set by using a custom partitioner [5]

 

Registry

 

It is already there [6]. Actually in this implementation, API server acts as the registry itself. It stores experiment objects and current status of the cluster. However As Gaurav has pointed out in the next email, that would be better if this is separated into multiple parts. What do you think?

 

[1] https://www.youtube.com/watch?v=kZT8v2_b2XE&index=15&list=PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON

[2] https://github.com/DImuthuUpe/kafka-transactionality

[3] https://www.youtube.com/watch?v=j6bOVLUlyf4&feature=youtu.be

[4] https://www.youtube.com/watch?v=AshMNCxSp3c&list=PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON&index=17

[5] https://www.youtube.com/watch?v=pMDAcNRkWkE&index=10&list=PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON

[6] https://github.com/apache/airavata-sandbox/tree/master/airavata-kubernetes/modules/microservices/api-server/src/main/java/org/apache/airavata/k8s/api/server

 

On Thu, Nov 2, 2017 at 2:52 AM, Pierce, Marlon <ma...@iu.edu> wrote:

Hi Dimuthu,

 

Thanks for sending this very thoughtful document. A couple of comments:

 

* Use of Kafka instead of RabbitMQ is interesting. Can you say more about how this approach can handle Kafka client failures?  For RabbitMQ, for example, there is the simple “Work Queue” approach in which the broker pushes a task to a worker. The task remains in queue until the worker sends an acknowledgement that the job has been handled, not just received. “Handled” may mean for example that the job has been submitted to an external batch scheduler over SSH, which may require some retries, etc.   If the worker crashes before the job has been submitted, then the broker can resend the message to another worker.   I’m wondering how your Kafka-based solution would handle the same issue. 

 

* A simpler but more common failure is communicating with external resources. A task executor may need to SSH to a remote resource, which can fail (the resource is slow to communicate, usually). How do you handle this case?

 

* Your design focuses on Airavata’s experiment execution handling. Airavata’s registry is another important component: this is where experiment objects get persistently stored. The registry stores metadata about both “live” experiments that are currently executing as well as archived experiments that have completed.

 

How would you extend your architecture to include the registry?

 

Marlon

 

 

From: "dimuthu.upeksha2@gmail.com" <di...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, October 30, 2017 at 10:45 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

 

Hi All, 

 

Based on the analysis of Phase 1, within past two weeks I have been working on implementing a task execution workflow following the microservices deployment pattern and Kubernetes as the deployment platform. 

 

Please find attached design document that explains the components and messaging interactions between components. Based on that design, I have implemented following components

 

1. Set of microservices to compose the workflow

2. A simple Web Console to  deploy and monitor workflows on the framework

 

I used Kakfa as the primary messaging medium to communicate among the microservices due to its simplicity and powerful features like partitions and consumer groups.

 

I have attached a user guide so that you can install and try this in your local machine. And source code for each component can be found from [1]

 

Please share you ideas and suggestions.

 

Thanks

Dimuthu

 

[1] https://github.com/DImuthuUpe/airavata/tree/master/sandbox/airavata-kubernetes

[2] https://docs.google.com/document/d/1R1xrmuPldHiWVDn4xNVay9Vnxn9FODQZXtF55JxJpSY/edit?usp=sharing

[3] https://docs.google.com/document/d/1A5eRIZiuUj4ShZVMS0NdAxjAxtOTZXculaYDCZ7IMQ8/edit?usp=sharing

Re: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Posted by DImuthu Upeksha <di...@gmail.com>.

Hi Marlon,

Thanks for the comments and please find the explanations for those comments
below.

*Transactionality*

Kafka has a rich acknowledgment mechanism. Actually there are several
acknowledgement levels. Not like in RabbitMQ where messages are removed
from the queue once read, Kafka keeps all the messages of the topic for a
given time. Until that time expires, consumers are free to go back and
forth on the topic to read previously read messages. However if we want to
keep track of the read messages by a particular consumer (actually a
consumer group) we can use kafka's Current Offset and Committed Offset to
achieve this [1]. A consumer can read more than one messages from a
partition of the topic. When it is done, Kafka keeps track of the pointer
(offset) where that particular consumer finally read from the partition.
That is the current offset. However it does not mean that messages are
properly consumed by the consumer. Once the messages are consumed, consumer
acknowledges that it has consumed x number of messages of the already read
messages. Then Kafka keeps track of the last pointer where that particular
consumer has acknowledged. This is the Committed Offset. Let's assume that
the particular Consumer goes down and a new Consumer from the same consumer
group starts to read from vacant partition. Then it will start to read from
the Committed offset of last consumer. So to achieve the behavior that you
have mentioned, it can be done by controlling the acknowledgement mechanism
of a Consumer. There are mainly 2 acknowledgement mechanisms in Kafka

1. Auto commit - Once a consumer reads a message or set of messages at
once, it will auto commit after a given time interval
2. Manual commit - Developer has the handle to decide whether to
acknowledge or not for a particular message

From above two methods we have to use manual commit as we have to make sure
that messages are acknowledged once it was properly handled. Manual commit
also has several levels and modes of acknowledgements.

1. Per Message acknowledgement
2. Per Message set acknowledgement

1. Asynchronous Mode
2. Synchronous Mode

Here we use per message synchronous acknowledgement as we want the highest
level of transactionality. In that case, once we read a message from a
Kafka topic, we parse it and do the necessary operations and finally
acknowledges once the message has been successfully handled. If something
went wrong (consumer failed) and read messages were not acknowledged, new
consumer will continue from last acknowledged position. I have implemented
PoC code that demonstrates above concept with a single producer and 3
consumers [2]. You can find the screen recording of the test from here [3].

If  you need 100% transactionality form consumer side, it is also possible
with an external transactional scope like database operations. However in
our case it is not required as the operations that we are doing inside the
message handling operation are not reversible. Let's talk more about this
approach in future.

*Slow SSH communication*

If you are expecting to improve the communication mechanism instead of SSH,
one option is to use agents inside comupte resources and communicate with
them in a optimized messaging protocol. I have provided an approach in the
previous thread. But your concern is more about being the whole pipeline
blocked due to a slow task not releasing the message from the topic, there
is a solution for that. When we are publishing messages to a Kafka topic,
they are distributed among the partitions of the topic. Consumers are
reading from those partitions. If a consumer becomes slow, only the message
read from that partitions will become slow. Other partitions will work
without any issue. So we can have a higher number of partitions in a topic
to handle that. Further if we want to go further like grouping slow tasks
to a one set and allowing low latency tasks to an other set by using a
custom partitioner [5]

*Registry*

It is already there [6]. Actually in this implementation, API server acts
as the registry itself. It stores experiment objects and current status of
the cluster. However As Gaurav has pointed out in the next email, that
would be better if this is separated into multiple parts. What do you think?

[1]
https://www.youtube.com/watch?v=kZT8v2_b2XE&index=15&list=PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON
[2] https://github.com/DImuthuUpe/kafka-transactionality
[3] https://www.youtube.com/watch?v=j6bOVLUlyf4&feature=youtu.be
[4]
https://www.youtube.com/watch?v=AshMNCxSp3c&list=PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON&index=17
[5]
https://www.youtube.com/watch?v=pMDAcNRkWkE&index=10&list=PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON
[6]
https://github.com/apache/airavata-sandbox/tree/master/airavata-kubernetes/modules/microservices/api-server/src/main/java/org/apache/airavata/k8s/api/server

On Thu, Nov 2, 2017 at 2:52 AM, Pierce, Marlon <ma...@iu.edu> wrote:

> Hi Dimuthu,
>
>
>
> Thanks for sending this very thoughtful document. A couple of comments:
>
>
>
> * Use of Kafka instead of RabbitMQ is interesting. Can you say more about
> how this approach can handle Kafka client failures?  For RabbitMQ, for
> example, there is the simple “Work Queue” approach in which the broker
> pushes a task to a worker. The task remains in queue until the worker sends
> an acknowledgement that the job has been handled, not just received.
> “Handled” may mean for example that the job has been submitted to an
> external batch scheduler over SSH, which may require some retries, etc.
> If the worker crashes before the job has been submitted, then the broker
> can resend the message to another worker.   I’m wondering how your
> Kafka-based solution would handle the same issue.
>
>
>
> * A simpler but more common failure is communicating with external
> resources. A task executor may need to SSH to a remote resource, which can
> fail (the resource is slow to communicate, usually). How do you handle this
> case?
>
>
>
> * Your design focuses on Airavata’s experiment execution handling.
> Airavata’s registry is another important component: this is where
> experiment objects get persistently stored. The registry stores metadata
> about both “live” experiments that are currently executing as well as
> archived experiments that have completed.
>
>
>
> How would you extend your architecture to include the registry?
>
>
>
> Marlon
>
>
>
>
>
> *From: *"dimuthu.upeksha2@gmail.com" <di...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Monday, October 30, 2017 at 10:45 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Linked Container Services for Apache Airavata Components -
> Phase 2 - Initial Prototype
>
>
>
> Hi All,
>
>
>
> Based on the analysis of Phase 1, within past two weeks I have been
> working on implementing a task execution workflow following the
> microservices deployment pattern and Kubernetes as the deployment platform.
>
>
>
> Please find attached design document that explains the components and
> messaging interactions between components. Based on that design, I have
> implemented following components
>
>
>
> 1. Set of microservices to compose the workflow
>
> 2. A simple Web Console to  deploy and monitor workflows on the framework
>
>
>
> I used Kakfa as the primary messaging medium to communicate among the
> microservices due to its simplicity and powerful features like partitions
> and consumer groups.
>
>
>
> I have attached a user guide so that you can install and try this in your
> local machine. And source code for each component can be found from [1]
>
>
>
> Please share you ideas and suggestions.
>
>
>
> Thanks
>
> Dimuthu
>
>
>
> [1] https://github.com/DImuthuUpe/airavata/tree/master/sandbox/airavata-
> kubernetes
>
> [2] https://docs.google.com/document/d/1R1xrmuPldHiWVDn4xNVay9Vnxn9FO
> DQZXtF55JxJpSY/edit?usp=sharing
>
> [3] https://docs.google.com/document/d/1A5eRIZiuUj4ShZVMS0NdAxjAxtOTZ
> XculaYDCZ7IMQ8/edit?usp=sharing
>

Re: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Posted by DImuthu Upeksha <di...@gmail.com>.

Hi Gaurav,

Thanks a lot for the comments and please find my comments inline.

On Thu, Nov 2, 2017 at 11:41 AM, Shenoy, Gourav Ganesh <goshenoy@indiana.edu
> wrote:

> Hi Dimuthu,
>
>
>
> First of all, I must say this is really impressive – the way you grasped
> the problem and built a working prototype in such a short span – this is
> phenomenal. I haven’t yet installed the prototype, but I did look through
> the design document.
>
>
>
> Here are some thoughts/questions. Please feel free to comment or correct
> anything I’ve misunderstood.
>
>
>
>    - You are assuming a “message oriented” approach to performing
>    distributed task execution. Rather than using a third-party framework
>    (e.g.: Helix).
>    - Granted this gives more control in managing workloads, but this also
>    adds the overhead of:
>       - (1) creating, defining and persisting DAGs,
>       - (2) orchestrating through the DAG (although you aim to leverage
>       message broker),
>       - (3) managing the state of these DAGs as they progress,
>       - (4) handling errors related to workflow execution, and related to
>       broker dependency.
>
> Yes. You are correct. We take the burden of executing the DAG but as you
have mentioned, it provides a better control on the flow to handle edge
cases properly. However we have to implement this Task scheduler in more
generic way (I'm not happy with the current implementation but try to
improve this in future).

>
>    -
>       -
>    - I did not understand how you are defining the DAGs. I remember when
>    we were trying to solve this problem in class using the MQ approach
>    (similar but using RabbitMQ instead of Kafka), one of the challenges was
>    the allow a way to dynamically create DAGs and persist them if necessary.
>    How are you defining DAGs in your prototype?
>
> Defining a DAG has no direct tie with the message broker. Workflow
Generator is the one which generates the DAG for a Experiment. Currently
this is a static method specific to Experiments (pre job commands -> data
staging -> launch -> post job commands) but I suggested a new approach to
define a DAG in more generic way in another mail (you can find the subject
at the end). Once the DAG is saved in the database, Scheduler is the one
who fetches and executes it. To be clear, I followed current Airavata data
model with minor changes. DAG is equivalent to a Process. Process can have
multiple Tasks. For each task there is a DAG index. Task Scheduler fetches
the Tasks for a particular Process and sorts those task according to the
DAG index. I have proposed an improved design to compose DAGs both using
static configurations and user interfaces in an another mail (Subject :
"Customizable workflow design using reusable components").


>
>    -
>    - There might be situations when a workflow needs to be re-run (say
>    something went wrong initially, or needed corrections, or target resource
>    was down, etc). Does the design accommodate these scenarios?
>
> Yes. If you want to re run an Workflow (Experiment in this design), create
a new Process and send the process id to Task Executor. If you try this in
the User Interface, you can launch the Experiment multiple times. Each time
you launch the experiment, new Process is created with a new DAG and
resources locations in the compute hosts (input file paths, output file
paths) are generated by being specific to the Process id
(/tmp/{experimentid}/{processid}/inputs).


>    -
>    - One of the bigger goals is to be able to orchestrate Airavata
>    components, and not just the tasks involved in an experiment. As I
>    understand, the design relies on messaging to orchestrate, but is messaging
>    going to be the only communication paradigm within Airavata microservices?
>    If there are 2 components communicating via Thrift, how does the
>    architecture handle them?
>
> No I used Kafka to coordinate the consumers in the message flow other than
actual messaging. Except that, orchestration is performed through
Kubernetes. You can definitely use Thrift or some other mechanism to
communicate among microservices. In fact in this solution, microservices
communicate with the API server through HTTP/REST.


>    -
>    - Another point which comes to my mind is about moving away from this
>    big “API Server” block and segregating into smaller service level first
>    class SDKs. For e.g.: We now have a first class “Profile Service” which
>    allows isolated interactions pertaining to Users, Tenants and Groups. We
>    might want to keep these SDKs / services as independent as possible, which
>    also means no reliance on messaging. Will the architecture support these
>    SDKs?
>
> Agreed. What we need is to split the API server in to clearly defined
domains. I'll try to implement it in next versions.


>
>    -
>    - As Marlon pointed out and something we looked at last Spring, about
>    “database-per-microservice”. Currently we have a Registry which is shared
>    among different components like Orchestrator and GFac. Ideally each
>    microservice will own its database, and for any intersections in data
>    between microservices, we would sync up using events (messages). I can see
>    the Kafka broker come in handy for this.
>
> Yes. But in this case only API server has a database. Others are stateless
services. If they want any information, it will query API Server to fetch
them. However in future I guess Task Scheduler might want a direct database
access to guarantee some transactional operations.


>
>    -
>    - As far as possible, we would like to adopt a generic method to
>    define/maintain/execute the 3 types of workflows (or maybe more):
>       - External – user defined multi-application experiment; which would
>       mean a parent experiment constituting child experiments.
>       - Internal – component level workflows between Airavata
>       microservices. One such use-case I can think about is about dynamic
>       resource binding. Eg: provisioning a container (as target resource) if it
>       does not exist, or spinning up a VM and deploying an application at runtime.
>       - Experiment – typical experiment level task execution workflows
>       needed to complete the experiment.
>
>
Good idea. First and third scenarios can be handled in the new approach
that I have proposed. Can you please explain more about the second point?
Sorry, I'm not sure that I have understood it properly.

I hope I have addressed to all your comments, Please let me know if
anything is not clear enough.

>
>    -
>       -
>
>
>
> I apologize for this lengthy email, and I really appreciate the work you
> did. Some of these points might not make sense, so I would encourage
> discussions from the mailing list. Keep up the good work!
>

I really value your feedback and it helped me to look at the problems at
different angle. Thanks a lot for your time for going through all these. :)


>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Pierce, Marlon" <ma...@iu.edu>
> *Reply-To: *<de...@airavata.apache.org>
> *Date: *Wednesday, November 1, 2017 at 5:22 PM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: Linked Container Services for Apache Airavata Components -
> Phase 2 - Initial Prototype
>
>
>
> Hi Dimuthu,
>
>
>
> Thanks for sending this very thoughtful document. A couple of comments:
>
>
>
> * Use of Kafka instead of RabbitMQ is interesting. Can you say more about
> how this approach can handle Kafka client failures?  For RabbitMQ, for
> example, there is the simple “Work Queue” approach in which the broker
> pushes a task to a worker. The task remains in queue until the worker sends
> an acknowledgement that the job has been handled, not just received.
> “Handled” may mean for example that the job has been submitted to an
> external batch scheduler over SSH, which may require some retries, etc.
> If the worker crashes before the job has been submitted, then the broker
> can resend the message to another worker.   I’m wondering how your
> Kafka-based solution would handle the same issue.
>
>
>
> * A simpler but more common failure is communicating with external
> resources. A task executor may need to SSH to a remote resource, which can
> fail (the resource is slow to communicate, usually). How do you handle this
> case?
>
>
>
> * Your design focuses on Airavata’s experiment execution handling.
> Airavata’s registry is another important component: this is where
> experiment objects get persistently stored. The registry stores metadata
> about both “live” experiments that are currently executing as well as
> archived experiments that have completed.
>
>
>
> How would you extend your architecture to include the registry?
>
>
>
> Marlon
>
>
>
>
>
> *From: *"dimuthu.upeksha2@gmail.com" <di...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Monday, October 30, 2017 at 10:45 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Linked Container Services for Apache Airavata Components -
> Phase 2 - Initial Prototype
>
>
>
> Hi All,
>
>
>
> Based on the analysis of Phase 1, within past two weeks I have been
> working on implementing a task execution workflow following the
> microservices deployment pattern and Kubernetes as the deployment platform.
>
>
>
> Please find attached design document that explains the components and
> messaging interactions between components. Based on that design, I have
> implemented following components
>
>
>
> 1. Set of microservices to compose the workflow
>
> 2. A simple Web Console to  deploy and monitor workflows on the framework
>
>
>
> I used Kakfa as the primary messaging medium to communicate among the
> microservices due to its simplicity and powerful features like partitions
> and consumer groups.
>
>
>
> I have attached a user guide so that you can install and try this in your
> local machine. And source code for each component can be found from [1]
>
>
>
> Please share you ideas and suggestions.
>
>
>
> Thanks
>
> Dimuthu
>
>
>
> [1] https://github.com/DImuthuUpe/airavata/tree/master/
> sandbox/airavata-kubernetes
>
> [2] https://docs.google.com/document/d/1R1xrmuPldHiWVDn4xNVa
> y9Vnxn9FODQZXtF55JxJpSY/edit?usp=sharing
>
> [3] https://docs.google.com/document/d/1A5eRIZiuUj4ShZVMS0Nd
> AxjAxtOTZXculaYDCZ7IMQ8/edit?usp=sharing
>

Re: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Posted by "Christie, Marcus Aaron" <ma...@iu.edu>.

On Nov 2, 2017, at 2:11 AM, Shenoy, Gourav Ganesh <go...@indiana.edu>> wrote:


  *   Another point which comes to my mind is about moving away from this big “API Server” block and segregating into smaller service level first class SDKs. For e.g.: We now have a first class “Profile Service” which allows isolated interactions pertaining to Users, Tenants and Groups. We might want to keep these SDKs / services as independent as possible, which also means no reliance on messaging. Will the architecture support these SDKs?


Gourav and others,

This is somewhat off topic, but something I never really understood, why did we decide to break apart the API server?  I think having a single API for Airavata has some advantages. Creating and using a client is simplified since there is only one API. Also securing the API is something that can be done once instead of over and over with each service’s API.

I could understand multiplexing the API server’s handlers to cut down on the size of handler code.  But that doesn’t require breaking up the API server to multiple service level APIs.

Thanks,

Marcus

Re: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.

Hi Dimuthu,

First of all, I must say this is really impressive – the way you grasped the problem and built a working prototype in such a short span – this is phenomenal. I haven’t yet installed the prototype, but I did look through the design document.

Here are some thoughts/questions. Please feel free to comment or correct anything I’ve misunderstood.


  *   You are assuming a “message oriented” approach to performing distributed task execution. Rather than using a third-party framework (e.g.: Helix).
  *   Granted this gives more control in managing workloads, but this also adds the overhead of:
     *   (1) creating, defining and persisting DAGs,
     *   (2) orchestrating through the DAG (although you aim to leverage message broker),
     *   (3) managing the state of these DAGs as they progress,
     *   (4) handling errors related to workflow execution, and related to broker dependency.
  *   I did not understand how you are defining the DAGs. I remember when we were trying to solve this problem in class using the MQ approach (similar but using RabbitMQ instead of Kafka), one of the challenges was the allow a way to dynamically create DAGs and persist them if necessary. How are you defining DAGs in your prototype?
  *   There might be situations when a workflow needs to be re-run (say something went wrong initially, or needed corrections, or target resource was down, etc). Does the design accommodate these scenarios?
  *   One of the bigger goals is to be able to orchestrate Airavata components, and not just the tasks involved in an experiment. As I understand, the design relies on messaging to orchestrate, but is messaging going to be the only communication paradigm within Airavata microservices? If there are 2 components communicating via Thrift, how does the architecture handle them?
  *   Another point which comes to my mind is about moving away from this big “API Server” block and segregating into smaller service level first class SDKs. For e.g.: We now have a first class “Profile Service” which allows isolated interactions pertaining to Users, Tenants and Groups. We might want to keep these SDKs / services as independent as possible, which also means no reliance on messaging. Will the architecture support these SDKs?
  *   As Marlon pointed out and something we looked at last Spring, about “database-per-microservice”. Currently we have a Registry which is shared among different components like Orchestrator and GFac. Ideally each microservice will own its database, and for any intersections in data between microservices, we would sync up using events (messages). I can see the Kafka broker come in handy for this.
  *   As far as possible, we would like to adopt a generic method to define/maintain/execute the 3 types of workflows (or maybe more):
     *   External – user defined multi-application experiment; which would mean a parent experiment constituting child experiments.
     *   Internal – component level workflows between Airavata microservices. One such use-case I can think about is about dynamic resource binding. Eg: provisioning a container (as target resource) if it does not exist, or spinning up a VM and deploying an application at runtime.
     *   Experiment – typical experiment level task execution workflows needed to complete the experiment.

I apologize for this lengthy email, and I really appreciate the work you did. Some of these points might not make sense, so I would encourage discussions from the mailing list. Keep up the good work!

Thanks and Regards,
Gourav Shenoy

From: "Pierce, Marlon" <ma...@iu.edu>
Reply-To: <de...@airavata.apache.org>
Date: Wednesday, November 1, 2017 at 5:22 PM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Hi Dimuthu,

Thanks for sending this very thoughtful document. A couple of comments:

* Use of Kafka instead of RabbitMQ is interesting. Can you say more about how this approach can handle Kafka client failures?  For RabbitMQ, for example, there is the simple “Work Queue” approach in which the broker pushes a task to a worker. The task remains in queue until the worker sends an acknowledgement that the job has been handled, not just received. “Handled” may mean for example that the job has been submitted to an external batch scheduler over SSH, which may require some retries, etc.   If the worker crashes before the job has been submitted, then the broker can resend the message to another worker.   I’m wondering how your Kafka-based solution would handle the same issue.

* A simpler but more common failure is communicating with external resources. A task executor may need to SSH to a remote resource, which can fail (the resource is slow to communicate, usually). How do you handle this case?

* Your design focuses on Airavata’s experiment execution handling. Airavata’s registry is another important component: this is where experiment objects get persistently stored. The registry stores metadata about both “live” experiments that are currently executing as well as archived experiments that have completed.

How would you extend your architecture to include the registry?

Marlon


From: "dimuthu.upeksha2@gmail.com" <di...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, October 30, 2017 at 10:45 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Hi All,

Based on the analysis of Phase 1, within past two weeks I have been working on implementing a task execution workflow following the microservices deployment pattern and Kubernetes as the deployment platform.

Please find attached design document that explains the components and messaging interactions between components. Based on that design, I have implemented following components

1. Set of microservices to compose the workflow
2. A simple Web Console to  deploy and monitor workflows on the framework

I used Kakfa as the primary messaging medium to communicate among the microservices due to its simplicity and powerful features like partitions and consumer groups.

I have attached a user guide so that you can install and try this in your local machine. And source code for each component can be found from [1]

Please share you ideas and suggestions.

Thanks
Dimuthu

[1] https://github.com/DImuthuUpe/airavata/tree/master/sandbox/airavata-kubernetes
[2] https://docs.google.com/document/d/1R1xrmuPldHiWVDn4xNVay9Vnxn9FODQZXtF55JxJpSY/edit?usp=sharing
[3] https://docs.google.com/document/d/1A5eRIZiuUj4ShZVMS0NdAxjAxtOTZXculaYDCZ7IMQ8/edit?usp=sharing

Re: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

Posted by "Pierce, Marlon" <ma...@iu.edu>.

Hi Dimuthu,

 

Thanks for sending this very thoughtful document. A couple of comments:

 

* Use of Kafka instead of RabbitMQ is interesting. Can you say more about how this approach can handle Kafka client failures?  For RabbitMQ, for example, there is the simple “Work Queue” approach in which the broker pushes a task to a worker. The task remains in queue until the worker sends an acknowledgement that the job has been handled, not just received. “Handled” may mean for example that the job has been submitted to an external batch scheduler over SSH, which may require some retries, etc.   If the worker crashes before the job has been submitted, then the broker can resend the message to another worker.   I’m wondering how your Kafka-based solution would handle the same issue. 

 

* A simpler but more common failure is communicating with external resources. A task executor may need to SSH to a remote resource, which can fail (the resource is slow to communicate, usually). How do you handle this case?

 

* Your design focuses on Airavata’s experiment execution handling. Airavata’s registry is another important component: this is where experiment objects get persistently stored. The registry stores metadata about both “live” experiments that are currently executing as well as archived experiments that have completed.

 

How would you extend your architecture to include the registry?

 

Marlon

 

 

From: "dimuthu.upeksha2@gmail.com" <di...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, October 30, 2017 at 10:45 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Linked Container Services for Apache Airavata Components - Phase 2 - Initial Prototype

 

Hi All, 

 

Based on the analysis of Phase 1, within past two weeks I have been working on implementing a task execution workflow following the microservices deployment pattern and Kubernetes as the deployment platform. 

 

Please find attached design document that explains the components and messaging interactions between components. Based on that design, I have implemented following components

 

1. Set of microservices to compose the workflow

2. A simple Web Console to  deploy and monitor workflows on the framework

 

I used Kakfa as the primary messaging medium to communicate among the microservices due to its simplicity and powerful features like partitions and consumer groups.

 

I have attached a user guide so that you can install and try this in your local machine. And source code for each component can be found from [1]

 

Please share you ideas and suggestions.

 

Thanks

Dimuthu

 

[1] https://github.com/DImuthuUpe/airavata/tree/master/sandbox/airavata-kubernetes

[2] https://docs.google.com/document/d/1R1xrmuPldHiWVDn4xNVay9Vnxn9FODQZXtF55JxJpSY/edit?usp=sharing

[3] https://docs.google.com/document/d/1A5eRIZiuUj4ShZVMS0NdAxjAxtOTZXculaYDCZ7IMQ8/edit?usp=sharing