Posted to dev@airavata.apache.org by "Shenoy, Gourav Ganesh" <go...@indiana.edu> on 2017/02/01 17:52:52 UTC

[#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello dev, arch,

As part of this Spring’17 Advanced Science Gateway Architecture course, we are working on trying to debate and find possible solutions to the issue of managing distributed workloads in Apache Airavata. This leads to the discussion of finding the most efficient way that different Airavata micro-services should communicate and distribute work, in such a way that:

1.       We maintain the ability to scale these micro-services whenever needed (autoscale perhaps?).

2.       Achieve fault tolerance.

3.       We can deploy these micro-services independently, or better in a containerized manner – keeping in mind the ability to use devops for deployment.

As of now the options we are exploring are:

1.       RPC based communication

2.       Message based – either master-worker, or work-queue, etc

3.       A combination of both these approaches

I am more inclined towards exploring the message based approach, but again there arises the need to handle the limitations/corner cases of a message broker, such as downtime (and maybe more). In my opinion, having asynchronous communication will help us achieve most of the above-mentioned points. Another debatable issue is making the micro-services implementation stateless, such that we do not have to pass the state information between micro-services.

I would love to hear any thoughts/suggestions/comments on this topic and open up a discussion via this mail thread. If there is anything that I have missed which is relevant to this issue, please let me know.

Thanks and Regards,
Gourav Shenoy

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Supun Nakandala <su...@gmail.com>.
Yes, monitoring does not have to be part of the DAG.

But we need to halt the DAG execution until the monitoring decision comes.
Monitoring is done by a single module, which can mark the monitoring task as
completed.

Considering the monitoring as a task dependency helps to simplify the
orchestrator logic when executing the DAG (in my opinion).
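
To make this concrete, here is a minimal sketch (not Airavata code) of a DAG
in which monitoring is an ordinary task dependency. The task names, the
EXTERNAL set, and both helper functions are hypothetical; they only
illustrate that the orchestrator dispatches nothing past job submission until
the monitoring module marks the monitoring task as completed.

```python
from typing import Dict, List, Set

# Hypothetical DAG for one HPC job: each task maps to the tasks it depends on.
DAG: Dict[str, List[str]] = {
    "input_staging": [],
    "job_submission": ["input_staging"],
    "job_monitoring": ["job_submission"],
    "output_staging": ["job_monitoring"],
}

# Tasks completed by an external module (e.g. the email monitoring daemon),
# never dispatched to a worker by the orchestrator. This set is an assumption.
EXTERNAL: Set[str] = {"job_monitoring"}


def ready_tasks(dag: Dict[str, List[str]], completed: Set[str]) -> List[str]:
    """Return tasks that are not done yet and whose dependencies are all done."""
    return sorted(
        task
        for task, deps in dag.items()
        if task not in completed and all(d in completed for d in deps)
    )


def dispatchable(dag: Dict[str, List[str]], completed: Set[str]) -> List[str]:
    """Ready tasks the orchestrator may hand to a worker right now."""
    return [t for t in ready_tasks(dag, completed) if t not in EXTERNAL]
```

With input staging and job submission completed, dispatchable() returns an
empty list, so the DAG is halted. Once the monitoring module adds
"job_monitoring" to the completed set, output staging becomes dispatchable.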

On Fri, Feb 10, 2017 at 2:24 PM, Amila Jayasekara <th...@gmail.com>
wrote:

> I did not understand the scheduling dependency very well, but why does the
> monitoring dependency need to be part of the DAG?
>
> Thanks
> -Thejaka
>
> On Fri, Feb 10, 2017 at 2:00 PM, Supun Nakandala <
> supun.nakandala@gmail.com> wrote:
>
>> Hi Amila,
>>
>> By monitoring and scheduling dependencies I meant the following.
>>
>> Monitoring dependencies: Eg. After a job is submitted to the remote host,
>> the DAG execution has to wait until the job completes before proceeding to
>> the next task. Currently, we handle this by monitoring emails. A separate
>> daemon is checking for emails. So (I think) we can consider waiting for
>> this email as having a monitoring dependency to the next task that has to
>> be executed.
>>
>> Scheduling dependency: This is something for which we currently don't have a
>> use case, but which I think will soon become a requirement. For example, when
>> submitting jobs to Jetstream (which gives preference to interactive users)
>> we have to wait until the system becomes vacant. Thus even though a user
>> submits a job that job will have to wait until it is scheduled by an
>> external system/call. So my idea was to consider these things as external
>> scheduling dependencies. One might argue that scheduling sub-system also
>> has to be part of Airavata. But I think we can separate scheduling
>> sub-system and execution sub-system by having these scheduling dependencies.
>>
>> Hope this clarifies your question.
>>
>> On Fri, Feb 10, 2017 at 1:25 PM, Amila Jayasekara <
>> thejaka.amila@gmail.com> wrote:
>>
>>> What are monitoring dependencies and scheduling dependencies in the
>>> execution DAG ?
>>>
>>> Thanks
>>> -Thejaka
>>>
>>> On Tue, Feb 7, 2017 at 5:47 PM, Supun Nakandala <
>>> supun.nakandala@gmail.com> wrote:
>>>
>>>> Hi Gourav, I agree with your idea of using one “workflow micro-service”
>>>> which would basically be the mediator/orchestrator for deciding which
>>>> micro-service should be executed next. But I think these components do not
>>>> necessarily have to be micro-services but can rather conform to the
>>>> master-worker paradigm in some sense. But the trick here is how can we
>>>> implement a scalable, fault tolerant system to do distributed workload
>>>> management and from CAP theorem what is the property that we are going to
>>>> compromise.
>>>>
>>>> I think you are heading in the right direction. But I would like to add
>>>> more details to your solution. Please note that I haven't evaluated these
>>>> ideas 100%. Perhaps we can talk more about this in the next class.
>>>>
>>>> As you have done, I think we should centralize the state information
>>>> into one component (orchestrator in our case). From my experience, it is
>>>> very hard to achieve consistency in a distributed state setting in the
>>>> event of failure.
>>>>
>>>> Second, to maintain generalizability in Airavata I think we should
>>>> treat each application/use-case as a DAG of execution. For example, an HPC
>>>> job and a cloud job will have two different DAGs which consist of tasks
>>>> (data staging, job submission, output staging, etc.). These tasks should be
>>>> short tasks and should roughly have the same execution time. And having
>>>> idempotent tasks is preferable.
>>>>
>>>> The orchestrator is responsible for executing the DAG and assigning tasks
>>>> to the workers (how? will follow) based on the control dependencies among
>>>> the DAG tasks. In addition to the dependencies between tasks, I see that
>>>> there can be other dependencies on things like monitoring and scheduling,
>>>> which the orchestrator has to take into account when executing the DAG.
>>>>
>>>> The next question is how we distribute jobs from Orchestrator to
>>>> workers. I think here it is ok to compromise availability in favor of
>>>> consistency. I suggest that we use the request/response messaging pattern
>>>> which uses a persistent message broker (critical service). In this
>>>> architecture, we can safely allow orchestrator or workers to fail without
>>>> losing consistency (because of the persistent queue). But if the
>>>> orchestrator fails then the availability will go down. One way to overcome
>>>> this would be to come up with an orchestrator quorum. The attached figure
>>>> summarizes my idea.
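
A rough sketch of why a persistent queue lets a worker fail without losing
work, assuming the broker keeps a message until it is acknowledged. AckQueue
is a hypothetical in-memory stand-in, not the API of any real broker, and
unlike a real persistent broker it does not survive a process restart.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class AckQueue:
    """Illustrative queue: a delivered message stays in flight until acked."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[int, str]] = deque()
        self._in_flight: Dict[int, str] = {}
        self._next_id = 0

    def publish(self, body: str) -> int:
        """Orchestrator side: enqueue a task and return its message id."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending.append((msg_id, body))
        return msg_id

    def get(self) -> Optional[Tuple[int, str]]:
        """Worker side: take the next task, or None if the queue is empty."""
        if not self._pending:
            return None
        msg_id, body = self._pending.popleft()
        self._in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id: int) -> None:
        """Worker side: confirm the task finished, so it can be forgotten."""
        self._in_flight.pop(msg_id, None)

    def requeue_unacked(self) -> int:
        """Simulate a worker crash: return all in-flight tasks to the queue."""
        count = len(self._in_flight)
        for msg_id, body in sorted(self._in_flight.items()):
            self._pending.append((msg_id, body))
        self._in_flight.clear()
        return count
```

A task taken by a worker that crashes before calling ack() is delivered
again after requeue_unacked(), which is also why idempotent tasks are
preferable: the same task may run more than once.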
>>>>
>>>> I think we can also evaluate this solution with the concerns that
>>>> Shameera pointed out, such as whether we can enable cancel. Once again it's just my
>>>> idea and is open for argument and debate.
>>>>
>>>>
>>>>
>>>> [image: Inline image 2]
>>>>
>>>> Thanks
>>>> -Supun
>>>>
>>>>
>>>>
>>>> On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <
>>>> goshenoy@indiana.edu> wrote:
>>>>
>>>>> Hi Supun,
>>>>>
>>>>>
>>>>>
>>>>> I agree; maybe for the example I mentioned, multiple
>>>>> micro-services might not sound necessary. I was trying to generalize
>>>>> towards a scenario where we have multiple independent micro-services (not
>>>>> necessarily for task execution). Again, I am not certain if this is the
>>>>> right architecture, but your (and others’) input will definitely help us
>>>>> narrow down the scenarios we need to focus on. Do let
>>>>> me know if this makes sense.
>>>>>
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Gourav Shenoy
>>>>>
>>>>>
>>>>>
>>>>> *From: *Supun Nakandala <su...@gmail.com>
>>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Date: *Monday, February 6, 2017 at 12:15 PM
>>>>> *To: *dev <de...@airavata.apache.org>
>>>>>
>>>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>>>> Management for Airavata
>>>>>
>>>>>
>>>>>
>>>>> Hi Gourav,
>>>>>
>>>>>
>>>>>
>>>>> It is my belief that we don't need a separate microservice for each
>>>>> task. I favor a single micro service which can execute all tasks (or in
>>>>> other words a generic task execution micro service). Of course, we can have
>>>>> many of them when we want to scale. WDYT?
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <
>>>>> goshenoy@indiana.edu> wrote:
>>>>>
>>>>> Hi dev,
>>>>>
>>>>>
>>>>>
>>>>> We were brainstorming some potential designs that might help us with
>>>>> this problem. One possible option would be to have a “workflow
>>>>> micro-service” which would basically be the mediator/orchestrator for
>>>>> deciding which micro-service should be executed next – based on the type of
>>>>> the job. The motive is to make micro-services independent of the workflow;
>>>>> i.e. a micro-service implementation should not be aware of which
>>>>> micro-service will be executed next and we should have a central control of
>>>>> deciding this pattern.
>>>>>
>>>>> Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for
>>>>> job type Y, the pattern could be A -> C -> D; and so on.
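
The per-job-type patterns above could be held by the workflow micro-service
as a plain lookup table. This is only a sketch: WORKFLOWS and next_service
are hypothetical names, and X, Y and A to D are the placeholders from the
example, not real Airavata job types or services.

```python
from typing import Dict, List, Optional

# Hypothetical routing table owned by the workflow micro-service.
WORKFLOWS: Dict[str, List[str]] = {
    "X": ["A", "B", "C", "D"],
    "Y": ["A", "C", "D"],
}


def next_service(job_type: str, current: Optional[str] = None) -> Optional[str]:
    """Return the service to run after `current`, or None when the job is done.

    Passing current=None asks for the first service of the pattern.
    """
    pattern = WORKFLOWS[job_type]
    if current is None:
        return pattern[0]
    index = pattern.index(current)
    return pattern[index + 1] if index + 1 < len(pattern) else None
```

Because only this table knows the order, services A to D never need to know
which service runs after them.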
>>>>>
>>>>>
>>>>>
>>>>> An initial design with this idea looks as follows:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> We would have a common messaging framework (implementation has not
>>>>> been decided yet). The database associated with the workflow micro-service
>>>>> could be a graph database (maybe?) – again the implementation/technology
>>>>> has not been decided yet.
>>>>>
>>>>>
>>>>>
>>>>> This is just a proposed design, and I would love to hear your thoughts
>>>>> on this and any suggestions/comments if any. If there is anything that we
>>>>> are missing or should consider, please do let us know.
>>>>>
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Gourav Shenoy
>>>>>
>>>>>
>>>>>
>>>>> *From: *"Christie, Marcus Aaron" <ma...@iu.edu>
>>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Date: *Friday, February 3, 2017 at 9:21 AM
>>>>>
>>>>>
>>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>>>> Management for Airavata
>>>>>
>>>>>
>>>>>
>>>>> Vidya,
>>>>>
>>>>>
>>>>>
>>>>> I’m not sure how relevant it is, but it occurs to me that a
>>>>> microservice that executes jobs on a cloud requires very little in terms of
>>>>> resources to submit and monitor that job on the cloud. It doesn’t really
>>>>> matter if the job is a “big” or a “small” job.  So I’m not sure what
>>>>> heuristic makes sense regarding distributing work to these job execution
>>>>> microservices.  Maybe a simple round robin approach would be sufficient.
>>>>>
>>>>>
>>>>>
>>>>> I think a job scheduling algorithm does make sense, however, for a
>>>>> higher level component, some sort of metascheduler that understands what
>>>>> resources are available on the cloud resources on which the jobs will be
>>>>> running.  The metascheduler could create work for the job execution
>>>>> microservices to run on particular cloud resources in a way that optimizes
>>>>> for some metric (e.g., throughput).
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>>
>>>>> Marcus
>>>>>
>>>>>
>>>>>
>>>>> On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <
>>>>> vkalvaku@umail.iu.edu> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Ajinkya,
>>>>>
>>>>>
>>>>>
>>>>> My scenario is for workload distribution among multiple instances of
>>>>> the same microservice.
>>>>>
>>>>>
>>>>>
>>>>> If a message broker needs to distribute the available jobs among
>>>>> multiple workers, the common approach would be to use round robin or a
>>>>> similar algorithm. This approach works best when all the workers are
>>>>> similar and the jobs are equal.
>>>>>
>>>>>
>>>>>
>>>>> So I think that a genetic or heuristic job scheduling algorithm, which
>>>>> is also aware of each worker's current state (CPU, RAM, number of jobs
>>>>> being processed), can distribute the jobs more efficiently. The workers can
>>>>> periodically ping the message broker with their current state info.
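
For comparison, a small sketch of round robin next to a state-aware choice.
Note that this is a simple greedy heuristic, not a genetic algorithm; the
WorkerState fields and the load score are assumptions made up for the
example.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, List


@dataclass
class WorkerState:
    """Last state a worker reported; the fields and units are assumptions."""
    name: str
    cpu_load: float   # fraction of CPU in use, 0.0 to 1.0
    ram_load: float   # fraction of RAM in use, 0.0 to 1.0
    active_jobs: int  # number of jobs currently being processed


def round_robin(workers: List[WorkerState]) -> Iterator[WorkerState]:
    """Baseline: hand out workers in a fixed rotation, ignoring their state."""
    return cycle(workers)


def least_loaded(workers: List[WorkerState]) -> WorkerState:
    """Pick the worker with the lowest load score (an arbitrary example score)."""
    def score(w: WorkerState) -> float:
        return w.cpu_load + w.ram_load + 0.1 * w.active_jobs
    return min(workers, key=score)
```

Round robin would send the next job to whichever worker is next in the
rotation, even if it is the busiest one; least_loaded() uses the reported
state to avoid that.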
>>>>>
>>>>>
>>>>>
>>>>> The other advantage of using a customized algorithm is that it can
>>>>> be tweaked to use embedded routing, priority or other information in the
>>>>> job metadata to resolve all of the concerns raised by Amruta, viz. message
>>>>> grouping, ordering, repeated messages, etc.
>>>>>
>>>>>
>>>>>
>>>>> We can even ensure data privacy, e.g. if the workers are spread across
>>>>> multiple compute clusters, say AWS and IU Big Red, we can restrict
>>>>> certain sensitive jobs to run only on Big Red.
>>>>>
>>>>>
>>>>>
>>>>> Some distributed job scheduling algorithms for cloud computing.
>>>>>
>>>>>    - http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
>>>>>    - https://arxiv.org/pdf/1404.5528.pdf
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> Vidya Sagar
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <
>>>>> arkamat@indiana.edu> wrote:
>>>>>
>>>>> Hello all,
>>>>>
>>>>>
>>>>>
>>>>> Adding more information to the message based approach. Messaging is a
>>>>> key strategy employed in many distributed environments. Message queuing is
>>>>> ideally suited to performing asynchronous operations. A sender can post a
>>>>> message to a queue, but it does not have to wait while the message is
>>>>> retrieved and processed. A sender and receiver do not even have to be
>>>>> running concurrently.
>>>>>
>>>>>
>>>>>
>>>>> With message queuing there can be 2 possible scenarios:
>>>>>
>>>>>    1. Sending and receiving messages using a *single message queue*.
>>>>>    2. *Sharing a message queue* between many senders and receivers.
>>>>>
>>>>> When a message is retrieved, it is removed from the queue. A message
>>>>> queue may also support message peeking. This mechanism can be useful if
>>>>> several receivers are retrieving messages from the same queue, but each
>>>>> receiver only wishes to handle specific messages. The receiver can examine
>>>>> the message it has peeked, and decide whether to retrieve the message
>>>>> (which removes it from the queue) or leave it on the queue for another
>>>>> receiver to handle.
>>>>>
>>>>>
>>>>>
>>>>> A few basic message queuing patterns are:
>>>>>
>>>>>    1. *One-way messaging*: The sender simply posts a message to the
>>>>>    queue in the expectation that a receiver will retrieve it and process it at
>>>>>    some point.
>>>>>    2. *Request/response messaging*: In this pattern a sender posts a
>>>>>    message to a queue and expects a response from the receiver. The sender can
>>>>>    resend if the message is not delivered. This pattern typically requires
>>>>>    some form of correlation to enable the sender to determine which response
>>>>>    message corresponds to which request sent to the receiver.
>>>>>    3. *Broadcast messaging*: In this pattern a sender posts a message
>>>>>    to a queue, and multiple receivers can read a copy of the message. This
>>>>>    pattern depends on the message queue being able to disseminate the same
>>>>>    message to multiple receivers. There is a queue to which the senders can
>>>>>    post messages that include metadata in the form of attributes. Each
>>>>>    receiver can create a subscription to the queue, specifying a filter that
>>>>>    examines the values of message attributes. Any messages posted to the
>>>>>    queue with attribute values that match the filter are automatically
>>>>>    forwarded to that subscription.
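
The correlation mentioned for request/response messaging can be sketched
with a generated id that the receiver echoes back. Requester and handle are
hypothetical names, and the dictionaries stand in for real broker messages.

```python
import uuid
from typing import Dict, Optional


class Requester:
    """Sender side of request/response: remembers which requests are open."""

    def __init__(self) -> None:
        self._open: Dict[str, str] = {}  # correlation id -> request body

    def make_request(self, body: str) -> Dict[str, str]:
        """Build a request message tagged with a fresh correlation id."""
        correlation_id = str(uuid.uuid4())
        self._open[correlation_id] = body
        return {"correlation_id": correlation_id, "body": body}

    def match_response(self, response: Dict[str, str]) -> Optional[str]:
        """Return the original request body, or None for an unknown or repeated id."""
        return self._open.pop(response["correlation_id"], None)


def handle(request: Dict[str, str]) -> Dict[str, str]:
    """Receiver side: echo the correlation id back together with the result."""
    return {
        "correlation_id": request["correlation_id"],
        "body": "done: " + request["body"],
    }
```

Responses can then arrive in any order, and a response that shows up twice
is simply ignored the second time.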
>>>>>
>>>>> A solution based on asynchronous messaging might need to address a
>>>>> number of concerns:
>>>>>
>>>>>
>>>>>
>>>>> *Message ordering, Message grouping: *Process messages either in the
>>>>> order they are posted or in a specific order based on priority. Also, there
>>>>> may be occasions when it is difficult to eliminate dependencies, and it may
>>>>> be necessary to group messages together so that they are all handled by the
>>>>> same receiver.
>>>>> *Idempotency: *Ideally the message processing logic in a receiver
>>>>> should be idempotent so that, if the work performed is repeated, this
>>>>> repetition does not change the state of the system.
>>>>> *Repeated messages: *Some message queuing systems implement duplicate
>>>>> message detection and removal based on message IDs.
>>>>> *Poison messages: *A poison message is a message that cannot be
>>>>> handled, often because it is malformed or contains unexpected information.
>>>>> *Message expiration: *A message might have a limited lifetime, and if
>>>>> it is not processed within this period it might no longer be relevant and
>>>>> should be discarded.
>>>>> *Message scheduling: *A message might be temporarily embargoed and
>>>>> should not be processed until a specific date and time. The message should
>>>>> not be available to a receiver until this time.
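
Two of the concerns above, repeated messages and poison messages, can be
sketched on the receiver side as follows. Consumer, MAX_ATTEMPTS and the
dead_letters list are hypothetical; setting poison messages aside after a
few failed attempts is a common practice assumed here, not something decided
in this thread.

```python
from typing import Callable, Dict, List, Set

MAX_ATTEMPTS = 3  # assumed retry limit before a message is treated as poison


class Consumer:
    """Receiver that skips duplicates and sets poison messages aside."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()         # ids that need no more processing
        self._attempts: Dict[str, int] = {}  # failed attempts per message id
        self.dead_letters: List[str] = []    # ids of messages given up on

    def receive(self, msg_id: str, body: str) -> str:
        """Process one delivery and report what happened to it."""
        if msg_id in self._seen:
            return "duplicate"
        try:
            self._handler(body)
        except ValueError:
            # The handler signals a malformed message by raising ValueError.
            self._attempts[msg_id] = self._attempts.get(msg_id, 0) + 1
            if self._attempts[msg_id] >= MAX_ATTEMPTS:
                self._seen.add(msg_id)
                self.dead_letters.append(msg_id)
                return "dead-lettered"
            return "retry"
        self._seen.add(msg_id)
        return "processed"
```

Tracking message ids makes redelivery harmless even when the handler itself
is not idempotent, at the cost of the receiver keeping that state.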
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Amruta Kamat
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> *From:* Shenoy, Gourav Ganesh <go...@indiana.edu>
>>>>> *Sent:* Thursday, February 2, 2017 7:57 PM
>>>>> *To:* dev@airavata.apache.org
>>>>>
>>>>>
>>>>> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>>>> Management for Airavata
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>
>>>>>
>>>>>
>>>>> Amila, Sagar, thank you for the responses and for raising those concerns,
>>>>> and apologies that my email framed the topic of workload management
>>>>> in terms of how micro-services communicate. As Ajinkya rightly mentioned,
>>>>> there exists some sort of correlation between micro-service communication
>>>>> and its impact on how a micro-service performs the work under those
>>>>> circumstances. The goal is to make sure we have maximum independence
>>>>> between micro-services, and investigate the workflow pattern in which these
>>>>> micro-services will operate such that we can find the right balance between
>>>>> availability & consistency. Again, from our preliminary analysis we can
>>>>> assert that these solutions may not be generic and the specific use-case
>>>>> will have a big decisive role.
>>>>>
>>>>>
>>>>>
>>>>> For starters, we are focusing on the following example – and I think
>>>>> this will clarify the doubts about what exactly we are trying to
>>>>> investigate.
>>>>>
>>>>>
>>>>>
>>>>> *Our test example *
>>>>>
>>>>> Say we have the following 4 micro-services, which each perform a
>>>>> specific task as mentioned in the box.
>>>>>
>>>>>
>>>>>
>>>>> <image001.png>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *A stateful pattern to distribute work*
>>>>>
>>>>> <image002.png>
>>>>>
>>>>>
>>>>>
>>>>> Here each communication between micro-services could be via RPC or
>>>>> Messaging (eg: RabbitMQ). The obvious disadvantage is that if any micro-service
>>>>> is down, then the system availability is at stake. In this test example, we
>>>>> can see that Microservice-A coordinates the work and maintains the state
>>>>> information.
>>>>>
>>>>>
>>>>>
>>>>> *A stateless pattern to distribute work*
>>>>>
>>>>>
>>>>>
>>>>> <image003.png>
>>>>>
>>>>>
>>>>>
>>>>> Another purely asynchronous approach would be to associate
>>>>> message-queues with each micro-service, where each micro-service performs
>>>>> its task, submits a request (message on bus) to the next micro-service,
>>>>> and continues to process more requests. This ensures more availability, but
>>>>> we might need to handle corner cases for failures such as the message
>>>>> broker being down, message loss, etc.
>>>>>
>>>>>
>>>>>
>>>>> As mentioned, these are just a few proposals that we are planning to
>>>>> investigate via a prototype project, in which we inject corner cases/failures
>>>>> and try to find ways to handle them. I would love to hear more
>>>>> thoughts/questions/suggestions.
>>>>>
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Gourav Shenoy
>>>>>
>>>>>
>>>>>
>>>>> *From: *Ajinkya Dhamnaskar <ad...@umail.iu.edu>
>>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Date: *Thursday, February 2, 2017 at 2:22 AM
>>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>>>> Management for Airavata
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>
>>>>>
>>>>>
>>>>> Just a heads up. Here the name Distributed workload management does
>>>>> not necessarily mean having different instances of a microservice and then
>>>>> distributing work among these instances.
>>>>>
>>>>>
>>>>>
>>>>> Apparently, the problem is how to make each microservice work
>>>>> independently with concrete distributed communication infrastructure. So,
>>>>> think of it as a workflow where each microservice does its part of the work
>>>>> and communicates (how? yet to be decided) its output. The next
>>>>> microservice identifies and picks up that output and takes it further
>>>>> towards the final outcome. Having said that, the crux here is that none of
>>>>> the microservices needs to worry about the other microservices in the pipeline.
>>>>>
>>>>>
>>>>>
>>>>> Vidya Sagar,
>>>>>
>>>>> I completely second your opinion of having stateless microservices; in
>>>>> fact, that is the key. With stateless microservices it is difficult to
>>>>> guarantee consistency in a system, but it solves the availability problem to
>>>>> some extent. I would be interested to understand what you mean by "an
>>>>> intelligent job scheduling algorithm, which receives real-time updates from
>>>>> the microservices with their current state information".
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <
>>>>> vkalvaku@umail.iu.edu> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <
>>>>> thejaka.amila@gmail.com> wrote:
>>>>>
>>>>> Hi Gourav,
>>>>>
>>>>>
>>>>>
>>>>> Sorry, I did not understand your question. Specifically I am having
>>>>> trouble relating "work load management" to options you suggest (RPC,
>>>>> message based etc.).
>>>>>
>>>>> So what exactly do you mean by "workload management"?
>>>>>
>>>>> What is work in this context?
>>>>>
>>>>>
>>>>>
>>>>> Also, I did not understand what you meant by "the most efficient
>>>>> way". Efficient in terms of what? Are you looking at speed?
>>>>>
>>>>>
>>>>>
>>>>> As per your suggestions, it seems you are trying to find a way to
>>>>> communicate between micro services. RPC might be troublesome if you need to
>>>>> communicate with processes separated by a firewall.
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> -Thejaka
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <
>>>>> goshenoy@indiana.edu> wrote:
>>>>>
>>>>> Hello dev, arch,
>>>>>
>>>>>
>>>>>
>>>>> As part of this Spring’17 Advanced Science Gateway Architecture
>>>>> course, we are working on trying to debate and find possible solutions to
>>>>> the issue of managing distributed workloads in Apache Airavata. This leads
>>>>> to the discussion of finding the most efficient way that different Airavata
>>>>> micro-services should communicate and distribute work, in such a way that:
>>>>>
>>>>> 1.       We maintain the ability to scale these micro-services
>>>>> whenever needed (autoscale perhaps?).
>>>>>
>>>>> 2.       Achieve fault tolerance.
>>>>>
>>>>> 3.       We can deploy these micro-services independently, or better
>>>>> in a containerized manner – keeping in mind the ability to use devops for
>>>>> deployment.
>>>>>
>>>>>
>>>>>
>>>>> As of now the options we are exploring are:
>>>>>
>>>>> 1.       RPC based communication
>>>>>
>>>>> 2.       Message based – either master-worker, or work-queue, etc
>>>>>
>>>>> 3.       A combination of both these approaches
>>>>>
>>>>>
>>>>>
>>>>> I am more inclined towards exploring the message based approach, but
>>>>> again there arises the need to handle the limitations/corner cases of a
>>>>> message broker, such as downtime (and maybe more). In my opinion, having
>>>>> asynchronous communication will help us achieve most of the above-mentioned
>>>>> points. Another debatable issue is making the micro-services implementation
>>>>> stateless, such that we do not have to pass the state information between
>>>>> micro-services.
>>>>>
>>>>>
>>>>>
>>>>> I would love to hear any thoughts/suggestions/comments on this topic
>>>>> and open up a discussion via this mail thread. If there is anything that I
>>>>> have missed which is relevant to this issue, please let me know.
>>>>>
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Gourav Shenoy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hi Gourav,
>>>>>
>>>>>
>>>>>
>>>>> Correct me if I'm wrong, but I think this is a case of the job shop
>>>>> scheduling problem, as we may have 'n' jobs of varying processing
>>>>> times and memory requirements, and we have 'm' microservices with possibly
>>>>> different computing and memory capacities, and we are trying to minimize
>>>>> the makespan <https://en.wikipedia.org/wiki/Makespan>.
>>>>>
>>>>>
>>>>>
>>>>> For this use-case, I'm in favor of a highly available and consistent
>>>>> message broker with an intelligent job scheduling algorithm, which receives
>>>>> real-time updates from the microservices with their current state
>>>>> information.
>>>>>
>>>>>
>>>>>
>>>>> As for the stateful vs stateless implementation, I think that question
>>>>> depends on the functionality of a particular microservice. In a broad
>>>>> sense, the stateless implementation should be preferred as it will scale
>>>>> better horizontally.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Vidya Sagar
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>>>>> Informatics and Computing | Indiana University Bloomington | (812)
>>>>> 691-5002 | vkalvaku@iu.edu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Thanks and regards,
>>>>>
>>>>>
>>>>>
>>>>> Ajinkya Dhamnaskar
>>>>>
>>>>> Student ID : 0003469679
>>>>>
>>>>> Masters (CS)
>>>>>
>>>>> +1 (812) 369-5416
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>>>>> Informatics and Computing | Indiana University Bloomington | (812)
>>>>> 691-5002 | vkalvaku@iu.edu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Thank you
>>>>> Supun Nakandala
>>>>> Dept. Computer Science and Engineering
>>>>> University of Moratuwa
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thank you
>>>> Supun Nakandala
>>>> Dept. Computer Science and Engineering
>>>> University of Moratuwa
>>>>
>>>
>>>
>>
>>
>> --
>> Thank you
>> Supun Nakandala
>> Dept. Computer Science and Engineering
>> University of Moratuwa
>>
>
>


-- 
Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Amila Jayasekara <th...@gmail.com>.
I did not understand the scheduling dependency very well, but why does the
monitoring dependency need to be part of the DAG?

Thanks
-Thejaka

On Fri, Feb 10, 2017 at 2:00 PM, Supun Nakandala <su...@gmail.com>
wrote:

> Hi Amila,
>
> By monitoring and scheduling dependencies I meant the following.
>
> Monitoring dependencies: Eg. After a job is submitted to the remote host,
> the DAG execution has to wait until the job completes before proceeding to
> the next task. Currently, we handle this by monitoring emails. A separate
> daemon is checking for emails. So (I think) we can consider waiting for
> this email as having a monitoring dependency to the next task that has to
> be executed.
>
> Scheduling dependency: This is something for which we currently don't have a
> use case, but which I think will soon become a requirement. For example, when
> submitting jobs to Jetstream (which gives preference to interactive users)
> we have to wait until the system becomes vacant. Thus even though a user
> submits a job that job will have to wait until it is scheduled by an
> external system/call. So my idea was to consider these things as external
> scheduling dependencies. One might argue that scheduling sub-system also
> has to be part of Airavata. But I think we can separate scheduling
> sub-system and execution sub-system by having these scheduling dependencies.
>
> Hope this clarifies your question.
>
> On Fri, Feb 10, 2017 at 1:25 PM, Amila Jayasekara <thejaka.amila@gmail.com
> > wrote:
>
>> What are monitoring dependencies and scheduling dependencies in the
>> execution DAG ?
>>
>> Thanks
>> -Thejaka
>>
>> On Tue, Feb 7, 2017 at 5:47 PM, Supun Nakandala <
>> supun.nakandala@gmail.com> wrote:
>>
>>> Hi Gourav, I agree with your idea of using one “workflow micro-service”
>>> which would basically be the mediator/orchestrator for deciding which
>>> micro-service should be executed next. But I think these components do not
>>> necessarily have to be micro-services but can rather conform to the
>>> master-worker paradigm in some sense. But the trick here is how can we
>>> implement a scalable, fault tolerant system to do distributed workload
>>> management and from CAP theorem what is the property that we are going to
>>> compromise.
>>>
>>> I think you are heading in the right direction. But I would like to add
>>> more details to your solution. Please note that I haven't evaluated these
>>> ideas 100%. Perhaps we can talk more about this in the next class.
>>>
>>> As you have done, I think we should centralize the state information
>>> into one component (orchestrator in our case). From my experience, it is
>>> very hard to achieve consistency in a distributed state setting in the
>>> event of failure.
>>>
>>> Second, to maintain generalizability in Airavata I think we should treat
>>> each application/use-case as a DAG of execution. For example, an HPC job and
>>> a cloud job will have two different DAGs which consist of tasks (data
>>> staging, job submission, output staging, etc.). These tasks should be short
>>> tasks and should roughly have the same execution time. And having
>>> idempotent tasks is preferable.
>>>
>>> The orchestrator is responsible for executing the DAG and assigning tasks
>>> to the workers (how? will follow) based on the control dependencies among
>>> the DAG tasks. In addition to the dependencies between tasks, I see that
>>> there can be other dependencies on things like monitoring and scheduling,
>>> which the orchestrator has to take into account when executing the DAG.
>>>
>>> The next question is how we distribute jobs from Orchestrator to
>>> workers. I think here it is ok to compromise availability in favor of
>>> consistency. I suggest that we use the request/response messaging pattern
>>> which uses a persistent message broker (critical service). In this
>>> architecture, we can safely allow orchestrator or workers to fail without
>>> losing consistency (because of the persistent queue). But if the
>>> orchestrator fails then the availability will go down. One way to overcome
>>> this would be to come up with an orchestrator quorum. The attached figure
>>> summarizes my idea.
>>>
>>> I think we can also evaluate this solution with the concerns that
>>> Shameera pointed out, such as whether we can enable cancel. Once again it's just my
>>> idea and is open for argument and debate.
>>>
>>>
>>>
>>> [image: Inline image 2]
>>>
>>> Thanks
>>> -Supun
>>>
>>>
>>>
>>> On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <
>>> goshenoy@indiana.edu> wrote:
>>>
>>>> Hi Supun,
>>>>
>>>>
>>>>
>>>> I agree; maybe for the example I mentioned, multiple
>>>> micro-services might not sound necessary. I was trying to generalize
>>>> towards a scenario where we have multiple independent micro-services (not
>>>> necessarily for task execution). Again, I am not certain if this is the
>>>> right architecture, but your (and others’) input will definitely help us
>>>> narrow down the scenarios we need to focus on. Do let
>>>> me know if this makes sense.
>>>>
>>>>
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Gourav Shenoy
>>>>
>>>>
>>>>
>>>> *From: *Supun Nakandala <su...@gmail.com>
>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Date: *Monday, February 6, 2017 at 12:15 PM
>>>> *To: *dev <de...@airavata.apache.org>
>>>>
>>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>>> Management for Airavata
>>>>
>>>>
>>>>
>>>> Hi Gourav,
>>>>
>>>>
>>>>
>>>> It is my belief that we don't need a separate microservice for each
>>>> task. I favor a single micro service which can execute all tasks (or in
>>>> other words a generic task execution micro service). Of course, we can have
>>>> many of them when we want to scale. WDYT?
>>>>
>>>>
>>>>
>>>> On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <
>>>> goshenoy@indiana.edu> wrote:
>>>>
>>>> Hi dev,
>>>>
>>>>
>>>>
>>>> We were brainstorming some potential designs that might help us with
>>>> this problem. One possible option would be to have a “workflow
>>>> micro-service” which would basically be the mediator/orchestrator for
>>>> deciding which micro-service should be executed next – based on the type of
>>>> the job. The motive is to make micro-services independent of the workflow;
>>>> i.e. a micro-service implementation should not be aware of which
>>>> micro-service will be executed next and we should have a central control of
>>>> deciding this pattern.
>>>>
>>>> Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for
>>>> job type Y, the pattern could be A -> C -> D; and so on.
>>>>
>>>>
>>>>
>>>> An initial design with this idea looks as follows:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> We would have a common messaging framework (implementation has not been
>>>> decided yet). The database associated with the workflow micro-service could
>>>> be a graph database (maybe?) – again the implementation/technology has not
>>>> been decided yet.
>>>>
>>>>
>>>>
>>>> This is just a proposed design, and I would love to hear your thoughts
>>>> on this and any suggestions/comments if any. If there is anything that we
>>>> are missing or should consider, please do let us know.
>>>>
>>>>
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Gourav Shenoy
>>>>
>>>>
>>>>
>>>> *From: *"Christie, Marcus Aaron" <ma...@iu.edu>
>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Date: *Friday, February 3, 2017 at 9:21 AM
>>>>
>>>>
>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>>> Management for Airavata
>>>>
>>>>
>>>>
>>>> Vidya,
>>>>
>>>>
>>>>
>>>> I’m not sure how relevant it is, but it occurs to me that a
>>>> microservice that executes jobs on a cloud requires very little in terms of
>>>> resources to submit and monitor that job on the cloud. It doesn’t really
>>>> matter if the job is a “big” or a “small” job.  So I’m not sure what
>>>> heuristic makes sense regarding distributing work to these job execution
>>>> microservices.  Maybe a simple round robin approach would be sufficient.
>>>>
>>>>
>>>>
>>>> I think a job scheduling algorithm does make sense, however, for a
>>>> higher level component, some sort of metascheduler that understands what
>>>> resources are available on the cloud resources on which the jobs will be
>>>> running.  The metascheduler could create work for the job execution
>>>> microservices to run on particular cloud resources in a way that optimizes
>>>> for some metric (e.g., throughput).
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>> Marcus
>>>>
>>>>
>>>>
>>>> On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <
>>>> vkalvaku@umail.iu.edu> wrote:
>>>>
>>>>
>>>>
>>>> Ajinkya,
>>>>
>>>>
>>>>
>>>> My scenario is for workload distribution among multiple instances of
>>>> the same microservice.
>>>>
>>>>
>>>>
>>>> If a message broker needs to distribute the available jobs among
>>>> multiple workers, the common approach would be to use round robin or a
>>>> similar algorithm. This approach works best when all the workers are
>>>> similar and the jobs are equal.
>>>>
>>>>
>>>>
>>>> So I think that a genetic or heuristic job scheduling algorithm, which
>>>> is also aware of each of the worker's current state (CPU, RAM, No of Jobs
>>>> processing) can more efficiently distribute the jobs. The workers can
>>>> periodically ping the message broker with their current state info.
>>>>
>>>>
>>>>
>>>> The other advantage of using a customized algorithm is that it can
>>>> be tweaked to use embedded routing, priority or other information in the
>>>> job metadata to resolve all of the concerns raised by Amruta, viz. message
>>>> grouping, ordering, repeated messages, etc.
>>>>
>>>>
>>>>
>>>> We can even ensure data privacy: e.g., if the workers are spread across
>>>> multiple compute clusters, say AWS and IU Big Red, we can restrict
>>>> certain sensitive jobs to run only on Big Red.
>>>>
>>>>
>>>>
>>>> Some distributed job scheduling algorithms for cloud computing.
>>>>
>>>>    - http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
>>>>    - https://arxiv.org/pdf/1404.5528.pdf
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>> Vidya Sagar
>>>>
>>>>
>>>>
>>>> On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <
>>>> arkamat@indiana.edu> wrote:
>>>>
>>>> Hello all,
>>>>
>>>>
>>>>
>>>> Adding more information to the message based approach. Messaging is a
>>>> key strategy employed in many distributed environments. Message queuing is
>>>> ideally suited to performing asynchronous operations. A sender can post a
>>>> message to a queue, but it does not have to wait while the message is
>>>> retrieved and processed. A sender and receiver do not even have to be
>>>> running concurrently.
>>>>
>>>>
>>>>
>>>> With message queuing there can be 2 possible scenarios:
>>>>
>>>>    1. Sending and receiving messages using a *single message queue*.
>>>>    2. *Sharing a message queue* between many senders and receivers.
>>>>
>>>> When a message is retrieved, it is removed from the queue. A message
>>>> queue may also support message peeking. This mechanism can be useful if
>>>> several receivers are retrieving messages from the same queue, but each
>>>> receiver only wishes to handle specific messages. The receiver can examine
>>>> the message it has peeked, and decide whether to retrieve the message
>>>> (which removes it from the queue) or leave it on the queue for another
>>>> receiver to handle.
>>>>
>>>>
>>>>
>>>> A few basic message queuing patterns are:
>>>>
>>>>    1. *One-way messaging*: The sender simply posts a message to the
>>>>    queue in the expectation that a receiver will retrieve it and process it at
>>>>    some point.
>>>>    2. *Request/response messaging*: In this pattern a sender posts a
>>>>    message to a queue and expects a response from the receiver. The sender can
>>>>    resend if the message is not delivered. This pattern typically requires
>>>>    some form of correlation to enable the sender to determine which response
>>>>    message corresponds to which request sent to the receiver.
>>>>    3. *Broadcast messaging*: In this pattern a sender posts a message
>>>>    to a queue, and multiple receivers can read a copy of the message. This
>>>>    pattern depends on the message queue being able to disseminate the same
>>>>    message to multiple receivers. There is a queue to which the senders can
>>>>    post messages that include metadata in the form of attributes. Each
>>>>    receiver can create a subscription to the queue, specifying a filter that
>>>>    examines the values of message attributes. Any messages posted to the
>>>>    queue with attribute values that match the filter are automatically
>>>>    forwarded to that subscription.
>>>>
>>>> A solution based on asynchronous messaging might need to address a
>>>> number of concerns:
>>>>
>>>>
>>>>
>>>> *Message ordering, Message grouping: *Process messages either in the
>>>> order they are posted or in a specific order based on priority. Also, there
>>>> may be occasions when it is difficult to eliminate dependencies, and it may
>>>> be necessary to group messages together so that they are all handled by the
>>>> same receiver.
>>>> *Idempotency: *Ideally the message processing logic in a receiver
>>>> should be idempotent so that, if the work performed is repeated, this
>>>> repetition does not change the state of the system.
>>>> *Repeated messages: *Some message queuing systems implement duplicate
>>>> message detection and removal based on message IDs.
>>>> *Poison messages: *A poison message is a message that cannot be
>>>> handled, often because it is malformed or contains unexpected information.
>>>> *Message expiration: *A message might have a limited lifetime, and if
>>>> it is not processed within this period it might no longer be relevant and
>>>> should be discarded.
>>>> *Message scheduling: *A message might be temporarily embargoed and
>>>> should not be processed until a specific date and time. The message should
>>>> not be available to a receiver until this time.
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Amruta Kamat
>>>>
>>>> ------------------------------
>>>>
>>>> *From:* Shenoy, Gourav Ganesh <go...@indiana.edu>
>>>> *Sent:* Thursday, February 2, 2017 7:57 PM
>>>> *To:* dev@airavata.apache.org
>>>>
>>>>
>>>> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>>> Management for Airavata
>>>>
>>>>
>>>>
>>>> Hello all,
>>>>
>>>>
>>>>
>>>> Amila, Sagar, thank you for the response and raising those concerns;
>>>> and apologies because my email resonated the topic of workload management
>>>> in terms of how micro-services communicate. As Ajinkya rightly mentioned,
>>>> there exists some sort of correlation between micro-services communication
>>>> and its impact on how that micro-service performs the work under those
>>>> circumstances. The goal is to make sure we have maximum independence
>>>> between micro-services, and investigate the workflow pattern in which these
>>>> micro-services will operate such that we can find the right balance between
>>>> availability & consistency. Again, from our preliminary analysis we can
>>>> assert that these solutions may not be generic and the specific use-case
>>>> will have a big decisive role.
>>>>
>>>>
>>>>
>>>> For starters, we are focusing on the following example – and I think
>>>> this will clarify what exactly we are trying to investigate.
>>>>
>>>>
>>>>
>>>> *Our test example *
>>>>
>>>> Say we have the following 4 micro-services, which each perform a
>>>> specific task as mentioned in the box.
>>>>
>>>>
>>>>
>>>> <image001.png>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *A state-full pattern to distribute work*
>>>>
>>>> <image002.png>
>>>>
>>>>
>>>>
>>>> Here each communication between micro-services could be via RPC or
>>>> Messaging (eg: RabbitMQ). Obvious disadvantage is that if any micro-service
>>>> is down, then the system availability is at stake. In this test example, we
>>>> can see that Microservice-A coordinates the work and maintains the state
>>>> information.
>>>>
>>>>
>>>>
>>>> *A state-less pattern to distribute work*
>>>>
>>>>
>>>>
>>>> <image003.png>
>>>>
>>>>
>>>>
>>>> Another purely asynchronous approach would be to associate
>>>> message-queues with each micro-service, where each micro-service performs
>>>> its task, submits a request (message on bus) to the next micro-service,
>>>> and continues to process more requests. This ensures more availability, and
>>>> perhaps we might need to handle corner cases for failures such as message
>>>> broker down, or message loss, etc.
>>>>
>>>>
>>>>
>>>> As mentioned, these are just a few proposals that we are planning to
>>>> investigate via a prototype project. Inject corner cases/failures and try
>>>> and find ways to handle these cases. I would love to hear more
>>>> thoughts/questions/suggestions.
>>>>
>>>>
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Gourav Shenoy
>>>>
>>>>
>>>>
>>>> *From: *Ajinkya Dhamnaskar <ad...@umail.iu.edu>
>>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Date: *Thursday, February 2, 2017 at 2:22 AM
>>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>>> Management for Airavata
>>>>
>>>>
>>>>
>>>> Hello all,
>>>>
>>>>
>>>>
>>>> Just a heads up. Here the name Distributed workload management does not
>>>> necessarily mean having different instances of a microservice and then
>>>> distributing work among these instances.
>>>>
>>>>
>>>>
>>>> Apparently, the problem is how to make each microservice work
>>>> independently with concrete distributed communication infrastructure. So,
>>>> think of it as a workflow where each microservice does its part of work and
>>>> communicates (how? yet to be decided) output. The next underlying
>>>> microservice identifies and picks up that output and takes it further
>>>> towards the final outcome. Having said that, the crux here is that none
>>>> of the microservices needs to worry about the other microservices in a
>>>> pipeline.
>>>>
>>>>
>>>>
>>>> Vidya Sagar,
>>>>
>>>> I completely second your opinion of having stateless microservices; in
>>>> fact, that is the key. With stateless microservices it is difficult to
>>>> guarantee consistency in a system, but it solves the availability problem to
>>>> some extent. I would be interested to understand what you mean by "an
>>>> intelligent job scheduling algorithm, which receives real-time updates from
>>>> the microservices with their current state information".
>>>>
>>>>
>>>>
>>>> On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <
>>>> vkalvaku@umail.iu.edu> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <
>>>> thejaka.amila@gmail.com> wrote:
>>>>
>>>> Hi Gourav,
>>>>
>>>>
>>>>
>>>> Sorry, I did not understand your question. Specifically I am having
>>>> trouble relating "work load management" to options you suggest (RPC,
>>>> message based etc.).
>>>>
>>>> So what exactly do you mean by "workload management"?
>>>>
>>>> What is work in this context?
>>>>
>>>>
>>>>
>>>> Also, I did not understand what you meant by "the most efficient way".
>>>> Efficient in terms of what? Are you looking at speed?
>>>>
>>>>
>>>>
>>>> As per your suggestions, it seems you are trying to find a way to
>>>> communicate between micro services. RPC might be troublesome if you need to
>>>> communicate with processes separated by a firewall.
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> -Thejaka
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <
>>>> goshenoy@indiana.edu> wrote:
>>>>
>>>> Hello dev, arch,
>>>>
>>>>
>>>>
>>>> As part of this Spring’17 Advanced Science Gateway Architecture course,
>>>> we are working on trying to debate and find possible solutions to the issue
>>>> of managing distributed workloads in Apache Airavata. This leads to the
>>>> discussion of finding the most efficient way that different Airavata
>>>> micro-services should communicate and distribute work, in such a way that:
>>>>
>>>> 1.       We maintain the ability to scale these micro-services
>>>> whenever needed (autoscale perhaps?).
>>>>
>>>> 2.       Achieve fault tolerance.
>>>>
>>>> 3.       We can deploy these micro-services independently, or better
>>>> in a containerized manner – keeping in mind the ability to use devops for
>>>> deployment.
>>>>
>>>>
>>>>
>>>> As of now the options we are exploring are:
>>>>
>>>> 1.       RPC based communication
>>>>
>>>> 2.       Message based – either master-worker, or work-queue, etc
>>>>
>>>> 3.       A combination of both these approaches
>>>>
>>>>
>>>>
>>>> I am more inclined towards exploring the message based approach, but
>>>> again there arises the possibility of handling limitations/corner cases of
>>>> message broker such as downtimes (may be more). In my opinion, having
>>>> asynchronous communication will help us achieve most of the above-mentioned
>>>> points. Another debatable issue is making the micro-services implementation
>>>> stateless, such that we do not have to pass the state information between
>>>> micro-services.
>>>>
>>>>
>>>>
>>>> I would love to hear any thoughts/suggestions/comments on this topic
>>>> and open up a discussion via this mail thread. If there is anything that I
>>>> have missed which is relevant to this issue, please let me know.
>>>>
>>>>
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Gourav Shenoy
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Hi Gourav,
>>>>
>>>>
>>>>
>>>> Correct me if I'm wrong, but I think this is a case of the job shop
>>>> scheduling problem, as we may have 'n' jobs of varying processing
>>>> times and memory requirements, and we have 'm' microservices with possibly
>>>> different computing and memory capacities, and we are trying to minimize
>>>> the makespan <https://en.wikipedia.org/wiki/Makespan>.
>>>>
>>>>
>>>>
>>>> For this use-case, I'm in favor of a highly available and consistent
>>>> message broker with an intelligent job scheduling algorithm, which receives
>>>> real-time updates from the microservices with their current state
>>>> information.
>>>>
>>>>
>>>>
>>>> As for the state vs stateless implementation, I think that question
>>>> depends on the functionality of a particular microservice. In a broad
>>>> sense, the stateless implementation should be preferred as it will scale
>>>> better horizontally.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Vidya Sagar
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>>>> Informatics and Computing | Indiana University Bloomington | (812)
>>>> 691-5002 <8126915002> | vkalvaku@iu.edu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Thanks and regards,
>>>>
>>>>
>>>>
>>>> Ajinkya Dhamnaskar
>>>>
>>>> Student ID : 0003469679
>>>>
>>>> Masters (CS)
>>>>
>>>> +1 (812) 369- 5416 <(812)%20369-5416>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>>>> Informatics and Computing | Indiana University Bloomington | (812)
>>>> 691-5002 <8126915002> | vkalvaku@iu.edu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Thank you
>>>> Supun Nakandala
>>>> Dept. Computer Science and Engineering
>>>> University of Moratuwa
>>>>
>>>
>>>
>>>
>>> --
>>> Thank you
>>> Supun Nakandala
>>> Dept. Computer Science and Engineering
>>> University of Moratuwa
>>>
>>
>>
>
>
> --
> Thank you
> Supun Nakandala
> Dept. Computer Science and Engineering
> University of Moratuwa
>

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Supun Nakandala <su...@gmail.com>.
Hi Amila,

By monitoring and scheduling dependencies I meant the following.

Monitoring dependencies: For example, after a job is submitted to the remote
host, the DAG execution has to wait until the job completes before proceeding
to the next task. Currently, we handle this by monitoring emails: a separate
daemon checks for emails. So (I think) we can treat waiting for this email
as a monitoring dependency of the next task that has to be executed.

Scheduling dependency: This is something for which we currently don't have a
use case, but which I think will soon become a requirement. For example, when
submitting jobs to Jetstream (which gives preference to interactive users),
we have to wait until the system becomes vacant. Thus, even though a user
submits a job, that job will have to wait until it is scheduled by an
external system/call. So my idea was to consider these things as external
scheduling dependencies. One might argue that the scheduling sub-system also
has to be part of Airavata. But I think we can separate the scheduling
sub-system and the execution sub-system by having these scheduling dependencies.
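
To make the two kinds of dependencies concrete, here is a minimal sketch in
Python. It is only an illustration under my own assumptions, not existing
Airavata code: the names Dag, Node, slot_granted and job_completed are
hypothetical. External dependencies are modeled as DAG nodes that no worker
executes; they are marked done when an outside signal arrives (the email
daemon for monitoring, an external scheduler for scheduling).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    deps: set = field(default_factory=set)  # names of nodes that must finish first
    external: bool = False                  # True: satisfied by an outside signal
    done: bool = False


class Dag:
    def __init__(self):
        self.nodes = {}

    def add(self, name, deps=(), external=False):
        self.nodes[name] = Node(name, set(deps), external)

    def ready_tasks(self):
        """Worker-executable tasks whose dependencies are all done."""
        return sorted(
            n.name for n in self.nodes.values()
            if not n.done and not n.external
            and all(self.nodes[d].done for d in n.deps)
        )

    def complete(self, name):
        """Record a finished task or a satisfied external dependency."""
        self.nodes[name].done = True


def build_hpc_dag():
    dag = Dag()
    dag.add("input_staging")
    # Scheduling dependency: an external scheduler decides when the job may run.
    dag.add("slot_granted", external=True)
    dag.add("job_submission", deps=["input_staging", "slot_granted"])
    # Monitoring dependency: e.g. the email daemon reports that the job finished.
    dag.add("job_completed", deps=["job_submission"], external=True)
    dag.add("output_staging", deps=["job_completed"])
    return dag
```

With this shape, the orchestrator only ever asks ready_tasks() what it may
hand to workers, and the scheduling and monitoring sub-systems only ever call
complete() on their own nodes, which is the separation I had in mind.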

Hope this clarifies your question.

On Fri, Feb 10, 2017 at 1:25 PM, Amila Jayasekara <th...@gmail.com>
wrote:

> What are monitoring dependencies and scheduling dependencies in the
> execution DAG ?
>
> Thanks
> -Thejaka
>
> On Tue, Feb 7, 2017 at 5:47 PM, Supun Nakandala <supun.nakandala@gmail.com
> > wrote:
>
>> Hi Gourav,
>>
>> I agree with your idea of using one “workflow micro-service”
>> which would basically be the mediator/orchestrator for deciding which
>> micro-service should be executed next. But I think these components do not
>> necessarily have to be micro-services; rather, they can conform to the
>> master-worker paradigm in some sense. The trick here is how we can
>> implement a scalable, fault-tolerant system to do distributed workload
>> management, and which property from the CAP theorem we are going to
>> compromise.
>>
>> I think you are heading in the right direction. But I would like to add
>> more details to your solution. Please note that I haven't evaluated these
>> ideas 100%. Perhaps we can talk more about this in the next class.
>>
>> As you have done, I think we should centralize the state information into
>> one component (orchestrator in our case). From my experience, it is very
>> hard to achieve consistency in a distributed state setting in the events of
>> failure.
>>
>> Second, to maintain generalizability in Airavata, I think we should treat
>> each application/use case as a DAG of execution. For example, an HPC job and
>> a cloud job will have two different DAGs, each consisting of tasks (data
>> staging, job submission, output staging, etc.). These tasks should be short
>> and should have roughly the same execution time. Having
>> idempotent tasks is preferable.
>>
>> The orchestrator is responsible for executing the DAG and assigning tasks
>> to the workers (how? will follow) based on the control dependencies between
>> the DAG tasks. In addition to the dependencies generated from tasks, I see
>> that there can be other dependencies on things like monitoring and
>> scheduling, which the orchestrator has to take into account when executing
>> the DAG.
>>
>> The next question is how we distribute jobs from Orchestrator to workers.
>> I think here it is ok to compromise availability in favor of consistency. I
>> suggest that we use the request/response messaging pattern which uses a
>> persistent message broker (critical service). In this architecture, we can
>> safely allow orchestrator or workers to fail without losing consistency
>> (because of the persistent queue). But if the orchestrator fails then the
>> availability will go down. One way to overcome this would be to come
>> up with an orchestrator quorum. The attached figure summarizes my idea.
>>
>> I think we can also evaluate this solution against the concerns that
>> Shameera pointed out, such as whether we can enable cancel. Once again,
>> it's just my idea and is open for argument and debate.
>>
>>
>>
>> [image: Inline image 2]
>>
>> Thanks
>> -Supun
>>
>>
>>
>> On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <
>> goshenoy@indiana.edu> wrote:
>>
>>> Hi Supun,
>>>
>>>
>>>
>>> I agree, but maybe for the example I mentioned, multiple micro-services
>>> might not sound necessary. I was trying to generalize towards a scenario
>>> where we have multiple independent micro-services (not necessarily for task
>>> execution). Again, I am not certain if this is the right architecture, but
>>> your (and others') inputs will definitely help us narrow down the
>>> different scenarios we need to focus on. Do let me know if I make
>>> sense.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>>
>>>
>>> *From: *Supun Nakandala <su...@gmail.com>
>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>> *Date: *Monday, February 6, 2017 at 12:15 PM
>>> *To: *dev <de...@airavata.apache.org>
>>>
>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>> Management for Airavata
>>>
>>>
>>>
>>> Hi Gourav,
>>>
>>>
>>>
>>> It is my belief that we don't need a separate microservice for each task.
>>> I favor a single micro service which can execute all tasks (or in other
>>> words a generic task execution micro service). Of course, we can have many
>>> of them when we want to scale. WDYT?
>>>
>>>
>>>
>>> On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <
>>> goshenoy@indiana.edu> wrote:
>>>
>>> Hi dev,
>>>
>>>
>>>
>>> We were brainstorming some potential designs that might help us with
>>> this problem. One possible option would be to have a “workflow
>>> micro-service” which would basically be the mediator/orchestrator for
>>> deciding which micro-service should be executed next – based on the type of
>>> the job. The motive is to make micro-services independent of the workflow;
>>> i.e. a micro-service implementation should not be aware of which
>>> micro-service will be executed next and we should have a central control of
>>> deciding this pattern.
>>>
>>> Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for
>>> job type Y, the pattern could be A -> C -> D; and so on.
>>>
>>>
>>>
>>> An initial design with this idea looks as follows:
>>>
>>>
>>>
>>>
>>>
>>> We would have a common messaging framework (implementation has not been
>>> decided yet). The database associated with the workflow micro-service could
>>> be a graph database (maybe?) – again the implementation/technology has not
>>> been decided yet.
>>>
>>>
>>>
>>> This is just a proposed design, and I would love to hear your thoughts
>>> on this and any suggestions/comments if any. If there is anything that we
>>> are missing or should consider, please do let us know.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>>
>>>
>>> *From: *"Christie, Marcus Aaron" <ma...@iu.edu>
>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>> *Date: *Friday, February 3, 2017 at 9:21 AM
>>>
>>>
>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>> Management for Airavata
>>>
>>>
>>>
>>> Vidya,
>>>
>>>
>>>
>>> I’m not sure how relevant it is, but it occurs to me that a microservice
>>> that executes jobs on a cloud requires very little in terms of resources to
>>> submit and monitor that job on the cloud. It doesn’t really matter if the
>>> job is a “big” or a “small” job.  So I’m not sure what heuristic makes
>>> sense regarding distributing work to these job execution microservices.
>>> Maybe a simple round robin approach would be sufficient.
>>>
>>>
>>>
>>> I think a job scheduling algorithm does make sense, however, for a
>>> higher level component, some sort of metascheduler that understands what
>>> resources are available on the cloud resources on which the jobs will be
>>> running.  The metascheduler could create work for the job execution
>>> microservices to run on particular cloud resources in a way that optimizes
>>> for some metric (e.g., throughput).
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Marcus
>>>
>>>
>>>
>>> On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <
>>> vkalvaku@umail.iu.edu> wrote:
>>>
>>>
>>>
>>> Ajinkya,
>>>
>>>
>>>
>>> My scenario is for workload distribution among multiple instances of the
>>> same microservice.
>>>
>>>
>>>
>>> If a message broker needs to distribute the available jobs among
>>> multiple workers, the common approach would be to use round robin or a
>>> similar algorithm. This approach works best when all the workers are
>>> similar and the jobs are equal.
>>>
>>>
>>>
>>> So I think that a genetic or heuristic job scheduling algorithm, which
>>> is also aware of each of the worker's current state (CPU, RAM, No of Jobs
>>> processing) can more efficiently distribute the jobs. The workers can
>>> periodically ping the message broker with their current state info.
>>>
>>>
>>>
>>> The other advantage of using a customized algorithm is that it can
>>> be tweaked to use embedded routing, priority or other information in the
>>> job metadata to resolve all of the concerns raised by Amruta, viz. message
>>> grouping, ordering, repeated messages, etc.
>>>
>>>
>>>
>>> We can even ensure data privacy: e.g., if the workers are spread across
>>> multiple compute clusters, say AWS and IU Big Red, we can restrict
>>> certain sensitive jobs to run only on Big Red.
>>>
>>>
>>>
>>> Some distributed job scheduling algorithms for cloud computing.
>>>
>>>    - http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
>>>    - https://arxiv.org/pdf/1404.5528.pdf
>>>
>>>
>>>
>>>
>>>
>>> Regards
>>>
>>> Vidya Sagar
>>>
>>>
>>>
>>> On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <
>>> arkamat@indiana.edu> wrote:
>>>
>>> Hello all,
>>>
>>>
>>>
>>> Adding more information to the message based approach. Messaging is a
>>> key strategy employed in many distributed environments. Message queuing is
>>> ideally suited to performing asynchronous operations. A sender can post a
>>> message to a queue, but it does not have to wait while the message is
>>> retrieved and processed. A sender and receiver do not even have to be
>>> running concurrently.
>>>
>>>
>>>
>>> With message queuing there can be 2 possible scenarios:
>>>
>>>    1. Sending and receiving messages using a *single message queue*.
>>>    2. *Sharing a message queue* between many senders and receivers.
>>>
>>> When a message is retrieved, it is removed from the queue. A message
>>> queue may also support message peeking. This mechanism can be useful if
>>> several receivers are retrieving messages from the same queue, but each
>>> receiver only wishes to handle specific messages. The receiver can examine
>>> the message it has peeked, and decide whether to retrieve the message
>>> (which removes it from the queue) or leave it on the queue for another
>>> receiver to handle.
>>>
>>>
>>>
>>> A few basic message queuing patterns are:
>>>
>>>    1. *One-way messaging*: The sender simply posts a message to the
>>>    queue in the expectation that a receiver will retrieve it and process it at
>>>    some point.
>>>    2. *Request/response messaging*: In this pattern a sender posts a
>>>    message to a queue and expects a response from the receiver. The sender can
>>>    resend if the message is not delivered. This pattern typically requires
>>>    some form of correlation to enable the sender to determine which response
>>>    message corresponds to which request sent to the receiver.
>>>    3. *Broadcast messaging*: In this pattern a sender posts a message
>>>    to a queue, and multiple receivers can read a copy of the message. This
>>>    pattern depends on the message queue being able to disseminate the same
>>>    message to multiple receivers. There is a queue to which the senders can
>>>    post messages that include metadata in the form of attributes. Each
>>>    receiver can create a subscription to the queue, specifying a filter that
>>>    examines the values of message attributes. Any messages posted to the
>>>    queue with attribute values that match the filter are automatically
>>>    forwarded to that subscription.
>>>
>>> A solution based on asynchronous messaging might need to address a
>>> number of concerns:
>>>
>>>
>>>
>>> *Message ordering, Message grouping: *Process messages either in the
>>> order they are posted or in a specific order based on priority. Also, there
>>> may be occasions when it is difficult to eliminate dependencies, and it may
>>> be necessary to group messages together so that they are all handled by the
>>> same receiver.
>>> *Idempotency: *Ideally the message processing logic in a receiver
>>> should be idempotent so that, if the work performed is repeated, this
>>> repetition does not change the state of the system.
>>> *Repeated messages: *Some message queuing systems implement duplicate
>>> message detection and removal based on message IDs.
>>> *Poison messages: *A poison message is a message that cannot be
>>> handled, often because it is malformed or contains unexpected information.
>>> *Message expiration: *A message might have a limited lifetime, and if
>>> it is not processed within this period it might no longer be relevant and
>>> should be discarded.
>>> *Message scheduling: *A message might be temporarily embargoed and
>>> should not be processed until a specific date and time. The message should
>>> not be available to a receiver until this time.
>>>
>>>
>>> Thanks
>>>
>>> Amruta Kamat
>>>
>>> ------------------------------
>>>
>>> *From:* Shenoy, Gourav Ganesh <go...@indiana.edu>
>>> *Sent:* Thursday, February 2, 2017 7:57 PM
>>> *To:* dev@airavata.apache.org
>>>
>>>
>>> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>> Management for Airavata
>>>
>>>
>>>
>>> Hello all,
>>>
>>>
>>>
>>> Amila, Sagar, thank you for the responses and for raising those concerns;
>>> apologies that my email framed the topic of workload management mainly in
>>> terms of how micro-services communicate. As Ajinkya rightly mentioned,
>>> there is a correlation between how micro-services communicate and its
>>> impact on how each micro-service performs its work under those
>>> circumstances. The goal is to make sure we have maximum independence
>>> between micro-services, and to investigate the workflow pattern in which
>>> these micro-services will operate, such that we can find the right balance
>>> between availability and consistency. Again, from our preliminary analysis
>>> we can say that these solutions may not be generic and that the specific
>>> use case will play a decisive role.
>>>
>>>
>>>
>>> For starters, we are focusing on the following example, which I think
>>> will clarify what exactly we are trying to investigate.
>>>
>>>
>>>
>>> *Our test example *
>>>
>>> Say we have the following 4 micro-services, which each perform a
>>> specific task as mentioned in the box.
>>>
>>>
>>>
>>> <image001.png>
>>>
>>>
>>>
>>>
>>>
>>> *A stateful pattern to distribute work*
>>>
>>> <image002.png>
>>>
>>>
>>>
>>> Here each communication between micro-services could be via RPC or
>>> messaging (e.g., RabbitMQ). The obvious disadvantage is that if any
>>> micro-service is down, then the system availability is at stake. In this
>>> test example, we can see that Microservice-A coordinates the work and
>>> maintains the state information.
>>>
>>>
>>>
>>> *A stateless pattern to distribute work*
>>>
>>>
>>>
>>> <image003.png>
>>>
>>>
>>>
>>> Another, purely asynchronous, approach would be to associate a message
>>> queue with each micro-service, where each micro-service performs its
>>> task, submits a request (a message on the bus) to the next micro-service,
>>> and continues to process more requests. This improves availability, but
>>> we would still need to handle failure corner cases such as the message
>>> broker being down, message loss, etc.
>>>
>>>
>>>
>>> As mentioned, these are just a few proposals that we are planning to
>>> investigate via a prototype project. We will inject corner cases/failures
>>> and try to find ways to handle them. I would love to hear more
>>> thoughts/questions/suggestions.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>>
>>>
>>> *From: *Ajinkya Dhamnaskar <ad...@umail.iu.edu>
>>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>> *Date: *Thursday, February 2, 2017 at 2:22 AM
>>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>> Management for Airavata
>>>
>>>
>>>
>>> Hello all,
>>>
>>>
>>>
>>> Just a heads up. Here the name Distributed workload management does not
>>> necessarily mean having different instances of a microservice and then
>>> distributing work among these instances.
>>>
>>>
>>>
>>> As I understand it, the problem is how to make each microservice work
>>> independently on top of a concrete distributed communication
>>> infrastructure. So, think of it as a workflow where each microservice does
>>> its part of the work and communicates its output (how is yet to be
>>> decided). The next microservice in the pipeline identifies and picks up
>>> that output and takes it further towards the final outcome. The crux here
>>> is that none of the microservices needs to worry about the other
>>> microservices in the pipeline.
>>>
>>>
>>>
>>> Vidya Sagar,
>>>
>>> I completely second your opinion of having stateless microservices; in
>>> fact, that is the key. With stateless microservices it is difficult to
>>> guarantee consistency in a system, but it solves the availability problem
>>> to some extent. I would be interested to understand what you mean by "an
>>> intelligent job scheduling algorithm, which receives real-time updates from
>>> the microservices with their current state information".
>>>
>>>
>>>
>>> On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <
>>> vkalvaku@umail.iu.edu> wrote:
>>>
>>>
>>>
>>> On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <
>>> thejaka.amila@gmail.com> wrote:
>>>
>>> Hi Gourav,
>>>
>>>
>>>
>>> Sorry, I did not understand your question. Specifically I am having
>>> trouble relating "work load management" to options you suggest (RPC,
>>> message based etc.).
>>>
>>> So what exactly do you mean by "workload management"?
>>>
>>> What is work in this context?
>>>
>>>
>>>
>>> Also, I did not understand what you meant by "the most efficient way".
>>> Efficient in terms of what? Are you looking at speed?
>>>
>>>
>>>
>>> As per your suggestions, it seems you are trying to find a way to
>>> communicate between micro-services. RPC might be troublesome if you need
>>> to communicate with processes separated by a firewall.
>>>
>>>
>>>
>>> Thanks
>>>
>>> -Thejaka
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <
>>> goshenoy@indiana.edu> wrote:
>>>
>>> Hello dev, arch,
>>>
>>>
>>>
>>> As part of this Spring’17 Advanced Science Gateway Architecture course,
>>> we are working on trying to debate and find possible solutions to the issue
>>> of managing distributed workloads in Apache Airavata. This leads to the
>>> discussion of finding the most efficient way that different Airavata
>>> micro-services should communicate and distribute work, in such a way that:
>>>
>>> 1.       We maintain the ability to scale these micro-services whenever
>>> needed (autoscale perhaps?).
>>>
>>> 2.       Achieve fault tolerance.
>>>
>>> 3.       We can deploy these micro-services independently, or better in
>>> a containerized manner – keeping in mind the ability to use devops for
>>> deployment.
>>>
>>>
>>>
>>> As of now the options we are exploring are:
>>>
>>> 1.       RPC based communication
>>>
>>> 2.       Message based – either master-worker, or work-queue, etc
>>>
>>> 3.       A combination of both these approaches
>>>
>>>
>>>
>>> I am more inclined towards exploring the message-based approach, but
>>> then we would have to handle the limitations/corner cases of the message
>>> broker, such as downtime (and maybe more). In my opinion, having
>>> asynchronous communication will help us achieve most of the above-mentioned
>>> points. Another debatable issue is making the micro-services implementation
>>> stateless, such that we do not have to pass the state information between
>>> micro-services.
>>>
>>>
>>>
>>> I would love to hear any thoughts/suggestions/comments on this topic and
>>> open up a discussion via this mail thread. If there is anything that I have
>>> missed which is relevant to this issue, please let me know.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>>
>>>
>>>
>>>
>>> Hi Gourav,
>>>
>>>
>>>
>>> Correct me if I'm wrong, but I think this is a case of the job shop
>>> scheduling problem, as we may have 'n' jobs of varying processing times
>>> and memory requirements, and we have 'm' microservices with possibly
>>> different computing and memory capacities, and we are trying to minimize
>>> the makespan <https://en.wikipedia.org/wiki/Makespan>.
>>>
>>>
>>>
>>> For this use case, I'm in favor of a highly available and consistent
>>> message broker with an intelligent job scheduling algorithm, which receives
>>> real-time updates from the microservices with their current state
>>> information.
>>>
>>>
>>>
>>> As for the state vs stateless implementation, I think that question
>>> depends on the functionality of a particular microservice. In a broad
>>> sense, the stateless implementation should be preferred as it will scale
>>> better horizontally.
>>>
>>>
>>>
>>>
>>>
>>> Regards,
>>>
>>> Vidya Sagar
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>>> Informatics and Computing | Indiana University Bloomington | (812)
>>> 691-5002 <8126915002> | vkalvaku@iu.edu
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thanks and regards,
>>>
>>>
>>>
>>> Ajinkya Dhamnaskar
>>>
>>> Student ID : 0003469679
>>>
>>> Masters (CS)
>>>
>>> +1 (812) 369- 5416 <(812)%20369-5416>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>>> Informatics and Computing | Indiana University Bloomington | (812)
>>> 691-5002 <8126915002> | vkalvaku@iu.edu
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thank you
>>> Supun Nakandala
>>> Dept. Computer Science and Engineering
>>> University of Moratuwa
>>>
>>
>>
>>
>> --
>> Thank you
>> Supun Nakandala
>> Dept. Computer Science and Engineering
>> University of Moratuwa
>>
>
>


-- 
Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Amila Jayasekara <th...@gmail.com>.
What are monitoring dependencies and scheduling dependencies in the
execution DAG ?

Thanks
-Thejaka

On Tue, Feb 7, 2017 at 5:47 PM, Supun Nakandala <su...@gmail.com>
wrote:

> Hi Gourav,
>
> I agree with your idea of using one “workflow micro-service”
> which would basically be the mediator/orchestrator for deciding which
> micro-service should be executed next. But I think these components do not
> necessarily have to be micro-services; rather, they can conform to the
> master-worker paradigm in some sense. The trick here is how we can
> implement a scalable, fault-tolerant system to do distributed workload
> management, and which CAP theorem property we are going to
> compromise.
>
> I think you are heading in the right direction. But I would like to add
> more details to your solution. Please note that I haven't evaluated these
> ideas 100%. Perhaps we can talk more about this in the next class.
>
> As you have done, I think we should centralize the state information into
> one component (the orchestrator in our case). From my experience, it is very
> hard to achieve consistency with distributed state in the event of
> failure.
>
> Second, to maintain generalizability in Airavata, I think we should treat
> each application/use case as a DAG of execution. For example, an HPC job and
> a cloud job will have two different DAGs, each consisting of tasks (data
> staging, job submission, output staging, etc.). These tasks should be short
> and should have roughly the same execution time. Having idempotent tasks
> is also preferable.
>
> The orchestrator is responsible for executing the DAG and assigning tasks to
> the workers (how? will follow) based on the control dependencies between the
> DAG tasks. In addition to the dependencies generated from the tasks, I see
> that there can be other dependencies on things like monitoring and scheduling
> which the orchestrator has to take into account when executing the DAG.
>
> The next question is how we distribute jobs from the orchestrator to the
> workers. I think here it is OK to compromise availability in favor of
> consistency. I suggest that we use the request/response messaging pattern
> with a persistent message broker (a critical service). In this architecture,
> we can safely allow the orchestrator or workers to fail without losing
> consistency (because of the persistent queue). But if the orchestrator fails,
> then availability will go down. One way to overcome this would be to come
> up with an orchestrator quorum. The attached figure summarizes my idea.
>
> I think we can also evaluate this solution against the concerns that Shameera
> pointed out, such as whether we can enable cancel. Once again, it's just my
> idea and is open for argument and debate.
>
>
>
> [image: Inline image 2]
>
> Thanks
> -Supun
>
>
>
> On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
>> Hi Supun,
>>
>>
>>
>> I agree, but maybe for the example I mentioned, multiple micro-services
>> might not sound necessary. I was trying to generalize towards a scenario
>> where we have multiple independent micro-services (not necessarily for task
>> execution). Again, I am not certain if this is the right architecture, but
>> your (and others’) input will definitely help us narrow down the
>> scenarios we need to focus on. Do let me know if I make
>> sense.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *Supun Nakandala <su...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Monday, February 6, 2017 at 12:15 PM
>> *To: *dev <de...@airavata.apache.org>
>>
>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Hi Gourav,
>>
>>
>>
>> It is my belief that we don't need a separate microservice for each task.
>> I favor a single microservice which can execute all tasks (in other
>> words, a generic task execution microservice). Of course, we can have many
>> of them when we want to scale. WDYT?
>>
>>
>>
>> On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <
>> goshenoy@indiana.edu> wrote:
>>
>> Hi dev,
>>
>>
>>
>> We were brainstorming some potential designs that might help us with this
>> problem. One possible option would be to have a “workflow micro-service”
>> which would basically be the mediator/orchestrator for deciding which
>> micro-service should be executed next, based on the type of the job. The
>> motive is to make micro-services independent of the workflow; i.e., a
>> micro-service implementation should not be aware of which micro-service
>> will be executed next, and we should have central control over deciding
>> this pattern.
>>
>> Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for
>> job type Y, the pattern could be A -> C -> D; and so on.
>>
>>
>>
>> An initial design with this idea looks as follows:
>>
>>
>>
>>
>>
>> We would have a common messaging framework (implementation has not been
>> decided yet). The database associated with the workflow micro-service could
>> be a graph database (maybe?) – again the implementation/technology has not
>> been decided yet.
>>
>>
>>
>> This is just a proposed design, and I would love to hear your thoughts,
>> suggestions, and comments on it. If there is anything that we are
>> missing or should consider, please do let us know.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *"Christie, Marcus Aaron" <ma...@iu.edu>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Friday, February 3, 2017 at 9:21 AM
>>
>>
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Vidya,
>>
>>
>>
>> I’m not sure how relevant it is, but it occurs to me that a microservice
>> that executes jobs on a cloud requires very little in terms of resources to
>> submit and monitor that job on the cloud. It doesn’t really matter if the
>> job is a “big” or a “small” job.  So I’m not sure what heuristic makes
>> sense regarding distributing work to these job execution microservices.
>> Maybe a simple round robin approach would be sufficient.
>>
>>
>>
>> I think a job scheduling algorithm does make sense, however, for a higher
>> level component, some sort of metascheduler that understands what resources
>> are available on the cloud resources on which the jobs will be running.
>> The metascheduler could create work for the job execution microservices to
>> run on particular cloud resources in a way that optimizes for some metric
>> (e.g., throughput).
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Marcus
>>
>>
>>
>> On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu>
>> wrote:
>>
>>
>>
>> Ajinkya,
>>
>>
>>
>> My scenario is for workload distribution among multiple instances of the
>> same microservice.
>>
>>
>>
>> If a message broker needs to distribute the available jobs among multiple
>> workers, the common approach would be to use round robin or a similar
>> algorithm. This approach works best when all the workers are similar and
>> the jobs are equal.
>>
>>
>>
>> So I think that a genetic or heuristic job scheduling algorithm, which is
>> also aware of each of the worker's current state (CPU, RAM, No of Jobs
>> processing) can more efficiently distribute the jobs. The workers can
>> periodically ping the message broker with their current state info.
>>
>>
>>
>> The other advantage of using a customized algorithm is that it can
>> be tweaked to use embedded routing, priority, or other information in the
>> job metadata to resolve all of the concerns raised by Amruta, viz. message
>> grouping, ordering, repeated messages, etc.
>>
>>
>>
>> We can even enforce data privacy: e.g., if the workers are spread across
>> multiple compute clusters, say AWS and IU Big Red, we can restrict
>> certain sensitive jobs to run only on Big Red.
>>
>>
>>
>> Some distributed job scheduling algorithms for cloud computing.
>>
>>    - http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
>>    - https://arxiv.org/pdf/1404.5528.pdf
>>
>>
>>
>>
>>
>> Regards
>>
>> Vidya Sagar
>>
>>
>>
>> On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <
>> arkamat@indiana.edu> wrote:
>>
>> Hello all,
>>
>>
>>
>> Adding more information to the message based approach. Messaging is a key
>> strategy employed in many distributed environments. Message queuing is
>> ideally suited to performing asynchronous operations. A sender can post a
>> message to a queue, but it does not have to wait while the message is
>> retrieved and processed. A sender and receiver do not even have to be
>> running concurrently.
>>
>>
>>
>> With message queuing there can be 2 possible scenarios:
>>
>>    1. Sending and receiving messages using a *single message queue*.
>>    2. *Sharing a message queue* between many senders and receivers.
>>
>> When a message is retrieved, it is removed from the queue. A message
>> queue may also support message peeking. This mechanism can be useful if
>> several receivers are retrieving messages from the same queue, but each
>> receiver only wishes to handle specific messages. The receiver can examine
>> the message it has peeked, and decide whether to retrieve the message
>> (which removes it from the queue) or leave it on the queue for another
>> receiver to handle.
>>
>>
>>
>> A few basic message queuing patterns are:
>>
>>    1. *One-way messaging*: The sender simply posts a message to the
>>    queue in the expectation that a receiver will retrieve it and process it at
>>    some point.
>>    2. *Request/response messaging*: In this pattern a sender posts a
>>    message to a queue and expects a response from the receiver. The sender can
>>    resend if the message is not delivered. This pattern typically requires
>>    some form of correlation to enable the sender to determine which response
>>    message corresponds to which request sent to the receiver.
>>    3. *Broadcast messaging*: In this pattern a sender posts a message to
>>    a queue, and multiple receivers can read a copy of the message. This
>>    pattern depends on the message queue being able to disseminate the same
>>    message to multiple receivers. There is a queue to which the senders can
>>    post messages that include metadata in the form of attributes. Each
>>    receiver can create a subscription to the queue, specifying a filter that
>>    examines the values of message attributes. Any messages posted to the
>>    queue with attribute values that match the filter are automatically
>>    forwarded to that subscription.
>>
>> A solution based on asynchronous messaging might need to address a number
>> of concerns:
>>
>>
>>
>> *Message ordering, Message grouping: *Process messages either in the
>> order they are posted or in a specific order based on priority. Also, there
>> may be occasions when it is difficult to eliminate dependencies, and it may
>> be necessary to group messages together so that they are all handled by the
>> same receiver.
>> *Idempotency: *Ideally the message processing logic in a receiver should
>> be idempotent so that, if the work performed is repeated, this repetition
>> does not change the state of the system.
>> *Repeated messages: *Some message queuing systems implement duplicate
>> message detection and removal based on message IDs.
>> *Poison messages: *A poison message is a message that cannot be handled,
>> often because it is malformed or contains unexpected information.
>> *Message expiration: *A message might have a limited lifetime, and if it
>> is not processed within this period it might no longer be relevant and
>> should be discarded.
>> *Message scheduling: *A message might be temporarily embargoed and
>> should not be processed until a specific date and time. The message should
>> not be available to a receiver until this time.
>>
>>
>> Thanks
>>
>> Amruta Kamat
>>
>> ------------------------------
>>
>> *From:* Shenoy, Gourav Ganesh <go...@indiana.edu>
>> *Sent:* Thursday, February 2, 2017 7:57 PM
>> *To:* dev@airavata.apache.org
>>
>>
>> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Hello all,
>>
>>
>>
>> Amila, Sagar, thank you for the responses and for raising those concerns;
>> apologies that my email framed the topic of workload management mainly in
>> terms of how micro-services communicate. As Ajinkya rightly mentioned,
>> there is a correlation between how micro-services communicate and its
>> impact on how each micro-service performs its work under those
>> circumstances. The goal is to make sure we have maximum independence
>> between micro-services, and to investigate the workflow pattern in which
>> these micro-services will operate, such that we can find the right balance
>> between availability and consistency. Again, from our preliminary analysis
>> we can say that these solutions may not be generic and that the specific
>> use case will play a decisive role.
>>
>>
>>
>> For starters, we are focusing on the following example, which I think
>> will clarify what exactly we are trying to investigate.
>>
>>
>>
>> *Our test example *
>>
>> Say we have the following 4 micro-services, which each perform a specific
>> task as mentioned in the box.
>>
>>
>>
>> <image001.png>
>>
>>
>>
>>
>>
>> *A stateful pattern to distribute work*
>>
>> <image002.png>
>>
>>
>>
>> Here each communication between micro-services could be via RPC or
>> messaging (e.g., RabbitMQ). The obvious disadvantage is that if any
>> micro-service is down, then the system availability is at stake. In this
>> test example, we can see that Microservice-A coordinates the work and
>> maintains the state information.
>>
>>
>>
>> *A stateless pattern to distribute work*
>>
>>
>>
>> <image003.png>
>>
>>
>>
>> Another, purely asynchronous, approach would be to associate a message
>> queue with each micro-service, where each micro-service performs its
>> task, submits a request (a message on the bus) to the next micro-service,
>> and continues to process more requests. This improves availability, but
>> we would still need to handle failure corner cases such as the message
>> broker being down, message loss, etc.
>>
>>
>>
>> As mentioned, these are just a few proposals that we are planning to
>> investigate via a prototype project. We will inject corner cases/failures
>> and try to find ways to handle them. I would love to hear more
>> thoughts/questions/suggestions.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *Ajinkya Dhamnaskar <ad...@umail.iu.edu>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Thursday, February 2, 2017 at 2:22 AM
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Hello all,
>>
>>
>>
>> Just a heads up. Here the name Distributed workload management does not
>> necessarily mean having different instances of a microservice and then
>> distributing work among these instances.
>>
>>
>>
>> As I understand it, the problem is how to make each microservice work
>> independently on top of a concrete distributed communication
>> infrastructure. So, think of it as a workflow where each microservice does
>> its part of the work and communicates its output (how is yet to be
>> decided). The next microservice in the pipeline identifies and picks up
>> that output and takes it further towards the final outcome. The crux here
>> is that none of the microservices needs to worry about the other
>> microservices in the pipeline.
>>
>>
>>
>> Vidya Sagar,
>>
>> I completely second your opinion of having stateless microservices; in
>> fact, that is the key. With stateless microservices it is difficult to
>> guarantee consistency in a system, but it solves the availability problem
>> to some extent. I would be interested to understand what you mean by "an
>> intelligent job scheduling algorithm, which receives real-time updates from
>> the microservices with their current state information".
>>
>>
>>
>> On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <
>> vkalvaku@umail.iu.edu> wrote:
>>
>>
>>
>> On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com>
>> wrote:
>>
>> Hi Gourav,
>>
>>
>>
>> Sorry, I did not understand your question. Specifically I am having
>> trouble relating "work load management" to options you suggest (RPC,
>> message based etc.).
>>
>> So what exactly do you mean by "workload management"?
>>
>> What is work in this context?
>>
>>
>>
>> Also, I did not understand what you meant by "the most efficient way".
>> Efficient in terms of what? Are you looking at speed?
>>
>>
>>
>> As per your suggestions, it seems you are trying to find a way to
>> communicate between micro-services. RPC might be troublesome if you need
>> to communicate with processes separated by a firewall.
>>
>>
>>
>> Thanks
>>
>> -Thejaka
>>
>>
>>
>>
>>
>> On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <
>> goshenoy@indiana.edu> wrote:
>>
>> Hello dev, arch,
>>
>>
>>
>> As part of this Spring’17 Advanced Science Gateway Architecture course,
>> we are working on trying to debate and find possible solutions to the issue
>> of managing distributed workloads in Apache Airavata. This leads to the
>> discussion of finding the most efficient way that different Airavata
>> micro-services should communicate and distribute work, in such a way that:
>>
>> 1.       We maintain the ability to scale these micro-services whenever
>> needed (autoscale perhaps?).
>>
>> 2.       Achieve fault tolerance.
>>
>> 3.       We can deploy these micro-services independently, or better in
>> a containerized manner – keeping in mind the ability to use devops for
>> deployment.
>>
>>
>>
>> As of now the options we are exploring are:
>>
>> 1.       RPC based communication
>>
>> 2.       Message based – either master-worker, or work-queue, etc
>>
>> 3.       A combination of both these approaches
>>
>>
>>
>> I am more inclined towards exploring the message-based approach, but
>> then we would have to handle the limitations/corner cases of the message
>> broker, such as downtime (and maybe more). In my opinion, having
>> asynchronous communication will help us achieve most of the above-mentioned
>> points. Another debatable issue is making the micro-services implementation
>> stateless, such that we do not have to pass the state information between
>> micro-services.
>>
>>
>>
>> I would love to hear any thoughts/suggestions/comments on this topic and
>> open up a discussion via this mail thread. If there is anything that I have
>> missed which is relevant to this issue, please let me know.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>>
>>
>> Hi Gourav,
>>
>>
>>
>> Correct me if I'm wrong, but I think this is a case of the job shop
>> scheduling problem, as we may have 'n' jobs of varying processing times
>> and memory requirements, and we have 'm' microservices with possibly
>> different computing and memory capacities, and we are trying to minimize
>> the makespan <https://en.wikipedia.org/wiki/Makespan>.
>>
>>
>>
>> For this use case, I'm in favor of a highly available and consistent message
>> broker with an intelligent job scheduling algorithm, which receives
>> real-time updates from the microservices with their current state
>> information.
>>
>>
>>
>> As for the state vs stateless implementation, I think that question
>> depends on the functionality of a particular microservice. In a broad
>> sense, the stateless implementation should be preferred as it will scale
>> better horizontally.
>>
>>
>>
>>
>>
>> Regards,
>>
>> Vidya Sagar
>>
>>
>>
>>
>> --
>>
>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>> Informatics and Computing | Indiana University Bloomington | (812)
>> 691-5002 <8126915002> | vkalvaku@iu.edu
>>
>>
>>
>>
>>
>> --
>>
>> Thanks and regards,
>>
>>
>>
>> Ajinkya Dhamnaskar
>>
>> Student ID : 0003469679
>>
>> Masters (CS)
>>
>> +1 (812) 369- 5416 <(812)%20369-5416>
>>
>>
>>
>>
>>
>> --
>>
>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>> Informatics and Computing | Indiana University Bloomington | (812)
>> 691-5002 <8126915002> | vkalvaku@iu.edu
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Thank you
>> Supun Nakandala
>> Dept. Computer Science and Engineering
>> University of Moratuwa
>>
>
>
>
> --
> Thank you
> Supun Nakandala
> Dept. Computer Science and Engineering
> University of Moratuwa
>

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Ajinkya Dhamnaskar <ad...@umail.iu.edu>.
Hello Ameya,

I am addressing your concerns inline,
*1.* The diagram depicts services would be deployed as independent jars
bundled in a war to a worker (based off "WAR" in the diagram). So I am
assuming in case we have 3 micro-services, there would be jar1, jar2, jar3
bundled inside war.

Now these services are independent and would be worked on separately with
probably separate releases.
But, having a single deploy-able war may lead to all services getting
re-deployed on a worker node for just a single service upgrade.
Ideally, an incremental build of Service 1 should only push Service 1 code
to the worker.

So probably a separate CI/CD for each component with its own deploy-able
jar instead of a single war would be a better approach?

Here jar1, jar2, jar3 are not necessarily microservices; they can be
considered tasks inside a microservice. Think of each as a different
implementation of a common interface. As far as CI/CD is concerned, since
all these task implementations are inside a single microservice, redeploying
will take the entire worker down. Any upgrade or addition of a new task
would eventually be pushed to all the workers.

Key here is, when any new task (new jar) is being added none of the
existing code should be changed, that way we don't need to perform tedious
and time consuming regression testing every time we add new capability on a
worker.

It is possible to add new jars (new tasks) to a worker on the fly, but in that
case we won't be able to package them inside a war, and we would also need some
supporting infrastructure in place, which would be overkill for this solution.
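
To illustrate the "common interface" point, here is a minimal sketch in Python (standing in for the Java jars discussed here). The names Task, TASK_REGISTRY, register and run_task are hypothetical and are not Airavata APIs; the point is only that a new task registers itself without any existing task code being edited.

```python
from abc import ABC, abstractmethod


class Task(ABC):
    """Common interface every task implementation follows (hypothetical)."""
    name: str

    @abstractmethod
    def execute(self, payload: dict) -> dict:
        ...


# Registry of task name -> implementation class, filled in by the decorator below.
TASK_REGISTRY: dict[str, type[Task]] = {}


def register(cls):
    """Class decorator: adding a new task only adds a new registry entry."""
    TASK_REGISTRY[cls.name] = cls
    return cls


@register
class DataStaging(Task):
    name = "data-staging"

    def execute(self, payload):
        return {**payload, "staged": True}


@register
class JobSubmission(Task):
    name = "job-submission"

    def execute(self, payload):
        return {**payload, "submitted": True}


def run_task(name, payload):
    """Look up the implementation by name and run it."""
    return TASK_REGISTRY[name]().execute(payload)
```

With this shape, regression risk for a new task is limited to the new class itself, since the worker only ever talks to the Task interface.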


*2.* As per my understanding of the design so far, a Worker is a collection
of implementations i.e. A,B,C,D,etc and the Workers would be scaled
horizontally as needed.
What I would like to clarify is whether 1 worker would necessarily
have just *1 implementation of each service* or *could have
nx-implementations of mx-services*.
A probable scaling issue I see with the former implementation, if it is
what is intended, is that in case only service x needs to be scaled up n
times, then it will have to be achieved by scaling the worker n times, but
it will lead to all the other services being scaled up too. I am not sure
how crucial resources/space are, but if they are, then this strategy might
not be optimal.
The latter implementation, which allows flexibility, would be favorable I
believe.

Now, as everything is a part of a microservice running on workers, we would
need to spawn worker instances for scaling.

Please reply if you still have any concerns.

On Thu, Feb 9, 2017 at 1:55 AM, Ameya Advankar <aa...@umail.iu.edu>
wrote:

> Hi,
>
> The proposed design seems like a feasible solution for workload
> distribution.
> Some queries which I had are as follows -
>
> *1.* The diagram depicts services would be deployed as independent jars
> bundled in a war to a worker (based off "WAR" in the diagram). So I am
> assuming in case we have 3 micro-services, there would be jar1, jar2, jar3
> bundled inside war.
>
> Now these services are independent and would be worked on separately with
> probably separate releases.
> But, having a single deploy-able war may lead to all services getting
> re-deployed on a worker node for just a single service upgrade.
> Ideally, an incremental build of Service 1 should only push Service 1 code
> to the worker.
>
> So probably a separate CI/CD for each component with its own deploy-able
> jar instead of a single war would be a better approach?
>
>
> *2.* As per my understanding of the design so far, a Worker is a
> collection of implementations i.e. A,B,C,D,etc and the Workers would be
> scaled horizontally as needed.
> What I would like to clarify is whether 1 worker would necessarily
> have just *1 implementation of each service* or *could have
> nx-implementations of mx-services*.
> A probable scaling issue I see with the former implementation, if it is
> what is intended, is that in case only service x needs to be scaled up n
> times, then it will have to be achieved by scaling the worker n times, but
> it will lead to all the other services being scaled up too. I am not sure
> how crucial resources/space are, but if they are, then this strategy might
> not be optimal.
> The latter implementation, which allows flexibility, would be favorable I
> believe.
>
>
> Thanks & Regards,
> Ameya Advankar
>
> On Wed, Feb 8, 2017 at 9:59 PM, Shenoy, Gourav Ganesh <goshenoy@indiana.
> edu> wrote:
>
>> Hi All,
>>
>>
>>
>> As I mentioned before, here is the design we have kind of reached a
>> consensus on (please do provide comments/suggestions). This idea has been
>> motivated from an understanding of the Aurora/Mesos architecture, and how
>> they function.
>>
>>
>>
>>
>>
>> This design has the following benefits:
>>
>> -          Loosely coupled, independent micro-services.
>>
>> -          Inherently scalable in nature.
>>
>> -          Highly available, and consistent architecture.
>>
>> -          Supports incremental upgrade, without the risk of breaking
>> any existing implementation while doing so.
>>
>> -          Ability to add/remove tasks in a DAG, and also add new task
>> implementations (abstraction).
>>
>> -          Custom scheduler provides us greater flexibility (see below).
>>
>>
>>
>> We have the orchestrator (will eventually be HA using zookeeper), which
>> will centrally maintain the state of an experiment – in short the status of
>> the tasks it composes. Based on the type of job request, it will fetch the
>> task execution DAG – this DAG will be made pre-available to the
>> orchestrator via a graph database (debatable), and this DAG is nothing but
>> a definition of sequence of tasks needed for that experiment (not the
>> implementation of tasks).
>>
>>
>>
>> There is a scheduler which will receive a task execution request from the
>> orchestrator, and *decide* which worker will be executing it. Each
>> worker here will be analogous to the current Airavata GFAC module which
>> executes the task. We can think of the worker to be a collection of
>> implementations of different tasks. Eg: W1, W2, W3 in figure above will
>> have code to execute tasks A, B, C, D.
>>
>>
>>
>> There are 2 concerns which arise here:
>>
>> -          How does the scheduler know/decide which worker to pass on
>> the task execution to?
>>
>> -          How do we upgrade a worker, say with a new task ‘E’
>> implementation, in such a manner that if something goes wrong with code for
>> ‘E’, the entire worker node should not fail? In short, avoid regression
>> testing the entire worker module.
>>
>>
>>
>> To address the first problem, I suggest we use a paradigm similar to how
>> Aurora agents (workers) report available capabilities to the Aurora master
>> (scheduler). In Aurora, the slave nodes constantly report back to the
>> master how much processing power they have; and accordingly, the master
>> decides which slave to pass a new job request to. In our case, we can have
>> the workers advertise to the scheduler which tasks they are capable of
>> executing and the scheduler acts accordingly.
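
A minimal Python sketch of the capability-advertising idea above, under stated assumptions: the Scheduler class and its advertise/candidates methods are hypothetical names used only for illustration, and the worker ids follow the W1, W2 naming from the figure description.

```python
class Scheduler:
    """Tracks which tasks each worker has advertised it can execute."""

    def __init__(self):
        self.capabilities = {}  # worker id -> set of task names

    def advertise(self, worker_id, tasks):
        # A worker calls this on startup and again after every upgrade.
        self.capabilities[worker_id] = set(tasks)

    def candidates(self, task):
        # Workers eligible to run the given task, in a stable order.
        return sorted(w for w, caps in self.capabilities.items() if task in caps)


scheduler = Scheduler()
scheduler.advertise("W1", ["A", "B", "C", "D"])
scheduler.advertise("W2", ["A", "B", "C", "D", "E"])  # W2 already upgraded with task E
```

During an incremental upgrade, task E is simply routed to the workers that have advertised it, while the others keep serving A to D.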
>>
>>
>>
>> To address the second concern, I suggest we have the task implementations
>> bundled in separate JARs, so that if there is a problem with one task the
>> others don’t get affected and can be “repaired” without impacting other
>> existing tasks impls. There might be better ways to do this, but this is
>> what I could think of right now.
>>
>>
>>
>> As mentioned before, adding a new task implementation, which will need
>> upgrades to all workers, will be easy and hassle-free, as each worker will
>> report back to the scheduler its capability to handle that new task as
>> and when its upgrade finishes (incremental upgrade). Having a custom scheduler
>> also provides us other benefits such as:
>>
>> -          Handling corner cases – eg: task execution on one worker
>> fails (for some unforeseen reason), then the scheduler can retry it on a
>> different worker.
>>
>> -          Prioritize experiments – schedule higher-priority
>> experiments before normal-priority ones (I just made this one up).
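
The two scheduler benefits listed above could look roughly like this in Python. This is a sketch under assumptions: run_with_retry, ExperimentQueue and flaky_execute are hypothetical names, and failure is modelled as a RuntimeError raised by the worker call.

```python
import heapq
import itertools


def run_with_retry(task, workers, execute):
    """Try each candidate worker in turn; return (worker, result) for the first success."""
    errors = []
    for worker in workers:
        try:
            return worker, execute(worker, task)
        except RuntimeError as exc:
            errors.append((worker, str(exc)))
    raise RuntimeError(f"task {task!r} failed on all workers: {errors}")


class ExperimentQueue:
    """Priority queue of experiments; a lower number means a higher priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def push(self, experiment, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), experiment))

    def pop(self):
        return heapq.heappop(self._heap)[2]


def flaky_execute(worker, task):
    # Stand-in for a real worker call: W1 always fails in this example.
    if worker == "W1":
        raise RuntimeError("worker unavailable")
    return f"{task} done on {worker}"
```

Retrying on another worker is only safe if the task is idempotent, which ties in with the idempotency point raised elsewhere in this thread.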
>>
>>
>>
>> We have decided to go ahead and start building a prototype of this design
>> starting tomorrow, unless there are any concerns/issues. Please do let me
>> know your views on this approach, as every concern helps us better our
>> design.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>>
>>
>> *From: *"Shenoy, Gourav Ganesh" <go...@indiana.edu>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Wednesday, February 8, 2017 at 7:06 PM
>>
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Hi Amruta,
>>
>>
>>
>> Thanks for providing your inputs, and yes in fact we had started out our
>> design discussions with a decentralized framework in mind. But then we
>> considered the problem of making each micro-service independent of each
>> other and more importantly not making them aware of what the DAG is. For
>> this reason, we decided to push and maintain the DAG at a centralized &
>> highly available place (the orchestrator), giving us more control and
>> flexibility in adding/removing tasks from the DAG. This also provides us
>> with the ability to scale each service when needed and also perform
>> incremental upgrades via devops.
>>
>>
>>
>> Do let me know if I make sense, or if there is something I am missing. I
>> would also like to add that we have today nearly come to a consensus on a
>> “fairly good” design – which I will be detailing in another email shortly.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *"Kamat, Amruta Ravalnath" <ar...@indiana.edu>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Wednesday, February 8, 2017 at 2:59 AM
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Hello Gourav,
>>
>>
>>
>> I agree with your solution, but I just came across a decentralized
>> architecture which might serve our purpose and might provide a looser
>> coupling.
>>
>>
>>
>> Having a common workflow would mean a centralized orchestrator i.e. a
>> process which coordinates with multiple services to complete a larger
>> workflow. The services have no knowledge of the workflow or their specific
>> involvement in it. The orchestrator takes care of the complexities.
>> However, the challenge with an orchestrator is that business logic will
>> build up in a central place.
>>
>> If there is a central shared instance of the orchestrator for all
>> requests, then the orchestrator is a single point of failure. If it goes
>> down, all processing stops.
>>
>>
>>
>> With decentralized interactions, each service takes full responsibility
>> for its role in the greater workflow. It will listen for events from other
>> services, complete its work as soon as possible, retry if a failure occurs
>> and send out events upon completion. Here, communications tend to be
>> asynchronous and business logic stays within the related services.
>> Instead of having a central orchestrator that controls the logic of what
>> steps happen when, that logic is built into each service ahead of time. The
>> services know what to react to and how, ahead of time. Multiple services
>> can consume the same events, do some processing, and then produce their own
>> events back into the event stream, all at the same time. The event stream
>> does not have any logic and is intended to be a dumb pipe.
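
A small in-process Python sketch of the decentralized (event-driven) interaction described above. EventStream and the event names are hypothetical; a real deployment would use a message broker rather than an in-memory deque.

```python
from collections import defaultdict, deque


class EventStream:
    """A 'dumb pipe': it only delivers events to subscribers, with no workflow logic."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.pending = deque()
        self.delivered = []  # record of event types, in delivery order

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.pending.append((event_type, payload))

    def run(self):
        # Deliver events until no service publishes anything new.
        while self.pending:
            event_type, payload = self.pending.popleft()
            self.delivered.append(event_type)
            for handler in self.subscribers[event_type]:
                handler(payload)


stream = EventStream()
# Each "service" knows only which event it reacts to and which event it emits.
stream.subscribe("experiment.created", lambda exp: stream.publish("data.staged", exp))
stream.subscribe("data.staged", lambda exp: stream.publish("job.submitted", exp))
stream.publish("experiment.created", {"id": 42})
stream.run()
```

Note that the ordering of steps is nowhere written down in one place; it emerges from the subscriptions, which is the trade-off against a central orchestrator.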
>>
>>
>>
>> Decentralized interactions meet our requirements better: loose coupling,
>> high cohesion and each service responsible for its own bounded context.
>>
>>
>>
>> Thanks
>>
>> Amruta Kamat
>> ------------------------------
>>
>> *From:* Shenoy, Gourav Ganesh <go...@indiana.edu>
>> *Sent:* Tuesday, February 7, 2017 11:49 PM
>> *To:* dev@airavata.apache.org
>> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Supun,
>>
>>
>>
>> Thank you for this excellent explanation. I see that the architecture you
>> mentioned covers most of the concerns we discussed in this thread and in
>> class. I just had one clarifying question though – what does “worker”
>> signify here? Is it a generic task execution framework which runs the DAG?
>> Or is it like a platform where the DAG runs (and how)?
>>
>>
>>
>> Apart from that, I am looking at Storm’s architecture to see if we can
>> get some clues as they are tackling a similar problem. I shall update once
>> I get some concrete answer.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *Supun Nakandala <su...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Tuesday, February 7, 2017 at 5:47 PM
>> *To: *dev <de...@airavata.apache.org>
>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Hi Gourav,
>>
>> I agree with your idea of using one “workflow micro-service”
>> which would basically be the mediator/orchestrator for deciding which
>> micro-service should be executed next. But I think these components do not
>> necessarily have to be micro-services; rather, they can conform to the
>> master-worker paradigm in some sense. The trick here is how we can
>> implement a scalable, fault-tolerant system for distributed workload
>> management, and which CAP theorem property we are going to
>> compromise.
>>
>>
>>
>> I think you are heading in the right direction. But I would like to add
>> more details to your solution. Please note that I haven't evaluated these
>> ideas 100%. Perhaps we can talk more about this in the next class.
>>
>>
>>
>> As you have done, I think we should centralize the state information into
>> one component (orchestrator in our case). From my experience, it is very
>> hard to achieve consistency in a distributed state setting in the events of
>> failure.
>>
>>
>>
>> Second, to maintain generalizability in Airavata I think we should treat
>> each application/use-case as a DAG of execution. For example, an HPC job and
>> a cloud job will have two different DAGs, each consisting of tasks (data
>> staging, job submission, output staging, etc.). These tasks should be short
>> and should have roughly the same execution time. Having
>> idempotent tasks is preferable.
>>
>>
>>
>> The orchestrator is responsible for executing the DAG and assigning tasks to the
>> workers (how? will follow) based on the control dependencies in the DAG
>> tasks. In addition to the dependencies generated from tasks, I see that there
>> can be other dependencies on things like monitoring and scheduling, which
>> the orchestrator has to take into account when executing the DAG.
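
As a rough Python sketch of an orchestrator walking a DAG by its control dependencies: the DAG contents and the run_dag helper are hypothetical ("environment-setup" in particular is an invented task for illustration), and dispatch is a stand-in for handing a task to a worker. It uses the standard-library graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# task -> set of tasks it depends on (hypothetical HPC-style experiment)
HPC_DAG = {
    "environment-setup": set(),
    "input-staging": set(),
    "job-submission": {"environment-setup", "input-staging"},
    "output-staging": {"job-submission"},
}


def run_dag(dag, dispatch):
    """Dispatch every task only after all of its dependencies are done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    order = []
    while sorter.is_active():
        # Everything returned by get_ready() could be sent to workers in parallel.
        for task in sorted(sorter.get_ready()):
            dispatch(task)
            order.append(task)
            sorter.done(task)  # marks the task complete, unblocking its successors
    return order


dispatched = []
execution_order = run_dag(HPC_DAG, dispatched.append)
```

In a real orchestrator, done() would be called when the worker reports completion rather than immediately after dispatch.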
>>
>>
>>
>> The next question is how we distribute jobs from Orchestrator to workers.
>> I think here it is ok to compromise availability in favor of consistency. I
>> suggest that we use the request/response messaging pattern which uses a
>> persistent message broker (critical service). In this architecture, we can
>> safely allow orchestrator or workers to fail without losing consistency
>> (because of the persistent queue). But if the orchestrator fails then the
>> availability will go down. One way to overcome this would be to come
>> up with an orchestrator quorum. The attached figure summarizes my idea.
>>
>>
>>
>> I think we can also evaluate this solution against the concerns that
>> Shameera pointed out, such as whether we can enable cancel. Once again, it's just my
>> idea and is open for argument and debate.
>>
>>
>>
>>
>>
>>
>>
>> [image: Inline image 2]
>>
>>
>>
>> Thanks
>>
>> -Supun
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <
>> goshenoy@indiana.edu> wrote:
>>
>> Hi Supun,
>>
>>
>>
>> I agree, but maybe for the example I mentioned, multiple micro-services
>> might not sound necessary. I was trying to generalize towards a scenario
>> where we have multiple independent micro-services (not necessarily for task
>> execution). Again, I am not certain if this is the right architecture, but
>> your (and others’) inputs will definitely help us narrow down the
>> different scenarios we need to focus on. Do let me know if I make
>> sense.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *Supun Nakandala <su...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Monday, February 6, 2017 at 12:15 PM
>> *To: *dev <de...@airavata.apache.org>
>>
>>
>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Hi Gourav,
>>
>>
>>
>> It is my belief that we don't need a separate microservice for each task.
>> I favor a single micro service which can execute all tasks (or in other
>> words a generic task execution micro service). Of course, we can have many
>> of them when we want to scale. WDYT?
>>
>>
>>
>> On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <
>> goshenoy@indiana.edu> wrote:
>>
>> Hi dev,
>>
>>
>>
>> We were brainstorming some potential designs that might help us with this
>> problem. One possible option would be to have a “workflow micro-service”
>> which would basically be the mediator/orchestrator for deciding which
>> micro-service should be executed next – based on the type of the job. The
>> motive is to make micro-services independent of the workflow; i.e. a
>> micro-service implementation should not be aware of which micro-service
>> will be executed next and we should have a central control of deciding this
>> pattern.
>>
>> Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for
>> job type Y, the pattern could be A -> C -> D; and so on.
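
The job-type patterns in the example above can be sketched as a simple lookup kept by the workflow micro-service. This is only an illustration in Python: WORKFLOWS and next_task are hypothetical names, and a graph database (as floated in this thread) would replace the dictionary for anything beyond linear chains.

```python
# Job type -> ordered list of micro-services, taken from the X and Y examples above.
WORKFLOWS = {
    "X": ["A", "B", "C", "D"],
    "Y": ["A", "C", "D"],
}


def next_task(job_type, completed):
    """Return the step that follows `completed`, or None when the workflow is done.

    Pass completed=None to get the first step.
    """
    steps = WORKFLOWS[job_type]
    if completed is None:
        return steps[0]
    index = steps.index(completed) + 1
    return steps[index] if index < len(steps) else None
```

The micro-services themselves never see this table; only the workflow micro-service consults it.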
>>
>>
>>
>> An initial design with this idea looks as follows:
>>
>>
>>
>>
>>
>> We would have a common messaging framework (implementation has not been
>> decided yet). The database associated with the workflow micro-service could
>> be a graph database (maybe?) – again the implementation/technology has not
>> been decided yet.
>>
>>
>>
>> This is just a proposed design, and I would love to hear your thoughts on
>> this and any suggestions/comments if any. If there is anything that we are
>> missing or should consider, please do let us know.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *"Christie, Marcus Aaron" <ma...@iu.edu>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Friday, February 3, 2017 at 9:21 AM
>>
>>
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Vidya,
>>
>>
>>
>> I’m not sure how relevant it is, but it occurs to me that a microservice
>> that executes jobs on a cloud requires very little in terms of resources to
>> submit and monitor that job on the cloud. It doesn’t really matter if the
>> job is a “big” or a “small” job.  So I’m not sure what heuristic makes
>> sense regarding distributing work to these job execution microservices.
>> Maybe a simple round robin approach would be sufficient.
>>
>>
>>
>> I think a job scheduling algorithm does make sense, however, for a higher
>> level component, some sort of metascheduler that understands what resources
>> are available on the cloud resources on which the jobs will be running.
>> The metascheduler could create work for the job execution microservices to
>> run on particular cloud resources in a way that optimizes for some metric
>> (e.g., throughput).
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Marcus
>>
>>
>>
>> On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu>
>> wrote:
>>
>>
>>
>> Ajinkya,
>>
>>
>>
>> My scenario is for workload distribution among multiple instances of the
>> same microservice.
>>
>>
>>
>> If a message broker needs to distribute the available jobs among multiple
>> workers, the common approach would be to use round robin or a similar
>> algorithm. This approach works best when all the workers are similar and
>> the jobs are equal.
>>
>>
>>
>> So I think that a genetic or heuristic job scheduling algorithm, which is
>> also aware of each of the worker's current state (CPU, RAM, No of Jobs
>> processing) can more efficiently distribute the jobs. The workers can
>> periodically ping the message broker with their current state info.
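
For comparison with round robin, here is a deliberately simple state-aware selection sketch in Python. It is a plain weighted heuristic, not a genetic algorithm; WorkerState, load_score, pick_worker, the equal weights and the max_jobs cap are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    """State a worker would periodically report (fields are hypothetical)."""
    worker_id: str
    cpu_load: float   # fraction of CPU in use, 0.0 to 1.0
    ram_used: float   # fraction of RAM in use, 0.0 to 1.0
    active_jobs: int  # number of jobs currently being processed


def load_score(state, max_jobs=10):
    # Equal weighting of the three signals is an arbitrary choice for this sketch.
    job_pressure = min(state.active_jobs / max_jobs, 1.0)
    return (state.cpu_load + state.ram_used + job_pressure) / 3


def pick_worker(states):
    """Choose the least-loaded worker according to load_score."""
    return min(states, key=load_score).worker_id


states = [
    WorkerState("W1", cpu_load=0.9, ram_used=0.5, active_jobs=8),
    WorkerState("W2", cpu_load=0.2, ram_used=0.3, active_jobs=1),
    WorkerState("W3", cpu_load=0.5, ram_used=0.5, active_jobs=5),
]
```

Here pick_worker(states) selects W2, whereas round robin would hand the next job to whichever worker happened to be next in line.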
>>
>>
>>
>> The other advantage of using a customized algorithm is that it can
>> be tweaked to use embedded routing, priority or other information in the
>> job metadata to resolve all of the concerns raised by Amruta, viz. message
>> grouping, ordering, repeated messages, etc.
>>
>>
>>
>> We can even ensure data privacy; e.g., if the workers are spread across
>> multiple compute clusters, say AWS and IU Big Red, we can restrict
>> certain sensitive jobs to run only on Big Red.
>>
>>
>>
>> Some distributed job scheduling algorithms for cloud computing.
>>
>>    - http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
>>    - https://arxiv.org/pdf/1404.5528.pdf
>>
>>
>>
>>
>>
>> Regards
>>
>> Vidya Sagar
>>
>>
>>
>> On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <
>> arkamat@indiana.edu> wrote:
>>
>> Hello all,
>>
>>
>>
>> Adding more information to the message based approach. Messaging is a key
>> strategy employed in many distributed environments. Message queuing is
>> ideally suited to performing asynchronous operations. A sender can post a
>> message to a queue, but it does not have to wait while the message is
>> retrieved and processed. A sender and receiver do not even have to be
>> running concurrently.
>>
>>
>>
>> With message queuing there can be 2 possible scenarios:
>>
>>    1. Sending and receiving messages using a *single message queue*.
>>    2. *Sharing a message queue* between many senders and receivers.
>>
>> When a message is retrieved, it is removed from the queue. A message
>> queue may also support message peeking. This mechanism can be useful if
>> several receivers are retrieving messages from the same queue, but each
>> receiver only wishes to handle specific messages. The receiver can examine
>> the message it has peeked, and decide whether to retrieve the message
>> (which removes it from the queue) or leave it on the queue for another
>> receiver to handle.
>>
>>
>>
>> A few basic message queuing patterns are:
>>
>>    1. *One-way messaging*: The sender simply posts a message to the
>>    queue in the expectation that a receiver will retrieve it and process it at
>>    some point.
>>    2. *Request/response messaging*: In this pattern a sender posts a
>>    message to a queue and expects a response from the receiver. The sender can
>>    resend if the message is not delivered. This pattern typically requires
>>    some form of correlation to enable the sender to determine which response
>>    message corresponds to which request sent to the receiver.
>>    3. *Broadcast messaging*: In this pattern a sender posts a message to
>>    a queue, and multiple receivers can read a copy of the message. This
>>    pattern depends on the message queue being able to disseminate the same
>>    message to multiple receivers. There is a queue to which the senders can
>>    post messages that include metadata in the form of attributes. Each
>>    receiver can create a subscription to the queue, specifying a filter that
>>    examines the values of message attributes. Any messages posted to the
>>    queue with attribute values that match the filter are automatically
>>    forwarded to that subscription.
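
The correlation requirement of the request/response pattern above can be sketched with two in-process queues in Python. The queue names, send_request, serve_one and collect_responses are hypothetical, and the receiver's "work" (upper-casing the body) is a placeholder.

```python
import queue
import uuid

requests = queue.Queue()
responses = queue.Queue()


def send_request(body):
    """Post a request and return the correlation id used to match its reply."""
    correlation_id = str(uuid.uuid4())
    requests.put({"correlation_id": correlation_id, "body": body})
    return correlation_id


def serve_one():
    """Receiver side: process one request and echo its correlation id in the reply."""
    msg = requests.get()
    responses.put({"correlation_id": msg["correlation_id"], "body": msg["body"].upper()})


def collect_responses(pending):
    """Drain the response queue, matching each reply to its request by correlation id."""
    while not responses.empty():
        reply = responses.get()
        pending[reply["correlation_id"]] = reply["body"]
    return pending


first = send_request("stage data")
second = send_request("submit job")
serve_one()
serve_one()
results = collect_responses({first: None, second: None})
```

Any entry in results that is still None after a timeout would be a candidate for resending, as the pattern description suggests.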
>>
>> A solution based on asynchronous messaging might need to address a number
>> of concerns:
>>
>>
>>
>> *Message ordering, Message grouping: *Process messages either in the
>> order they are posted or in a specific order based on priority. Also, there
>> may be occasions when it is difficult to eliminate dependencies, and it may
>> be necessary to group messages together so that they are all handled by the
>> same receiver.
>> *Idempotency: *Ideally the message processing logic in a receiver should
>> be idempotent so that, if the work performed is repeated, this repetition
>> does not change the state of the system.
>> *Repeated messages: *Some message queuing systems implement duplicate
>> message detection and removal based on message IDs.
>> *Poison messages: *A poison message is a message that cannot be handled,
>> often because it is malformed or contains unexpected information.
>> *Message expiration: *A message might have a limited lifetime, and if it
>> is not processed within this period it might no longer be relevant and
>> should be discarded.
>> *Message scheduling: *A message might be temporarily embargoed and
>> should not be processed until a specific date and time. The message should
>> not be available to a receiver until this time.
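
Several of the concerns above (repeated messages, expiration, scheduling) can be handled on the receiver side. The following Python sketch assumes hypothetical message fields id, body, expires_at and not_before (timestamps in seconds); it is not tied to any particular message broker.

```python
import time


class Receiver:
    """Receiver that drops duplicates, discards expired messages and defers embargoed ones."""

    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def handle(self, message, now=None):
        now = time.time() if now is None else now
        if message["id"] in self.seen_ids:
            return "duplicate"  # repeated delivery: do not redo the work
        if message.get("expires_at") is not None and now >= message["expires_at"]:
            return "expired"    # no longer relevant, discard
        if message.get("not_before") is not None and now < message["not_before"]:
            return "deferred"   # embargoed: leave it for a later delivery
        self.seen_ids.add(message["id"])
        self.processed.append(message["body"])
        return "processed"
```

Deferred messages are intentionally not added to seen_ids, so they are processed normally once their embargo has passed.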
>>
>>
>> Thanks
>>
>> Amruta Kamat
>> ------------------------------
>>
>> *From:* Shenoy, Gourav Ganesh <go...@indiana.edu>
>> *Sent:* Thursday, February 2, 2017 7:57 PM
>> *To:* dev@airavata.apache.org
>>
>>
>> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Hello all,
>>
>>
>>
>> Amila, Sagar, thank you for the response and for raising those concerns, and
>> apologies because my email framed the topic of workload management in
>> terms of how micro-services communicate. As Ajinkya rightly mentioned,
>> there is a correlation between how micro-services communicate
>> and its impact on how each micro-service performs its work under those
>> circumstances. The goal is to make sure we have maximum independence
>> between micro-services, and investigate the workflow pattern in which these
>> micro-services will operate such that we can find the right balance between
>> availability & consistency. Again, from our preliminary analysis we can
>> assert that these solutions may not be generic and the specific use-case
>> will have a big decisive role.
>>
>>
>>
>> For starters, we are focusing on the following example – and I think this
>> will clarify the doubts on what we are exactly trying to investigate about.
>>
>>
>>
>> *Our test example *
>>
>> Say we have the following 4 micro-services, which each perform a specific
>> task as mentioned in the box.
>>
>>
>>
>> <image001.png>
>>
>>
>>
>>
>>
>> *A stateful pattern to distribute work*
>>
>> <image002.png>
>>
>>
>>
>> Here each communication between micro-services could be via RPC or
>> Messaging (eg: RabbitMQ). Obvious disadvantage is that if any micro-service
>> is down, then the system availability is at stake. In this test example, we
>> can see that Microservice-A coordinates the work and maintains the state
>> information.
>>
>>
>>
>> *A state-less pattern to distribute work*
>>
>>
>>
>> <image003.png>
>>
>>
>>
>> Another purely asynchronous approach would be to associate message-queues
>> with each micro-service, where each micro-service performs its task,
>> submits a request (message on bus) to the next micro-service, and continues
>> to process more requests. This ensures more availability, and perhaps we
>> might need to handle corner cases for failures such as message broker down,
>> or message loss, etc.
>>
>>
>>
>> As mentioned, these are just a few proposals that we are planning to
>> investigate via a prototype project. We plan to inject corner cases/failures
>> and try to find ways to handle them. I would love to hear more
>> thoughts/questions/suggestions.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *Ajinkya Dhamnaskar <ad...@umail.iu.edu>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Thursday, February 2, 2017 at 2:22 AM
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>> Management for Airavata
>>
>>
>>
>> Hello all,
>>
>>
>>
>> Just a heads up. Here the name Distributed workload management does not
>> necessarily mean having different instances of a microservice and then
>> distributing work among these instances.
>>
>>
>>
>> Apparently, the problem is how to make each microservice work
>> independently on top of a concrete distributed communication infrastructure. So,
>> think of it as a workflow where each microservice does its part of the work and
>> communicates (how? yet to be decided) its output. The next
>> microservice in line identifies and picks up that output and takes it further
>> towards the final outcome. Having said that, the crux here is that none of the
>> microservices need to worry about the other microservices in the pipeline.
>>
>>
>>
>> Vidya Sagar,
>>
>> I completely second your opinion of having stateless microservices; in
>> fact that is the key. With stateless microservices it is difficult to
>> guarantee consistency in a system, but it solves the availability problem to
>> some extent. I would be interested to understand what you mean by "an
>> intelligent job scheduling algorithm, which receives real-time updates from
>> the microservices with their current state information".
>>
>>
>>
>> On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <
>> vkalvaku@umail.iu.edu> wrote:
>>
>>
>>
>> On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com>
>> wrote:
>>
>> Hi Gourav,
>>
>>
>>
>> Sorry, I did not understand your question. Specifically I am having
>> trouble relating "work load management" to options you suggest (RPC,
>> message based etc.).
>>
>> So what exactly do you mean by "workload management"?
>>
>> What is work in this context ?
>>
>>
>>
>> Also, I did not understand what you meant by "the most efficient way".
>> Efficient in terms of what? Are you looking at speed?
>>
>>
>>
>> As per your suggestions, it seems you are trying to find a way to
>> communicate between micro services. RPC might be troublesome if you need to
>> communicate with processes separated by a firewall.
>>
>>
>>
>> Thanks
>>
>> -Thejaka
>>
>>
>>
>>
>>
>> On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <
>> goshenoy@indiana.edu> wrote:
>>
>> Hello dev, arch,
>>
>>
>>
>> As part of this Spring’17 Advanced Science Gateway Architecture course,
>> we are working on trying to debate and find possible solutions to the issue
>> of managing distributed workloads in Apache Airavata. This leads to the
>> discussion of finding the most efficient way that different Airavata
>> micro-services should communicate and distribute work, in such a way that:
>>
>> 1.       We maintain the ability to scale these micro-services whenever
>> needed (autoscale perhaps?).
>>
>> 2.       Achieve fault tolerance.
>>
>> 3.       We can deploy these micro-services independently, or better in
>> a containerized manner – keeping in mind the ability to use devops for
>> deployment.
>>
>>
>>
>> As of now the options we are exploring are:
>>
>> 1.       RPC based communication
>>
>> 2.       Message based – either master-worker, or work-queue, etc
>>
>> 3.       A combination of both these approaches
>>
>>
>>
>> I am more inclined towards exploring the message based approach, but
>> again there arises the possibility of handling limitations/corner cases of
>> message broker such as downtimes (may be more). In my opinion, having
>> asynchronous communication will help us achieve most of the above-mentioned
>> points. Another debatable issue is making the micro-services implementation
>> stateless, such that we do not have to pass the state information between
>> micro-services.
>>
>>
>>
>> I would love to hear any thoughts/suggestions/comments on this topic and
>> open up a discussion via this mail thread. If there is anything that I have
>> missed which is relevant to this issue, please let me know.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>>
>>
>> Hi Gourav,
>>
>>
>>
>> Correct me if I'm wrong, but I think this is a case of the job shop
>> scheduling problem, as we may have 'n' jobs of varying processing times
>> and memory requirements, and we have 'm' microservices with possibly
>> different computing and memory capacities, and we are trying to minimize
>> the makespan <https://en.wikipedia.org/wiki/Makespan>.
>>
>>
>>
>> For this use-case, I'm in favor of a highly available and consistent message
>> broker with an intelligent job scheduling algorithm, which receives
>> real-time updates from the microservices with their current state
>> information.
>>
>>
>>
>> As for the stateful vs stateless implementation, I think that question
>> depends on the functionality of a particular microservice. In a broad
>> sense, the stateless implementation should be preferred as it will scale
>> better horizontally.
>>
>>
>>
>>
>>
>> Regards,
>>
>> Vidya Sagar
>>
>>
>>
>>
>> --
>>
>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>> Informatics and Computing | Indiana University Bloomington | (812)
>> 691-5002 <8126915002> | vkalvaku@iu.edu
>>
>>
>>
>>
>>
>> --
>>
>> Thanks and regards,
>>
>>
>>
>> Ajinkya Dhamnaskar
>>
>> Student ID : 0003469679
>>
>> Masters (CS)
>>
>> +1 (812) 369- 5416 <(812)%20369-5416>
>>
>>
>>
>>
>>
>> --
>>
>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>> Informatics and Computing | Indiana University Bloomington | (812)
>> 691-5002 <8126915002> | vkalvaku@iu.edu
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Thank you
>> Supun Nakandala
>> Dept. Computer Science and Engineering
>> University of Moratuwa
>>
>>
>>
>>
>>
>> --
>>
>> Thank you
>> Supun Nakandala
>> Dept. Computer Science and Engineering
>> University of Moratuwa
>>
>
>


-- 
Thanks and regards,

Ajinkya Dhamnaskar
Student ID : 0003469679
Masters (CS)
+1 (812) 369- 5416

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Ameya Advankar <aa...@umail.iu.edu>.
Hi,

The proposed design seems like a feasible solution for workload
distribution.
Some queries which I had are as follows -

*1.* The diagram suggests that services would be deployed as independent JARs
bundled in a WAR on each worker (based on the "WAR" label in the diagram). So I
assume that with 3 micro-services, there would be jar1, jar2, and jar3
bundled inside the WAR.

Now these services are independent and would be worked on separately with
probably separate releases.
But having a single deployable WAR may lead to all services getting
re-deployed on a worker node for just a single service upgrade.
Ideally, an incremental build of Service 1 should only push Service 1 code
to the worker.

So probably a separate CI/CD pipeline for each component, with its own
deployable JAR instead of a single WAR, would be a better approach?


*2.* As per my understanding of the design so far, a Worker is a collection
of implementations i.e. A,B,C,D,etc and the Workers would be scaled
horizontally as needed.
What I would like to clarify is whether 1 worker would necessarily
have just *1 implementation of each service* or *could have
nx-implementations of mx-services*.
A probable scaling issue I see with the former implementation, if it is
what is intended, is that in case only service x needs to be scaled up n
times, then it will have to be achieved by scaling the worker n times, but
it will lead to all the other services being scaled up too. I am not sure
how crucial resources/space are, but if they are, then this strategy might
not be optimal.
The latter implementation, which allows flexibility, would be favorable I
believe.


Thanks & Regards,
Ameya Advankar

On Wed, Feb 8, 2017 at 9:59 PM, Shenoy, Gourav Ganesh <go...@indiana.edu>
wrote:

> Hi All,
>
>
>
> As I mentioned before, here is the design we have kind of reached a
> consensus on (please do provide comments/suggestions). This idea has been
> motivated from an understanding of the Aurora/Mesos architecture, and how
> they function.
>
>
>
>
>
> This design has the following benefits:
>
> -          Loosely coupled, independent micro-services.
>
> -          Inherently scalable in nature.
>
> -          Highly available, and consistent architecture.
>
> -          Supports incremental upgrade, without the risk of breaking any
> existing implementation while doing so.
>
> -          Ability to add/remove tasks in a DAG, and also add new task
> implementations (abstraction).
>
> -          Custom scheduler provides us greater flexibility (see below).
>
>
>
> We have the orchestrator (will eventually be HA using zookeeper), which
> will centrally maintain the state of an experiment – in short the status of
> the tasks it composes. Based on the type of job request, it will fetch the
> task execution DAG – this DAG will be made pre-available to the
> orchestrator via a graph database (debatable), and this DAG is nothing but
> a definition of sequence of tasks needed for that experiment (not the
> implementation of tasks).
>
>
>
> There is a scheduler which will receive a task execution request from the
> orchestrator, and *decide* which worker will be executing it. Each worker
> here will be analogous to the current Airavata GFAC module which executes
> the task. We can think of the worker to be a collection of implementations
> of different tasks. E.g., W1, W2, W3 in the figure above will have code to
> execute tasks A, B, C, D.
>
>
>
> There are 2 concerns which arise here:
>
> -          How does the scheduler know/decide which worker to pass on the
> task execution to?
>
> -          How do we upgrade a worker, say with a new task ‘E’
> implementation, in such a manner that if something goes wrong with code for
> ‘E’, the entire worker node should not fail? In short, avoid regression
> testing the entire worker module.
>
>
>
> To address the first problem, I suggest we use a paradigm similar to how
> Aurora agents (workers) report available capabilities to the Aurora master
> (scheduler). In Aurora, the slave nodes constantly report back to the
> master how much processing power they have; and accordingly, the master
> decides which slave to pass a new job request to. In our case, we can have
> the workers advertise to the scheduler which tasks they are capable of
> executing and the scheduler acts accordingly.
>
>
>
> To address the second concern, I suggest we have the task implementations
> bundled in separate JARs, so that if there is a problem with one task the
> others don’t get affected and can be “repaired” without impacting other
> existing task implementations. There might be better ways to do this, but this is
> what I could think of right now.
>
>
>
> As mentioned before, adding a new task implementation – which will need
> upgrades to all workers will be easy and hassle-free as each worker will
> report back to the scheduler their capability to handle that new task, as
> and when upgrade finishes (incremental upgrade). Having a custom scheduler
> also provides us other benefits such as:
>
> -          Handling corner cases – eg: task execution on one worker fails
> (for some unforeseen reason), then the scheduler can retry it on a
> different worker.
>
> -          Prioritize experiments – scheduler higher priority experiments
> before normal priority ones (I just made this one up).
>
>
>
> We have decided to go ahead and start building a prototype of this design
> starting tomorrow, unless there are any concerns/issues. Please do let me
> know your views on this approach, as every concern helps us better our
> design.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
>
>
> *From: *"Shenoy, Gourav Ganesh" <go...@indiana.edu>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Wednesday, February 8, 2017 at 7:06 PM
>
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hi Amruta,
>
>
>
> Thanks for providing your inputs, and yes in fact we had started out our
> design discussions with a decentralized framework in mind. But then we
> considered the problem of making each micro-service independent of each
> other and more importantly not making them aware of what the DAG is. For
> this reason, we decided to push and maintain the DAG at a centralized &
> highly available place (the orchestrator), giving us more control and
> flexibility in adding/removing tasks from the DAG. This also provides us
> with the ability to scale each service when needed and also perform
> incremental upgrades via devops.
>
>
>
> Do let me know if I make sense, or if there is something I am missing. I
> would also like to add that we have today nearly come to a consensus on a
> “fairly good” design – which I will be detailing in another email shortly.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Kamat, Amruta Ravalnath" <ar...@indiana.edu>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Wednesday, February 8, 2017 at 2:59 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hello Gourav,
>
>
>
> I agree with your solution, but I just came across a decentralized
> architecture which might serve our purpose and might provide a looser
> coupling.
>
>
>
> Having a common workflow would mean a centralized orchestrator i.e. a
> process which coordinates with multiple services to complete a larger
> workflow. The services have no knowledge of the workflow or their specific
> involvement in it. The orchestrator takes care of the complexities.
> However, the challenge with an orchestrator is that business logic will
> build up in a central place.
>
> If there is a central shared instance of the orchestrator for all
> requests, then the orchestrator is a single point of failure. If it goes
> down, all processing stops.
>
>
>
> With decentralized interactions, each service takes full responsibility
> for its role in the greater workflow. It will listen for events from other
> services, complete its work as soon as possible, retry if a failure occurs
> and send out events upon completion. Here, communications tend to be
> asynchronous and business logic stays within the related services.
> Instead of having a central orchestrator that controls the logic of what
> steps happen when, that logic is built into each service ahead of time. The
> services know what to react to and how, ahead of time. Multiple services
> can consume the same events, do some processing, and then produce their own
> events back into the event stream, all at the same time. The event stream
> does not have any logic and is intended to be a dumb pipe.
>
>
>
> Decentralized interactions meet our requirements better: loose coupling,
> high cohesion and each service responsible for its own bounded context.
>
>
>
> Thanks
>
> Amruta Kamat
> ------------------------------
>
> *From:* Shenoy, Gourav Ganesh <go...@indiana.edu>
> *Sent:* Tuesday, February 7, 2017 11:49 PM
> *To:* dev@airavata.apache.org
> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Supun,
>
>
>
> Thank you for this excellent explanation. I see that the architecture you
> mentioned covers most of the concerns we discussed in this thread and in
> class. I just had one clarifying question though – what does “worker”
> signify here? Is it a generic task execution framework which runs the DAG?
> Or is it like a platform where the DAG runs (and how?).
>
>
>
> Apart from that, I am looking at Storm’s architecture to see if we can get
> some clues as they are tackling a similar problem. I shall update once I
> get some concrete answer.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Supun Nakandala <su...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Tuesday, February 7, 2017 at 5:47 PM
> *To: *dev <de...@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hi Gourav, I agree with your idea of using one “workflow micro-service”
> which would basically be the mediator/orchestrator for deciding which
> micro-service should be executed next. But I think these components do not
> necessarily have to be micro-services but can rather conform to the
> master-worker paradigm in some sense. But the trick here is how can we
> implement a scalable, fault tolerant system to do distributed workload
> management and from CAP theorem what is the property that we are going to
> compromise.
>
>
>
> I think you are heading in the right direction. But I would like to add
> more details to your solution. Please note that I haven't evaluated these
> ideas 100%. Perhaps we can talk more about this in the next class.
>
>
>
> As you have done, I think we should centralize the state information into
> one component (orchestrator in our case). From my experience, it is very
> hard to achieve consistency in a distributed state setting in the events of
> failure.
>
>
>
> Second, to maintain generalizability in Airavata I think we should treat
> each application/use-case as a DAG of execution. For example, an HPC job and
> a cloud job will have two different DAGs which consist of tasks (data
> staging, job submission, output staging, etc.). These tasks should be short
> tasks and should roughly have the same execution time. And having
> idempotent tasks is preferable.
>
>
>
> The orchestrator is responsible for executing the DAG and assigning tasks to
> the workers (how? will follow) based on the control dependencies in the DAG
> tasks. In addition to the dependencies generated from tasks, I see there
> can be other dependencies on things like monitoring and scheduling which
> the orchestrator has to take into account when executing the DAG.
>
>
>
> The next question is how we distribute jobs from Orchestrator to workers.
> I think here it is ok to compromise availability in favor of consistency. I
> suggest that we use the request/response messaging pattern which uses a
> persistent message broker (critical service). In this architecture, we can
> safely allow orchestrator or workers to fail without losing consistency
> (because of the persistent queue). But if the orchestrator fails then the
> availability will go down. One way to overcome this would be to come
> up with an orchestrator quorum. The attached figure summarizes my idea.
>
>
>
> I think we can also evaluate this solution with the concerns that Shameera
> pointed out such as can we enable cancel?. Once again it's just my idea and
> is open for argument and debate.
>
>
>
>
>
>
>
> [image: inline image 2]
>
>
>
> Thanks
>
> -Supun
>
>
>
>
>
>
>
> On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
> Hi Supun,
>
>
>
> I agree, but may be for the example I mentioned, multiple micro-services
> might not sound necessary. I was trying to generalize towards a scenario
> where we have multiple independent micro-services (not necessarily for task
> execution). Again, I am not certain if this is the right architecture, but
> your (and others’) inputs will definitely help us narrow down
> the different scenarios we need to focus on. Do let me know if I make
> sense.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Supun Nakandala <su...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Monday, February 6, 2017 at 12:15 PM
> *To: *dev <de...@airavata.apache.org>
>
>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hi Gourav,
>
>
>
> It is my belief that we don't need a separate microservice to each task. I
> favor a single micro service which can execute all tasks (or in other words
> a generic task execution micro service). Of course, we can have many of
> them when we want to scale. WDYT?
>
>
>
> On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
> Hi dev,
>
>
>
> We were brainstorming some potential designs that might help us with this
> problem. One possible option would be to have a “workflow micro-service”
> which would basically be the mediator/orchestrator for deciding which
> micro-service should be executed next – based on the type of the job. The
> motive is to make micro-services independent of the workflow; i.e. a
> micro-service implementation should not be aware of which micro-service
> will be executed next and we should have a central control of deciding this
> pattern.
>
> Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for job
> type Y, the pattern could be A -> C -> D; and so on.
>
>
>
> An initial design with this idea looks like follows:
>
>
>
>
>
> We would have a common messaging framework (implementation has not been
> decided yet). The database associated with the workflow micro-service could
> be a graph database (maybe?) – again the implementation/technology has not
> been decided yet.
>
>
>
> This is just a proposed design, and I would love to hear your thoughts on
> this and any suggestions/comments if any. If there is anything that we are
> missing or should consider, please do let us know.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Christie, Marcus Aaron" <ma...@iu.edu>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Friday, February 3, 2017 at 9:21 AM
>
>
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Vidya,
>
>
>
> I’m not sure how relevant it is, but it occurs to me that a microservice
> that executes jobs on a cloud requires very little in terms of resources to
> submit and monitor that job on the cloud. It doesn’t really matter if the
> job is a “big” or a “small” job.  So I’m not sure what heuristic makes
> sense regarding distributing work to these job execution microservices.
> Maybe a simple round robin approach would be sufficient.
>
>
>
> I think a job scheduling algorithm does make sense, however, for a higher
> level component, some sort of metascheduler that understands what resources
> are available on the cloud resources on which the jobs will be running.
> The metascheduler could create work for the job execution microservices to
> run on particular cloud resources in a way that optimizes for some metric
> (e.g., throughput).
>
>
>
> Thanks,
>
>
>
> Marcus
>
>
>
> On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu>
> wrote:
>
>
>
> Ajinkya,
>
>
>
> My scenario is for workload distribution among multiple instances of the
> same microservice.
>
>
>
> If a message broker needs to distribute the available jobs among multiple
> workers, the common approach would be to use round robin or a similar
> algorithm. This approach works best when all the workers are similar and
> the jobs are equal.
>
>
>
> So I think that a genetic or heuristic job scheduling algorithm, which is
> also aware of each of the worker's current state (CPU, RAM, No of Jobs
> processing) can more efficiently distribute the jobs. The workers can
> periodically ping the message broker with their current state info.
>
>
>
> The other advantage of using a customized algorithm is that it can
> be tweaked to use embedded routing, priority or other information in the
> job metadata to resolve all of the concerns raised by Amrutha viz message
> grouping, ordering, repeated messages, etc.
>
>
>
> We can even ensure data privacy, i.e if the workers are spread across
> multiple compute clusters say AWS and IU Big Red and we want to restrict
> certain sensitive jobs to be run only on Big Red.
>
>
>
> Some distributed job scheduling algorithms for cloud computing.
>
>    - http://www.ijimai.org/journal/sites/default/files/files/2013
>    /03/ijimai20132_18_pdf_62825.pdf
>    <http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf>
>    - https://arxiv.org/pdf/1404.5528.pdf
>
>
>
>
>
> Regards
>
> Vidya Sagar
>
>
>
> On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <
> arkamat@indiana.edu> wrote:
>
> Hello all,
>
>
>
> Adding more information to the message based approach. Messaging is a key
> strategy employed in many distributed environments. Message queuing is
> ideally suited to performing asynchronous operations. A sender can post a
> message to a queue, but it does not have to wait while the message is
> retrieved and processed. A sender and receiver do not even have to be
> running concurrently.
>
>
>
> With message queuing there can be 2 possible scenarios:
>
>    1. Sending and receiving messages using a *single message queue*.
>    2. *Sharing a message queue* between many senders and receivers.
>
> When a message is retrieved, it is removed from the queue. A message
> queue may also support message peeking. This mechanism can be useful if
> several receivers are retrieving messages from the same queue, but each
> receiver only wishes to handle specific messages. The receiver can examine
> the message it has peeked, and decide whether to retrieve the message
> (which removes it from the queue) or leave it on the queue for another
> receiver to handle.
>
>
>
> A few basic message queuing patterns are:
>
>    1. *One-way messaging*: The sender simply posts a message to the queue
>    in the expectation that a receiver will retrieve it and process it at some
>    point.
>    2. *Request/response messaging*: In this pattern a sender posts a
>    message to a queue and expects a response from the receiver. The sender can
>    resend if the message is not delivered. This pattern typically requires
>    some form of correlation to enable the sender to determine which response
>    message corresponds to which request sent to the receiver.
>    3. *Broadcast messaging*: In this pattern a sender posts a message to
>    a queue, and multiple receivers can read a copy of the message. This
>    pattern depends on the message queue being able to disseminate the same
>    message to multiple receivers. There is a queue to which the senders can
>    post messages that include metadata in the form of attributes. Each
>    receiver can create a subscription to the queue, specifying a filter that
>    examines the values of message attributes. Any messages posted to the
>    queue with attribute values that match the filter are automatically
>    forwarded to that subscription.
>
> A solution based on asynchronous messaging might need to address a number
> of concerns:
>
>
>
> *Message ordering, Message grouping: *Process messages either in the
> order they are posted or in a specific order based on priority. Also, there
> may be occasions when it is difficult to eliminate dependencies, and it may
> be necessary to group messages together so that they are all handled by the
> same receiver.
> *Idempotency: *Ideally the message processing logic in a receiver should
> be idempotent so that, if the work performed is repeated, this repetition
> does not change the state of the system.
> *Repeated messages: *Some message queuing systems implement duplicate
> message detection and removal based on message IDs
> *Poison messages: *A poison message is a message that cannot be handled,
> often because it is malformed or contains unexpected information.
> *Message expiration: *A message might have a limited lifetime, and if it
> is not processed within this period it might no longer be relevant and
> should be discarded.
> *Message scheduling: *A message might be temporarily embargoed and should
> not be processed until a specific date and time. The message should not be
> available to a receiver until this time.
>
>
> Thanks
>
> Amruta Kamat
> ------------------------------
>
> *From:* Shenoy, Gourav Ganesh <go...@indiana.edu>
> *Sent:* Thursday, February 2, 2017 7:57 PM
> *To:* dev@airavata.apache.org
>
>
> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hello all,
>
>
>
> Amila, Sagar, thank you for the response and raising those concerns; and
> apologies because my email resonated the topic of workload management in
> terms of how micro-services communicate. As Ajinkya rightly mentioned,
> there exists some sort of correlation between micro-services communication
> and its impact on how that micro-service performs the work under those
> circumstances. The goal is to make sure we have maximum independence
> between micro-services, and investigate the workflow pattern in which these
> micro-services will operate such that we can find the right balance between
> availability & consistency. Again, from our preliminary analysis we can
> assert that these solutions may not be generic and the specific use-case
> will have a big decisive role.
>
>
>
> For starters, we are focusing on the following example – and I think this
> will clarify the doubts on what we are exactly trying to investigate about.
>
>
>
> *Our test example *
>
> Say we have the following 4 micro-services, which each perform a specific
> task as mentioned in the box.
>
>
>
> <image001.png>
>
>
>
>
>
> *A state-full pattern to distribute work*
>
> <image002.png>
>
>
>
> Here each communication between micro-services could be via RPC or
> Messaging (eg: RabbitMQ). Obvious disadvantage is that if any micro-service
> is down, then the system availability is at stake. In this test example, we
> can see that Microservice-A coordinates the work and maintains the state
> information.
>
>
>
> *A state-less pattern to distribute work*
>
>
>
> <image003.png>
>
>
>
> Another purely asynchronous approach would be to associate message-queues
> with each micro-service, where each micro-service performs its task,
> submits a request (message on bus) to the next micro-service, and continues
> to process more requests. This ensures more availability, and perhaps we
> might need to handle corner cases for failures such as message broker down,
> or message loss, etc.
>
>
>
> As mentioned, these are just a few proposals that we are planning to
> investigate via a prototype project. Inject corner cases/failures and try
> and find ways to handle these cases. I would love to hear more
> thoughts/questions/suggestions.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Ajinkya Dhamnaskar <ad...@umail.iu.edu>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Thursday, February 2, 2017 at 2:22 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hello all,
>
>
>
> Just a heads up. Here the name Distributed workload management does not
> necessarily mean having different instances of a microservice and then
> distributing work among these instances.
>
>
>
> Apparently, the problem is how to make each microservice work
> independently with concrete distributed communication infrastructure. So,
> think of it as a workflow where each microservice does its part of work and
> communicates (how? yet to be decided) output. The next underlying
> microservice identifies and picks up that output and takes it further
> towards the final outcome. Having said that, the crux here is that none of
> the microservices need to worry about other microservices in a pipeline.
>
>
>
> Vidya Sagar,
>
> I completely second your opinion of having stateless microservices; in
> fact, that is the key. With stateless microservices it is difficult to
> guarantee consistency in a system but it solves the availability problem to
> some extent. I would be interested to understand what you mean by "an
> intelligent job scheduling algorithm, which receives real-time updates from
> the microservices with their current state information".
>
>
>
> On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <
> vkalvaku@umail.iu.edu> wrote:
>
>
>
> On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com>
> wrote:
>
> Hi Gourav,
>
>
>
> Sorry, I did not understand your question. Specifically I am having
> trouble relating "work load management" to options you suggest (RPC,
> message based etc.).
>
> So what exactly do you mean by "workload management"?
>
> What is work in this context?
>
>
>
> Also, I did not understand what you meant by "the most efficient way".
> Efficient in terms of what? Are you looking at speed?
>
>
>
> As per your suggestions, it seems you are trying to find a way to
> communicate between micro services. RPC might be troublesome if you need to
> communicate with processes separated by a firewall.
>
>
>
> Thanks
>
> -Thejaka
>
>
>
>
>
> On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
> Hello dev, arch,
>
>
>
> As part of this Spring’17 Advanced Science Gateway Architecture course, we
> are working on trying to debate and find possible solutions to the issue of
> managing distributed workloads in Apache Airavata. This leads to the
> discussion of finding the most efficient way that different Airavata
> micro-services should communicate and distribute work, in such a way that:
>
> 1.       We maintain the ability to scale these micro-services whenever
> needed (autoscale perhaps?).
>
> 2.       Achieve fault tolerance.
>
> 3.       We can deploy these micro-services independently, or better in a
> containerized manner – keeping in mind the ability to use devops for
> deployment.
>
>
>
> As of now the options we are exploring are:
>
> 1.       RPC based communication
>
> 2.       Message based – either master-worker, or work-queue, etc
>
> 3.       A combination of both these approaches
>
>
>
> I am more inclined towards exploring the message based approach, but again
> there arises the possibility of handling limitations/corner cases of
> message broker such as downtimes (may be more). In my opinion, having
> asynchronous communication will help us achieve most of the above-mentioned
> points. Another debatable issue is making the micro-services implementation
> stateless, such that we do not have to pass the state information between
> micro-services.
>
>
>
> I would love to hear any thoughts/suggestions/comments on this topic and
> open up a discussion via this mail thread. If there is anything that I have
> missed which is relevant to this issue, please let me know.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
>
>
> Hi Gourav,
>
>
>
> Correct me if I'm wrong, but I think this is a case of the job shop
> scheduling problem, as we may have 'n' jobs of varying processing times
> and memory requirements, and we have 'm' microservices with possibly
> different computing and memory capacities, and we are trying to minimize
> the makespan <https://en.wikipedia.org/wiki/Makespan>.
>
>
>
> For this use-case, I'm in favor of a highly available and consistent message
> broker with an intelligent job scheduling algorithm, which receives
> real-time updates from the microservices with their current state
> information.
>
>
>
> As for the state vs stateless implementation, I think that question
> depends on the functionality of a particular microservice. In a broad
> sense, the stateless implementation should be preferred as it will scale
> better horizontally.
>
>
>
>
>
> Regards,
>
> Vidya Sagar
>
>
>
>
> --
>
> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics
> and Computing | Indiana University Bloomington | (812) 691-5002
> <8126915002> | vkalvaku@iu.edu
>
>
>
>
>
> --
>
> Thanks and regards,
>
>
>
> Ajinkya Dhamnaskar
>
> Student ID : 0003469679
>
> Masters (CS)
>
> +1 (812) 369- 5416 <(812)%20369-5416>
>
>
>
>
>
> --
>
> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics
> and Computing | Indiana University Bloomington | (812) 691-5002
> <8126915002> | vkalvaku@iu.edu
>
>
>
>
>
>
>
> --
>
> Thank you
> Supun Nakandala
> Dept. Computer Science and Engineering
> University of Moratuwa
>
>
>
>
>
> --
>
> Thank you
> Supun Nakandala
> Dept. Computer Science and Engineering
> University of Moratuwa
>

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.
Hi All,

As I mentioned before, here is the design we have more or less reached a consensus on (please do provide comments/suggestions). This idea was motivated by our understanding of the Aurora/Mesos architecture and how it functions.

[image: design diagram (image001.png)]

This design has the following benefits:

-          Loosely coupled, independent micro-services.

-          Inherently scalable in nature.

-          Highly available, and consistent architecture.

-          Supports incremental upgrade, without the risk of breaking any existing implementation while doing so.

-          Ability to add/remove tasks in a DAG, and also add new task implementations (abstraction).

-          Custom scheduler provides us greater flexibility (see below).

We have the orchestrator (which will eventually be made HA using ZooKeeper), which will centrally maintain the state of an experiment – in short, the status of the tasks it is composed of. Based on the type of job request, it will fetch the task execution DAG. This DAG will be made available to the orchestrator in advance via a graph database (debatable), and it is nothing but a definition of the sequence of tasks needed for that experiment (not the implementation of the tasks).
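
To make the DAG idea concrete, here is a minimal Python sketch of how an orchestrator could look up a task DAG by job type and track task status. It is only an illustration under assumptions: the job types X and Y and the task names A to D are the example patterns mentioned earlier in this thread, while TASK_DAGS, execution_order and Experiment are hypothetical names (not Airavata code), and an in-memory dict stands in for the graph database.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG definitions keyed by job type. Each task maps to the set
# of tasks that must finish before it can start (definitions only, no code).
TASK_DAGS = {
    "X": {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"C"}},  # A -> B -> C -> D
    "Y": {"A": set(), "C": {"A"}, "D": {"C"}},              # A -> C -> D
}


def execution_order(job_type):
    """Return one valid task order for the given job type."""
    return list(TopologicalSorter(TASK_DAGS[job_type]).static_order())


class Experiment:
    """Central record of task status for one experiment (orchestrator side)."""

    def __init__(self, job_type):
        self.dag = TASK_DAGS[job_type]
        self.status = {task: "PENDING" for task in self.dag}

    def ready_tasks(self):
        """Pending tasks whose dependencies have all completed."""
        return sorted(
            task for task, deps in self.dag.items()
            if self.status[task] == "PENDING"
            and all(self.status[dep] == "DONE" for dep in deps)
        )

    def mark_done(self, task):
        self.status[task] = "DONE"
```

Because the tasks only see their own inputs, changing a pattern (say, dropping B for job type Y) means editing the DAG definition alone; no task implementation needs to know what runs next.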

There is a scheduler which receives a task execution request from the orchestrator and decides which worker will execute it. Each worker here is analogous to the current Airavata GFAC module, which executes the task. We can think of a worker as a collection of implementations of different tasks. E.g., W1, W2, W3 in the figure above will have code to execute tasks A, B, C, D.

There are 2 concerns which arise here:

-          How does the scheduler know/decide which worker to pass on the task execution to?

-          How do we upgrade a worker, say with a new task ‘E’ implementation, in such a manner that if something goes wrong with code for ‘E’, the entire worker node should not fail? In short, avoid regression testing the entire worker module.

To address the first problem, I suggest we use a paradigm similar to how agents (workers) report their available capacity in Aurora/Mesos. There, the agent (slave) nodes constantly report to the master how much processing power they have available, and a new job request is placed on a node accordingly. In our case, we can have the workers advertise to the scheduler which tasks they are capable of executing, and the scheduler acts accordingly.
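
The capability advertisement described above could look roughly like the sketch below. It is an assumption-laden illustration, not the prototype: the `Scheduler` class and its method names are hypothetical, and workers are represented only by string IDs.

```python
class Scheduler:
    """Tracks which tasks each worker has advertised (hypothetical sketch)."""

    def __init__(self):
        self.capabilities = {}  # worker id -> set of task names

    def advertise(self, worker_id, tasks):
        """Called by a worker at startup, or again after an upgrade."""
        self.capabilities[worker_id] = set(tasks)

    def remove_worker(self, worker_id):
        """Forget a worker that has gone away."""
        self.capabilities.pop(worker_id, None)

    def eligible_workers(self, task):
        """Return, in sorted order, the workers able to execute the task."""
        return sorted(w for w, caps in self.capabilities.items() if task in caps)


scheduler = Scheduler()
scheduler.advertise("W1", ["A", "B", "C", "D"])
scheduler.advertise("W2", ["A", "B", "C", "D"])
# W1 finishes its upgrade and re-advertises with the new task E.
scheduler.advertise("W1", ["A", "B", "C", "D", "E"])
```

With this shape, task E is routed only to W1 until W2 has been upgraded and re-advertises, which is what makes the upgrade incremental.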

To address the second concern, I suggest we bundle the task implementations in separate JARs, so that a problem with one task does not affect the others, and the faulty task can be “repaired” without impacting the other existing task implementations. There might be better ways to do this, but this is what I could think of right now.

As mentioned before, adding a new task implementation, which requires upgrading all workers, will be easy and hassle-free, because each worker will report its capability to handle the new task back to the scheduler as and when its upgrade finishes (incremental upgrade). Having a custom scheduler also provides us other benefits, such as:

-          Handling corner cases – e.g., if task execution on one worker fails (for some unforeseen reason), the scheduler can retry it on a different worker.

-          Prioritizing experiments – scheduling higher-priority experiments before normal-priority ones (I just made this one up).
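
The two scheduler benefits listed above can be sketched as follows. Everything here is hypothetical: `run_with_retry`, `PriorityTaskQueue` and the `execute(worker, task)` callable are illustrative names, and the retry assumes tasks are safe to re-run on another worker.

```python
import heapq
import itertools


def run_with_retry(task, workers, execute):
    """Try the task on each worker in turn until one succeeds.

    `execute(worker, task)` is a hypothetical callable that raises on failure.
    Returns the worker that succeeded, or None if every worker failed.
    """
    for worker in workers:
        try:
            execute(worker, task)
            return worker
        except Exception:
            continue  # fall through to the next worker
    return None


class PriorityTaskQueue:
    """Lower number = higher priority; ties keep submission order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, priority, experiment_id):
        heapq.heappush(self._heap, (priority, next(self._counter), experiment_id))

    def next_experiment(self):
        return heapq.heappop(self._heap)[2]
```

The counter in the heap entries is a tie-breaker, so experiments with equal priority are served in the order they were submitted.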

We have decided to go ahead and start building a prototype of this design starting tomorrow, unless there are any concerns/issues. Please do let me know your views on this approach, as every concern helps us better our design.

Thanks and Regards,
Gourav Shenoy


From: "Shenoy, Gourav Ganesh" <go...@indiana.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Wednesday, February 8, 2017 at 7:06 PM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hi Amruta,

Thanks for providing your inputs, and yes in fact we had started out our design discussions with a decentralized framework in mind. But then we considered the problem of making each micro-service independent of each other and more importantly not making them aware of what the DAG is. For this reason, we decided to push and maintain the DAG at a centralized & highly available place (the orchestrator), giving us more control and flexibility in adding/removing tasks from the DAG. This also provides us with the ability to scale each service when needed and also perform incremental upgrades via devops.

Do let me know if I make sense, or if there is something I am missing. I would also like to add that we have today nearly come to a consensus on a “fairly good” design – which I will be detailing in another email shortly.

Thanks and Regards,
Gourav Shenoy

From: "Kamat, Amruta Ravalnath" <ar...@indiana.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Wednesday, February 8, 2017 at 2:59 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata


Hello Gourav,



I agree with your solution, but I just came across a decentralized architecture which might serve our purpose and might provide a looser coupling.



Having a common workflow would mean a centralized orchestrator, i.e. a process which coordinates with multiple services to complete a larger workflow. The services have no knowledge of the workflow or their specific involvement in it. The orchestrator takes care of the complexities. However, the challenge with an orchestrator is that business logic will build up in a central place.
If there is a central shared instance of the orchestrator for all requests, then the orchestrator is a single point of failure. If it goes down, all processing stops.

With decentralized interactions, each service takes full responsibility for its role in the greater workflow. It will listen for events from other services, complete its work as soon as possible, retry if a failure occurs and send out events upon completion. Here, communications tend to be asynchronous and business logic stays within the related services.
Instead of having a central orchestrator that controls the logic of what steps happen when, that logic is built into each service ahead of time. The services know what to react to and how, ahead of time. Multiple services can consume the same events, do some processing, and then produce their own events back into the event stream, all at the same time. The event stream does not have any logic and is intended to be a dumb pipe.
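
A minimal sketch of this decentralized (event-driven) style is shown below. It is illustrative only: the `EventStream` class, the event names and the two services are hypothetical, and delivery is synchronous and in-process, whereas a real event stream would be asynchronous and backed by a message broker.

```python
from collections import defaultdict


class EventStream:
    """A 'dumb pipe': it only delivers events to subscribers, no workflow logic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in list(self._subscribers[event_type]):
            handler(payload)


stream = EventStream()
log = []


def staging_service(job):
    # Reacts to a job request, does its work, then announces completion.
    log.append(f"staged input for {job}")
    stream.publish("input.staged", job)


def submission_service(job):
    # Knows only which event it reacts to, not who produced it.
    log.append(f"submitted {job}")
    stream.publish("job.submitted", job)


stream.subscribe("job.requested", staging_service)
stream.subscribe("input.staged", submission_service)
stream.publish("job.requested", "job-42")
```

Note that the ordering of steps lives in the subscriptions of each service rather than in one place, which is exactly the trade-off against a central orchestrator.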


Decentralized interactions meet our requirements better: loose coupling, high cohesion, and each service responsible for its own bounded context.

Thanks
Amruta Kamat
________________________________
From: Shenoy, Gourav Ganesh <go...@indiana.edu>
Sent: Tuesday, February 7, 2017 11:49 PM
To: dev@airavata.apache.org
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Supun,

Thank you for this excellent explanation. I see that the architecture you mentioned covers most of the concerns we discussed in this thread and in class. I just had one clarifying question though – what does “worker” signify here? Is it a generic task execution framework which runs the DAG? Or is it like a platform where the DAG runs (and how?).

Apart from that, I am looking at Storm’s architecture to see if we can get some clues as they are tackling a similar problem. I shall update once I get some concrete answer.

Thanks and Regards,
Gourav Shenoy

From: Supun Nakandala <su...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Tuesday, February 7, 2017 at 5:47 PM
To: dev <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hi Gourav,

I agree with your idea of using one “workflow micro-service” which would basically be the mediator/orchestrator for deciding which micro-service should be executed next. But I think these components do not necessarily have to be micro-services; rather, they conform to the master-worker paradigm in some sense. The trick here is how we can implement a scalable, fault-tolerant system to do distributed workload management, and which CAP theorem property we are going to compromise.

I think you are heading in the right direction. But I would like to add more details to your solution. Please note that I haven't evaluated these ideas 100%. Perhaps we can talk more about this in the next class.

As you have done, I think we should centralize the state information into one component (orchestrator in our case). From my experience, it is very hard to achieve consistency in a distributed state setting in the event of failures.

Second, to maintain generalizability in Airavata I think we should treat each application/use case as a DAG of execution. For example, an HPC job and a cloud job will have two different DAGs which consist of tasks (data staging, job submission, output staging, etc.). These tasks should be short and should have roughly the same execution time. And having idempotent tasks is preferable.

The orchestrator is responsible for executing the DAG and assigning tasks to the workers (how? will follow) based on the control dependencies in the DAG tasks. In addition to the dependencies generated from the tasks, I see there can be other dependencies on things like monitoring and scheduling, which the orchestrator has to take into account when executing the DAG.

The next question is how we distribute jobs from the orchestrator to workers. I think here it is ok to compromise availability in favor of consistency. I suggest that we use the request/response messaging pattern with a persistent message broker (critical service). In this architecture, we can safely allow the orchestrator or workers to fail without losing consistency (because of the persistent queue). But if the orchestrator fails then the availability will go down. One way to overcome this would be to come up with an orchestrator quorum.

The attached figure summarizes my idea.

I think we can also evaluate this solution against the concerns that Shameera pointed out, such as whether we can enable cancel. Once again, it's just my idea and is open for argument and debate.



[Inline image 2]

Thanks
-Supun



On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hi Supun,

I agree, but maybe for the example I mentioned, multiple micro-services might not sound necessary. I was trying to generalize towards a scenario where we have multiple independent micro-services (not necessarily for task execution). Again, I am not certain if this is the right architecture, but your (and others’) inputs will definitely help us narrow down the different scenarios we need to focus on. Do let me know if I make sense.

Thanks and Regards,
Gourav Shenoy

From: Supun Nakandala <su...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, February 6, 2017 at 12:15 PM
To: dev <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hi Gourav,

It is my belief that we don't need a separate microservice for each task. I favor a single microservice which can execute all tasks (or in other words a generic task execution microservice). Of course, we can have many of them when we want to scale. WDYT?

On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hi dev,

We were brainstorming some potential designs that might help us with this problem. One possible option would be to have a “workflow micro-service” which would basically be the mediator/orchestrator for deciding which micro-service should be executed next – based on the type of the job. The motive is to make micro-services independent of the workflow; i.e. a micro-service implementation should not be aware of which micro-service will be executed next, and we should have central control over deciding this pattern.
Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for job type Y, the pattern could be A -> C -> D; and so on.

An initial design with this idea looks as follows:
[cid:image003.png@01D28256.A72BF580]


We would have a common messaging framework (implementation has not been decided yet). The database associated with the workflow micro-service could be a graph database (maybe?) – again the implementation/technology has not been decided yet.

This is just a proposed design, and I would love to hear your thoughts on this and any suggestions/comments if any. If there is anything that we are missing or should consider, please do let us know.

Thanks and Regards,
Gourav Shenoy

From: "Christie, Marcus Aaron" <ma...@iu.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Friday, February 3, 2017 at 9:21 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Vidya,

I’m not sure how relevant it is, but it occurs to me that a microservice that executes jobs on a cloud requires very little in terms of resources to submit and monitor that job on the cloud. It doesn’t really matter if the job is a “big” or a “small” job.  So I’m not sure what heuristic makes sense regarding distributing work to these job execution microservices.  Maybe a simple round robin approach would be sufficient.

I think a job scheduling algorithm does make sense, however, for a higher level component, some sort of metascheduler that understands what resources are available on the cloud resources on which the jobs will be running.  The metascheduler could create work for the job execution microservices to run on particular cloud resources in a way that optimizes for some metric (e.g., throughput).

Thanks,

Marcus

On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

Ajinkya,

My scenario is for workload distribution among multiple instances of the same microservice.

If a message broker needs to distribute the available jobs among multiple workers, the common approach would be to use round robin or a similar algorithm. This approach works best when all the workers are similar and the jobs are equal.

So I think that a genetic or heuristic job scheduling algorithm that is also aware of each worker's current state (CPU, RAM, number of jobs being processed) can distribute the jobs more efficiently. The workers can periodically ping the message broker with their current state info.

The other advantage of using a customized algorithm is that it can be tweaked to use embedded routing, priority, or other information in the job metadata to resolve all of the concerns raised by Amruta, viz. message grouping, ordering, repeated messages, etc.

We can even ensure data privacy, e.g. if the workers are spread across multiple compute clusters, say AWS and IU Big Red, and we want to restrict certain sensitive jobs to run only on Big Red.
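
As a rough illustration of state-aware distribution with a cluster restriction, here is a small sketch. Note that it uses a plain greedy "least loaded" heuristic rather than a genetic algorithm, and that `WorkerState`, `pick_worker` and the reported fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    worker_id: str
    cluster: str        # e.g. "AWS" or "BigRed"
    running_jobs: int   # last value reported by the worker
    cpu_load: float     # 0.0 (idle) to 1.0 (saturated), as reported


def pick_worker(workers, allowed_clusters=None):
    """Pick the least-loaded worker, optionally restricted to given clusters."""
    candidates = [
        w for w in workers
        if allowed_clusters is None or w.cluster in allowed_clusters
    ]
    if not candidates:
        return None
    # Compare by CPU load first, then by job count; the ID makes ties stable.
    return min(candidates, key=lambda w: (w.cpu_load, w.running_jobs, w.worker_id))
```

A sensitive job would be dispatched with `allowed_clusters={"BigRed"}`, while ordinary jobs pass no restriction and simply go to the least-loaded worker anywhere.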

Some distributed job scheduling algorithms for cloud computing.

  *   http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
  *   https://arxiv.org/pdf/1404.5528.pdf


Regards
Vidya Sagar

On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <ar...@indiana.edu> wrote:
Hello all,

Adding more information to the message based approach. Messaging is a key strategy employed in many distributed environments. Message queuing is ideally suited to performing asynchronous operations. A sender can post a message to a queue, but it does not have to wait while the message is retrieved and processed. A sender and receiver do not even have to be running concurrently.

With message queuing there can be 2 possible scenarios:

  1.  Sending and receiving messages using a single message queue.
  2.  Sharing a message queue between many senders and receivers.

When a message is retrieved, it is removed from the queue. A message queue may also support message peeking. This mechanism can be useful if several receivers are retrieving messages from the same queue, but each receiver only wishes to handle specific messages. The receiver can examine the message it has peeked, and decide whether to retrieve the message (which removes it from the queue) or leave it on the queue for another receiver to handle.

A few basic message queuing patterns are:

  1.  One-way messaging: The sender simply posts a message to the queue in the expectation that a receiver will retrieve it and process it at some point.
  2.  Request/response messaging: In this pattern a sender posts a message to a queue and expects a response from the receiver. The sender can resend if the message is not delivered. This pattern typically requires some form of correlation to enable the sender to determine which response message corresponds to which request sent to the receiver.
  3.  Broadcast messaging: In this pattern a sender posts a message to a queue, and multiple receivers can read a copy of the message. This pattern depends on the message queue being able to disseminate the same message to multiple receivers. There is a queue to which the senders can post messages that include metadata in the form of attributes. Each receiver can create a subscription to the queue, specifying a filter that examines the values of message attributes. Any messages posted to the queue with attribute values that match the filter are automatically forwarded to that subscription.
A solution based on asynchronous messaging might need to address a number of concerns:

Message ordering, Message grouping: Process messages either in the order they are posted or in a specific order based on priority. Also, there may be occasions when it is difficult to eliminate dependencies, and it may be necessary to group messages together so that they are all handled by the same receiver.
Idempotency: Ideally the message processing logic in a receiver should be idempotent so that, if the work performed is repeated, this repetition does not change the state of the system.
Repeated messages: Some message queuing systems implement duplicate message detection and removal based on message IDs
Poison messages: A poison message is a message that cannot be handled, often because it is malformed or contains unexpected information.
Message expiration: A message might have a limited lifetime, and if it is not processed within this period it might no longer be relevant and should be discarded.
Message scheduling: A message might be temporarily embargoed and should not be processed until a specific date and time. The message should not be available to a receiver until this time.
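
Several of the concerns above (repeated messages, poison messages, message expiration) can be handled on the receiving side. The sketch below shows one possible shape, under assumptions: the `Receiver` class, the message dictionary layout (`id`, `body`, optional `expires_at`) and the returned status strings are all hypothetical, and state is kept in memory rather than in a durable store.

```python
import time


class Receiver:
    """Illustrative receiver handling duplicates, expiry and poison messages."""

    def __init__(self, handler, max_attempts=3):
        self.handler = handler
        self.max_attempts = max_attempts
        self.seen_ids = set()      # IDs already processed successfully
        self.attempts = {}         # message ID -> failed attempts so far
        self.dead_letters = []     # poison messages set aside for inspection

    def receive(self, message, now=None):
        now = time.time() if now is None else now
        msg_id = message["id"]
        if msg_id in self.seen_ids:
            return "duplicate"     # already handled; skip the repeated delivery
        if message.get("expires_at") is not None and now >= message["expires_at"]:
            return "expired"       # no longer relevant; discard
        try:
            self.handler(message["body"])
        except Exception:
            self.attempts[msg_id] = self.attempts.get(msg_id, 0) + 1
            if self.attempts[msg_id] >= self.max_attempts:
                self.dead_letters.append(message)
                return "dead-lettered"
            return "retry"
        self.seen_ids.add(msg_id)
        return "processed"
```

Tracking processed IDs makes redelivery harmless even when the handler itself is not idempotent, at the cost of having to store those IDs somewhere durable in a real deployment.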


Thanks
Amruta Kamat
________________________________
From: Shenoy, Gourav Ganesh <go...@indiana.edu>
Sent: Thursday, February 2, 2017 7:57 PM
To: dev@airavata.apache.org
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Amila, Sagar, thank you for the response and for raising those concerns; and apologies, because my email framed the topic of workload management in terms of how micro-services communicate. As Ajinkya rightly mentioned, there exists some sort of correlation between how micro-services communicate and its impact on how a micro-service performs its work under those circumstances. The goal is to make sure we have maximum independence between micro-services, and to investigate the workflow pattern in which these micro-services will operate, such that we can find the right balance between availability & consistency. Again, from our preliminary analysis we can assert that these solutions may not be generic, and the specific use-case will play a decisive role.

For starters, we are focusing on the following example – and I think this will clarify the doubts on what exactly we are trying to investigate.

Our test example
Say we have the following 4 micro-services, which each perform a specific task as mentioned in the box.

<image001.png>


A stateful pattern to distribute work
<image002.png>

Here each communication between micro-services could be via RPC or messaging (e.g., RabbitMQ). The obvious disadvantage is that if any micro-service is down, then the system availability is at stake. In this test example, we can see that Microservice-A coordinates the work and maintains the state information.

A stateless pattern to distribute work

<image003.png>

Another purely asynchronous approach would be to associate message queues with each micro-service, where each micro-service performs its task, submits a request (a message on the bus) to the next micro-service, and continues to process more requests. This ensures more availability, though we might need to handle corner cases for failures such as the message broker being down, message loss, etc.

As mentioned, these are just a few proposals that we are planning to investigate via a prototype project. We will inject corner cases/failures and try to find ways to handle them. I would love to hear more thoughts/questions/suggestions.

Thanks and Regards,
Gourav Shenoy

From: Ajinkya Dhamnaskar <ad...@umail.iu.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, February 2, 2017 at 2:22 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Just a heads up. Here the name Distributed workload management does not necessarily mean having different instances of a microservice and then distributing work among these instances.

Apparently, the problem is how to make each microservice work independently on top of a concrete distributed communication infrastructure. So, think of it as a workflow where each microservice does its part of the work and communicates its output (how is yet to be decided). The next microservice in line identifies and picks up that output and takes it further towards the final outcome. Having said that, the crux here is that none of the microservices needs to worry about the other microservices in the pipeline.

Vidya Sagar,
I completely second your opinion of having stateless microservices; in fact, that is the key. With stateless microservices it is difficult to guarantee consistency in a system, but it solves the availability problem to some extent. I would be interested to understand what you mean by "an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information".

On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com> wrote:
Hi Gourav,

Sorry, I did not understand your question. Specifically, I am having trouble relating "workload management" to the options you suggest (RPC, message based, etc.).
So what exactly do you mean by "workload management"?
What is work in this context?

Also, I did not understand what you meant by "the most efficient way". Efficient in terms of what? Are you looking at speed?

As per your suggestions, it seems you are trying to find a way to communicate between microservices. RPC might be troublesome if you need to communicate with processes separated by a firewall.

Thanks
-Thejaka


On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hello dev, arch,

As part of this Spring’17 Advanced Science Gateway Architecture course, we are working on trying to debate and find possible solutions to the issue of managing distributed workloads in Apache Airavata. This leads to the discussion of finding the most efficient way that different Airavata micro-services should communicate and distribute work, in such a way that:

1.       We maintain the ability to scale these micro-services whenever needed (autoscale perhaps?).

2.       Achieve fault tolerance.

3.       We can deploy these micro-services independently, or better in a containerized manner – keeping in mind the ability to use devops for deployment.

As of now the options we are exploring are:

1.       RPC based communication

2.       Message based – either master-worker, or work-queue, etc

3.       A combination of both these approaches

I am more inclined towards exploring the message based approach, but again there arises the possibility of handling limitations/corner cases of message broker such as downtimes (may be more). In my opinion, having asynchronous communication will help us achieve most of the above-mentioned points. Another debatable issue is making the micro-services implementation stateless, such that we do not have to pass the state information between micro-services.

I would love to hear any thoughts/suggestions/comments on this topic and open up a discussion via this mail thread. If there is anything that I have missed which is relevant to this issue, please let me know.

Thanks and Regards,
Gourav Shenoy


Hi Gourav,

Correct me if I'm wrong, but I think this is a case of the job shop scheduling problem, as we may have 'n' jobs of varying processing times and memory requirements, and we have 'm' microservices with possibly different computing and memory capacities, and we are trying to minimize the makespan<https://en.wikipedia.org/wiki/Makespan>.
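
To illustrate what minimizing the makespan means, here is a small sketch using the longest-processing-time-first (LPT) greedy heuristic. This is a deliberate simplification and an assumption on my part: it treats all workers as identical and each job as a single operation, which is easier than the general job shop problem, and the function name `lpt_schedule` is hypothetical.

```python
import heapq


def lpt_schedule(job_times, num_workers):
    """Assign jobs to identical workers, longest job first.

    `job_times` maps a job name to its estimated processing time.
    Returns (assignment, makespan), where assignment maps a worker
    index to the list of jobs it receives.
    """
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker index)
    assignment = {w: [] for w in range(num_workers)}
    for job, duration in sorted(job_times.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)       # least-loaded worker so far
        assignment[worker].append(job)
        heapq.heappush(heap, (load + duration, worker))
    makespan = max(load for load, _ in heap)
    return assignment, makespan
```

LPT is only a heuristic, so it does not always find the optimal makespan, but it is simple and gives a baseline against which a smarter scheduling algorithm could be compared.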

For this use-case, I'm in favor of a highly available and consistent message broker with an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information.

As for the stateful vs. stateless implementation, I think that question depends on the functionality of a particular microservice. In a broad sense, the stateless implementation should be preferred as it will scale better horizontally.


Regards,
Vidya Sagar


--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002 | vkalvaku@iu.edu



--
Thanks and regards,

Ajinkya Dhamnaskar
Student ID : 0003469679
Masters (CS)
+1 (812) 369-5416







--
Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa




Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.
Hi Amruta,

Thanks for providing your inputs, and yes in fact we had started out our design discussions with a decentralized framework in mind. But then we considered the problem of making each micro-service independent of each other and more importantly not making them aware of what the DAG is. For this reason, we decided to push and maintain the DAG at a centralized & highly available place (the orchestrator), giving us more control and flexibility in adding/removing tasks from the DAG. This also provides us with the ability to scale each service when needed and also perform incremental upgrades via devops.

Do let me know if I make sense, or if there is something I am missing. I would also like to add that we have today nearly come to a consensus on a “fairly good” design – which I will be detailing in another email shortly.

Thanks and Regards,
Gourav Shenoy

From: "Kamat, Amruta Ravalnath" <ar...@indiana.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Wednesday, February 8, 2017 at 2:59 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata


Hello Gourav,



I agree with your solution, but I just came across a decentralized architecture which might serve our purpose and might provide a looser coupling.



Having a common workflow would mean a centralized orchestrator, i.e. a process which coordinates with multiple services to complete a larger workflow. The services have no knowledge of the workflow or their specific involvement in it. The orchestrator takes care of the complexities. However, the challenge with an orchestrator is that business logic will build up in a central place.
If there is a central shared instance of the orchestrator for all requests, then the orchestrator is a single point of failure. If it goes down, all processing stops.

With decentralized interactions, each service takes full responsibility for its role in the greater workflow. It will listen for events from other services, complete its work as soon as possible, retry if a failure occurs and send out events upon completion. Here, communications tend to be asynchronous and business logic stays within the related services.
Instead of having a central orchestrator that controls the logic of what steps happen when, that logic is built into each service ahead of time. The services know what to react to and how, ahead of time. Multiple services can consume the same events, do some processing, and then produce their own events back into the event stream, all at the same time. The event stream does not have any logic and is intended to be a dumb pipe.


Decentralized interactions meet our requirements better: loose coupling, high cohesion, and each service responsible for its own bounded context.

Thanks
Amruta Kamat
________________________________
From: Shenoy, Gourav Ganesh <go...@indiana.edu>
Sent: Tuesday, February 7, 2017 11:49 PM
To: dev@airavata.apache.org
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Supun,

Thank you for this excellent explanation. I see that the architecture you mentioned covers most of the concerns we discussed in this thread and in class. I just had one clarifying question though – what does “worker” signify here? Is it a generic task execution framework which runs the DAG? Or is it like a platform where the DAG runs (and how?).

Apart from that, I am looking at Storm’s architecture to see if we can get some clues as they are tackling a similar problem. I shall update once I get some concrete answer.

Thanks and Regards,
Gourav Shenoy

From: Supun Nakandala <su...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Tuesday, February 7, 2017 at 5:47 PM
To: dev <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hi Gourav,

I agree with your idea of using one “workflow micro-service” which would basically be the mediator/orchestrator for deciding which micro-service should be executed next. But I think these components do not necessarily have to be micro-services; rather, they conform to the master-worker paradigm in some sense. The trick here is how we can implement a scalable, fault-tolerant system to do distributed workload management, and which CAP theorem property we are going to compromise.

I think you are heading in the right direction. But I would like to add more details to your solution. Please note that I haven't evaluated these ideas 100%. Perhaps we can talk more about this in the next class.

As you have done, I think we should centralize the state information into one component (orchestrator in our case). From my experience, it is very hard to achieve consistency in a distributed state setting in the event of failures.

Second, to maintain generalizability in Airavata I think we should treat each application/use case as a DAG of execution. For example, an HPC job and a cloud job will have two different DAGs which consist of tasks (data staging, job submission, output staging, etc.). These tasks should be short and should have roughly the same execution time. And having idempotent tasks is preferable.

The orchestrator is responsible for executing the DAG and assigning tasks to the workers (how? will follow) based on the control dependencies in the DAG tasks. In addition to the dependencies generated from the tasks, I see there can be other dependencies on things like monitoring and scheduling, which the orchestrator has to take into account when executing the DAG.

The next question is how we distribute jobs from the orchestrator to workers. I think here it is ok to compromise availability in favor of consistency. I suggest that we use the request/response messaging pattern with a persistent message broker (critical service). In this architecture, we can safely allow the orchestrator or workers to fail without losing consistency (because of the persistent queue). But if the orchestrator fails then the availability will go down. One way to overcome this would be to come up with an orchestrator quorum.

The attached figure summarizes my idea.

I think we can also evaluate this solution against the concerns that Shameera pointed out, such as whether we can enable cancel. Once again, it's just my idea and is open for argument and debate.



[Inline image 2]

Thanks
-Supun



On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hi Supun,

I agree; for the example I mentioned, multiple micro-services might not be necessary. I was trying to generalize towards a scenario where we have multiple independent micro-services (not necessarily for task execution). Again, I am not certain if this is the right architecture, but your (and others’) inputs will definitely help us narrow down the scenarios we need to focus on. Do let me know if I make sense.

Thanks and Regards,
Gourav Shenoy

From: Supun Nakandala <su...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, February 6, 2017 at 12:15 PM
To: dev <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hi Gourav,

It is my belief that we don't need a separate micro-service for each task. I favor a single micro-service which can execute all tasks (in other words, a generic task execution micro-service). Of course, we can run many instances of it when we want to scale. WDYT?

On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hi dev,

We were brainstorming some potential designs that might help us with this problem. One possible option would be to have a “workflow micro-service” which would basically be the mediator/orchestrator for deciding which micro-service should be executed next, based on the type of the job. The motive is to make micro-services independent of the workflow; i.e. a micro-service implementation should not be aware of which micro-service will be executed next, and we should have central control over this pattern.
E.g.: For job type X, the pattern could be A -> B -> C -> D, whereas for job type Y, the pattern could be A -> C -> D; and so on.
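
A minimal sketch of what such central control might look like, assuming the patterns are stored as a simple lookup table. The `WORKFLOWS` table and `next_service` helper are hypothetical; the service names A to D are the placeholders from the example above.

```python
# Hypothetical routing table: job type -> ordered list of micro-services.
WORKFLOWS = {
    "X": ["A", "B", "C", "D"],
    "Y": ["A", "C", "D"],
}

def next_service(job_type, current=None):
    """Return the service that should run after `current`, or None when done."""
    steps = WORKFLOWS[job_type]
    if current is None:
        return steps[0]  # job has not started yet
    idx = steps.index(current)
    return steps[idx + 1] if idx + 1 < len(steps) else None
```

Because only the workflow micro-service consults this table, services A to D never need to know what runs after them.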

An initial design with this idea looks as follows:
[cid:image002.png@01D2823E.69F3BC10]


We would have a common messaging framework (implementation has not been decided yet). The database associated with the workflow micro-service could be a graph database (maybe?) – again the implementation/technology has not been decided yet.

This is just a proposed design, and I would love to hear your thoughts on this and any suggestions/comments if any. If there is anything that we are missing or should consider, please do let us know.

Thanks and Regards,
Gourav Shenoy

From: "Christie, Marcus Aaron" <ma...@iu.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Friday, February 3, 2017 at 9:21 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Vidya,

I’m not sure how relevant it is, but it occurs to me that a microservice that executes jobs on a cloud requires very little in terms of resources to submit and monitor that job on the cloud. It doesn’t really matter if the job is a “big” or a “small” job.  So I’m not sure what heuristic makes sense regarding distributing work to these job execution microservices.  Maybe a simple round robin approach would be sufficient.
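
For reference, the round robin approach mentioned above can be sketched in a few lines. The `round_robin` helper and the job and worker names are hypothetical.

```python
from itertools import cycle

def round_robin(jobs, workers):
    """Assign each job to the next worker in turn, ignoring job size."""
    turn = cycle(workers)
    return [(job, next(turn)) for job in jobs]
```

With three jobs and two workers, the third job wraps around to the first worker again.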

I think a job scheduling algorithm does make sense, however, for a higher-level component: some sort of metascheduler that understands what capacity is available on the cloud resources on which the jobs will be running. The metascheduler could create work for the job execution microservices to run on particular cloud resources in a way that optimizes for some metric (e.g., throughput).

Thanks,

Marcus

On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

Ajinkya,

My scenario is for workload distribution among multiple instances of the same microservice.

If a message broker needs to distribute the available jobs among multiple workers, the common approach would be to use round robin or a similar algorithm. This approach works best when all the workers are similar and the jobs are equal.

So I think that a genetic or heuristic job scheduling algorithm, which is also aware of each worker's current state (CPU, RAM, number of jobs being processed), can distribute the jobs more efficiently. The workers can periodically ping the message broker with their current state info.

The other advantage of using a customized algorithm is that it can be tweaked to use embedded routing, priority or other information in the job metadata to resolve the concerns raised by Amruta, viz. message grouping, ordering, repeated messages, etc.

We can even ensure data privacy, e.g. when the workers are spread across multiple compute clusters (say AWS and IU Big Red) and we want to restrict certain sensitive jobs to run only on Big Red.
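
A simple sketch of state-aware selection with a cluster restriction is shown below. It uses a plain least-loaded rule rather than a genetic algorithm, and `WorkerState` and `pick_worker` are hypothetical names; the fields mirror the state info listed above.

```python
from dataclasses import dataclass

@dataclass
class WorkerState:
    name: str
    cluster: str      # e.g. "AWS" or "BigRed"
    cpu_load: float   # 0.0 (idle) to 1.0 (saturated)
    active_jobs: int

def pick_worker(workers, allowed_clusters=None):
    """Pick the least-loaded worker, optionally restricted to certain clusters."""
    candidates = [w for w in workers
                  if allowed_clusters is None or w.cluster in allowed_clusters]
    if not candidates:
        raise LookupError("no eligible worker")
    # Fewest active jobs first; CPU load breaks ties.
    return min(candidates, key=lambda w: (w.active_jobs, w.cpu_load))
```

A sensitive job would be scheduled with `allowed_clusters={"BigRed"}`, so it cannot land on an AWS worker even if that worker is idle.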

Some distributed job scheduling algorithms for cloud computing.

  *   http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
  *   https://arxiv.org/pdf/1404.5528.pdf


Regards
Vidya Sagar

On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <ar...@indiana.edu> wrote:
Hello all,

Adding more information to the message based approach. Messaging is a key strategy employed in many distributed environments. Message queuing is ideally suited to performing asynchronous operations. A sender can post a message to a queue, but it does not have to wait while the message is retrieved and processed. A sender and receiver do not even have to be running concurrently.

With message queuing there can be 2 possible scenarios:

  1.  Sending and receiving messages using a single message queue.
  2.  Sharing a message queue between many senders and receivers.

When a message is retrieved, it is removed from the queue. A message queue may also support message peeking. This mechanism can be useful if several receivers are retrieving messages from the same queue, but each receiver only wishes to handle specific messages. The receiver can examine the message it has peeked, and decide whether to retrieve the message (which removes it from the queue) or leave it on the queue for another receiver to handle.

A few basic message queuing patterns are:

  1.  One-way messaging: The sender simply posts a message to the queue in the expectation that a receiver will retrieve it and process it at some point.
  2.  Request/response messaging: In this pattern a sender posts a message to a queue and expects a response from the receiver. The sender can resend if the message is not delivered. This pattern typically requires some form of correlation to enable the sender to determine which response message corresponds to which request sent to the receiver.
  3.  Broadcast messaging: In this pattern a sender posts a message to a queue, and multiple receivers can read a copy of the message. This pattern depends on the message queue being able to disseminate the same message to multiple receivers. There is a queue to which the senders can post messages that include metadata in the form of attributes. Each receiver can create a subscription to the queue, specifying a filter that examines the values of message attributes. Any messages posted to the queue with attribute values that match the filter are automatically forwarded to that subscription.
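
To illustrate the correlation requirement of the request/response pattern, here is a small sketch using Python's standard `queue` module in place of a real broker. The function names and the upper-casing "work" are hypothetical placeholders.

```python
import queue
import uuid

requests, responses = queue.Queue(), queue.Queue()

def send_request(payload):
    """Post a request and return the correlation ID the sender must remember."""
    correlation_id = str(uuid.uuid4())
    requests.put({"correlation_id": correlation_id, "payload": payload})
    return correlation_id

def serve_one():
    """Receiver: handle one request and echo its correlation ID in the response."""
    msg = requests.get_nowait()
    responses.put({"correlation_id": msg["correlation_id"],
                   "result": msg["payload"].upper()})

def collect_responses():
    """Sender: drain the response queue into a dict keyed by correlation ID."""
    results = {}
    while not responses.empty():
        msg = responses.get_nowait()
        results[msg["correlation_id"]] = msg["result"]
    return results
```

The sender never relies on response order; it matches each response to its request purely by the ID it generated.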
A solution based on asynchronous messaging might need to address a number of concerns:

  *   Message ordering and grouping: Process messages either in the order they are posted or in a specific order based on priority. Also, there may be occasions when it is difficult to eliminate dependencies, and it may be necessary to group messages together so that they are all handled by the same receiver.
  *   Idempotency: Ideally the message processing logic in a receiver should be idempotent so that, if the work performed is repeated, this repetition does not change the state of the system.
  *   Repeated messages: Some message queuing systems implement duplicate message detection and removal based on message IDs.
  *   Poison messages: A poison message is a message that cannot be handled, often because it is malformed or contains unexpected information.
  *   Message expiration: A message might have a limited lifetime, and if it is not processed within this period it might no longer be relevant and should be discarded.
  *   Message scheduling: A message might be temporarily embargoed and should not be processed until a specific date and time. The message should not be available to a receiver until this time.
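
Several of these concerns can be handled on the receiver side. The sketch below is one hypothetical way to do it; the message fields `id`, `body`, `expires_at` and `not_before` and the `make_receiver` helper are assumptions for illustration, and many brokers offer some of these features natively.

```python
import time

def make_receiver(handler, max_attempts=3):
    seen_ids = set()     # IDs already handled, for duplicate detection
    attempts = {}        # message id -> failed attempts so far
    dead_letters = []    # poison messages parked for inspection

    def receive(msg, now=None):
        now = time.time() if now is None else now
        if msg["id"] in seen_ids:
            return "duplicate"
        if msg.get("expires_at") is not None and now > msg["expires_at"]:
            return "expired"
        if msg.get("not_before") is not None and now < msg["not_before"]:
            return "deferred"
        try:
            handler(msg["body"])
        except Exception:
            attempts[msg["id"]] = attempts.get(msg["id"], 0) + 1
            if attempts[msg["id"]] >= max_attempts:
                # Give up: treat as a poison message and stop retrying.
                dead_letters.append(msg)
                seen_ids.add(msg["id"])
                return "dead-lettered"
            return "retry"
        seen_ids.add(msg["id"])
        return "processed"

    return receive, dead_letters
```

Ordering and grouping are not covered here, since they usually depend on how the broker partitions messages across receivers.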


Thanks
Amruta Kamat
________________________________
From: Shenoy, Gourav Ganesh <go...@indiana.edu>
Sent: Thursday, February 2, 2017 7:57 PM
To: dev@airavata.apache.org
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Amila, Sagar, thank you for the responses and for raising those concerns; apologies that my email framed workload management mainly in terms of how micro-services communicate. As Ajinkya rightly mentioned, there is a correlation between how micro-services communicate and how a micro-service performs its work under those circumstances. The goal is to make sure we have maximum independence between micro-services, and to investigate the workflow pattern in which these micro-services will operate, such that we can find the right balance between availability and consistency. Again, from our preliminary analysis, these solutions may not be generic, and the specific use case will play a decisive role.

For starters, we are focusing on the following example, and I think this will clarify what exactly we are trying to investigate.

Our test example
Say we have the following 4 micro-services, which each perform a specific task as mentioned in the box.

<image001.png>


A stateful pattern to distribute work
<image002.png>

Here each communication between micro-services could be via RPC or messaging (e.g. RabbitMQ). The obvious disadvantage is that if any micro-service is down, then the system availability is at stake. In this test example, we can see that Microservice-A coordinates the work and maintains the state information.

A stateless pattern to distribute work

<image003.png>

Another, purely asynchronous, approach would be to associate a message queue with each micro-service, where each micro-service performs its task, submits a request (a message on the bus) to the next micro-service, and continues to process more requests. This ensures more availability, although we might need to handle corner cases for failures such as the message broker being down, message loss, etc.
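
A toy version of this queue-per-service pipeline might look like the sketch below. The service names, the `NEXT` table and the `step` helper are hypothetical, and in-memory deques stand in for broker queues.

```python
from collections import deque

# Hypothetical pipeline: each service has its own inbox and knows only the next queue name.
queues = {"A": deque(), "B": deque(), "C": deque(), "done": deque()}
NEXT = {"A": "B", "B": "C", "C": "done"}

def step(service):
    """Take one message from the service's queue, do its work, forward it."""
    if not queues[service]:
        return False
    msg = queues[service].popleft()
    # The service keeps no state of its own; everything travels in the message.
    msg = {**msg, "trace": msg["trace"] + [service]}
    queues[NEXT[service]].append(msg)
    return True
```

Since a service holds nothing between calls to `step`, any instance of that service could pick up the next message, which is what makes the pattern easy to scale out.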

As mentioned, these are just a few proposals that we are planning to investigate via a prototype project. We will inject corner cases/failures and try to find ways to handle them. I would love to hear more thoughts/questions/suggestions.

Thanks and Regards,
Gourav Shenoy

From: Ajinkya Dhamnaskar <ad...@umail.iu.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, February 2, 2017 at 2:22 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Just a heads up. Here the name "distributed workload management" does not necessarily mean having different instances of a microservice and then distributing work among these instances.

Apparently, the problem is how to make each microservice work independently on top of a concrete distributed communication infrastructure. So, think of it as a workflow where each microservice does its part of the work and communicates its output (how? yet to be decided). The next microservice in line identifies and picks up that output and takes it further towards the final outcome. Having said that, the crux here is that none of the microservices needs to worry about the other microservices in the pipeline.

Vidya Sagar,
I completely second your opinion of having stateless microservices; in fact, that is the key. With stateless microservices it is difficult to guarantee consistency in a system, but it solves the availability problem to some extent. I would be interested to understand what you mean by "an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information".

On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com> wrote:
Hi Gourav,

Sorry, I did not understand your question. Specifically, I am having trouble relating "workload management" to the options you suggest (RPC, message based, etc.).
So what exactly do you mean by "workload management"?
What is work in this context?

Also, I did not understand what you meant by "the most efficient way". Efficient in terms of what? Are you looking at speed?

As per your suggestions, it seems you are trying to find a way to communicate between micro-services. RPC might be troublesome if you need to communicate with processes separated by a firewall.

Thanks
-Thejaka


On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hello dev, arch,

As part of this Spring’17 Advanced Science Gateway Architecture course, we are working on trying to debate and find possible solutions to the issue of managing distributed workloads in Apache Airavata. This leads to the discussion of finding the most efficient way that different Airavata micro-services should communicate and distribute work, in such a way that:

1.       We maintain the ability to scale these micro-services whenever needed (autoscale perhaps?).

2.       Achieve fault tolerance.

3.       We can deploy these micro-services independently, or better in a containerized manner – keeping in mind the ability to use devops for deployment.

As of now the options we are exploring are:

1.       RPC based communication

2.       Message based – either master-worker, or work-queue, etc

3.       A combination of both these approaches

I am more inclined towards exploring the message-based approach, but then we would have to handle the limitations/corner cases of a message broker, such as downtime (and maybe more). In my opinion, having asynchronous communication will help us achieve most of the above-mentioned points. Another debatable issue is making the micro-service implementations stateless, such that we do not have to pass state information between micro-services.

I would love to hear any thoughts/suggestions/comments on this topic and open up a discussion via this mail thread. If there is anything that I have missed which is relevant to this issue, please let me know.

Thanks and Regards,
Gourav Shenoy


Hi Gourav,

Correct me if I'm wrong, but I think this is a case of the job shop scheduling problem, as we may have 'n' jobs of varying processing times and memory requirements, and we have 'm' microservices with possibly different computing and memory capacities, and we are trying to minimize the makespan (https://en.wikipedia.org/wiki/Makespan).

For this use case, I'm in favor of a highly available and consistent message broker with an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information.
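
To give a feel for makespan minimization, here is a sketch of the longest-processing-time-first (LPT) greedy heuristic. Note that this is a deliberate simplification: it assumes identical workers and known job durations, which is a much easier setting than full job shop scheduling, and the `lpt_schedule` helper and its inputs are hypothetical.

```python
import heapq

def lpt_schedule(job_times, n_workers):
    """Greedy LPT: give the longest remaining job to the least-loaded worker.

    Returns (makespan, assignment) where assignment maps worker index -> job names.
    """
    loads = [(0, w) for w in range(n_workers)]  # min-heap of (current load, worker)
    heapq.heapify(loads)
    assignment = {w: [] for w in range(n_workers)}
    for job, t in sorted(job_times.items(), key=lambda kv: -kv[1]):
        load, w = heapq.heappop(loads)
        assignment[w].append(job)
        heapq.heappush(loads, (load + t, w))
    return max(load for load, _ in loads), assignment
```

For jobs of length 5, 4, 3 and 2 on two workers, this balances both workers at a total load of 7, which is the makespan.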

As for the stateful vs. stateless implementation, I think that question depends on the functionality of a particular microservice. In a broad sense, the stateless implementation should be preferred, as it will scale better horizontally.


Regards,
Vidya Sagar


--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002 | vkalvaku@iu.edu



--
Thanks and regards,

Ajinkya Dhamnaskar
Student ID : 0003469679
Masters (CS)
+1 (812) 369-5416







--
Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa




Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by "Kamat, Amruta Ravalnath" <ar...@indiana.edu>.
Hello Gourav,


I agree with your solution, but I just came across a decentralized architecture which might serve our purpose and might provide a looser coupling.


Having a common workflow would mean a centralized orchestrator, i.e. a process which coordinates with multiple services to complete a larger workflow. The services have no knowledge of the workflow or their specific involvement in it. The orchestrator takes care of the complexities. However, the challenge with an orchestrator is that business logic will build up in a central place.

If there is a central shared instance of the orchestrator for all requests, then the orchestrator is a single point of failure. If it goes down, all processing stops.

With decentralized interactions, each service takes full responsibility for its role in the greater workflow. It will listen for events from other services, complete its work as soon as possible, retry if a failure occurs, and send out events upon completion. Here, communications tend to be asynchronous and business logic stays within the related services.
Instead of having a central orchestrator that controls the logic of what steps happen when, that logic is built into each service ahead of time. The services know what to react to and how, ahead of time. Multiple services can consume the same events, do some processing, and then produce their own events back into the event stream, all at the same time. The event stream does not have any logic and is intended to be a dumb pipe.
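
A minimal sketch of such decentralized, event-driven interaction is shown below. The `EventStream` class and the event names are hypothetical, and delivery here is synchronous and in-process for simplicity, whereas a real event stream would be asynchronous.

```python
from collections import defaultdict

class EventStream:
    """A 'dumb pipe': it only fans events out to subscribers and holds no workflow logic."""
    def __init__(self):
        self.subscribers = defaultdict(list)
        self.log = []  # record of published event types, for inspection

    def subscribe(self, event_type, callback):
        self.subscribers[event_type].append(callback)

    def publish(self, event_type, data):
        self.log.append(event_type)
        for callback in list(self.subscribers[event_type]):
            callback(data)

stream = EventStream()
# Each service decides for itself what to react to and what to emit next.
stream.subscribe("job.created", lambda job: stream.publish("input.staged", job))
stream.subscribe("input.staged", lambda job: stream.publish("job.submitted", job))
stream.subscribe("job.submitted", lambda job: stream.publish("output.staged", job))
```

Publishing a single `job.created` event drives the whole chain, yet no component contains the complete workflow; the sequence exists only as the sum of the individual subscriptions.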


Decentralized interactions meet our requirements better: loose coupling, high cohesion, and each service responsible for its own bounded context.

Thanks
Amruta Kamat
________________________________
From: Shenoy, Gourav Ganesh <go...@indiana.edu>
Sent: Tuesday, February 7, 2017 11:49 PM
To: dev@airavata.apache.org
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Supun,

Thank you for this excellent explanation. I see that the architecture you mentioned covers most of the concerns we discussed in this thread and in class. I just have one clarifying question though: what does “worker” signify here? Is it a generic task execution framework which runs the DAG? Or is it like a platform where the DAG runs (and how)?

Apart from that, I am looking at Storm’s architecture to see if we can get some clues as they are tackling a similar problem. I shall update once I get some concrete answer.

Thanks and Regards,
Gourav Shenoy

From: Supun Nakandala <su...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Tuesday, February 7, 2017 at 5:47 PM
To: dev <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hi Gourav,

I agree with your idea of using one “workflow micro-service” which would basically be the mediator/orchestrator for deciding which micro-service should be executed next. But I think these components do not necessarily have to be micro-services; they could instead conform to the master-worker paradigm in some sense. The trick here is how we can implement a scalable, fault-tolerant system to do distributed workload management, and which property from the CAP theorem we are going to compromise.

I think you are heading in the right direction. But I would like to add more details to your solution. Please note that I haven't evaluated these ideas 100%. Perhaps we can talk more about this in the next class.

As you have done, I think we should centralize the state information into one component (the orchestrator in our case). From my experience, it is very hard to achieve consistency in a distributed-state setting in the event of failures.

Second, to maintain generalizability in Airavata, I think we should treat each application/use case as a DAG of execution. For example, an HPC job and a cloud job will have two different DAGs, each consisting of tasks (data staging, job submission, output staging, etc.). These tasks should be short and should have roughly the same execution time. Having idempotent tasks is preferable.

The orchestrator is responsible for executing the DAG and assigning tasks to the workers (how? will follow) based on the control dependencies between the DAG tasks. In addition to the dependencies between tasks, I see that there can be other dependencies on things like monitoring and scheduling, which the orchestrator has to take into account when executing the DAG.

The next question is how we distribute jobs from the orchestrator to the workers. I think here it is OK to compromise availability in favor of consistency. I suggest that we use the request/response messaging pattern on top of a persistent message broker (a critical service). In this architecture, we can safely allow the orchestrator or the workers to fail without losing consistency (because of the persistent queue). But if the orchestrator fails, then availability will go down. One way to overcome this would be to come up with an orchestrator quorum. The attached figure summarizes my idea.

I think we can also evaluate this solution against the concerns that Shameera pointed out, such as whether we can enable cancel. Once again, this is just my idea and it is open for argument and debate.



[Inline image 2]

Thanks
-Supun



On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <go...@indiana.edu>> wrote:
Hi Supun,

I agree, but may be for the example I mentioned, multiple micro-services might not sound necessary. I was trying to generalize towards a scenario where we have multiple independent micro-services (not necessarily for task execution). Again, I am not certain if this is the right architecture but yours (and other’s) inputs, will definitely help us narrow down on the different scenarios we need to exactly focus on. Do let me know if I make sense.

Thanks and Regards,
Gourav Shenoy

From: Supun Nakandala <su...@gmail.com>>
Reply-To: "dev@airavata.apache.org<ma...@airavata.apache.org>" <de...@airavata.apache.org>>
Date: Monday, February 6, 2017 at 12:15 PM
To: dev <de...@airavata.apache.org>>

Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hi Gourav,

It is my belief that we don't need a separate microservice to each task. I favor a single micro service which can execute all tasks (or in other words a generic task execution micro service). Of course, we can have many of them when we want to scale. WDYT?

On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <go...@indiana.edu>> wrote:
Hi dev,

We were brainstorming some potential designs that might help us with this problem. One possible option would be to have a “workflow micro-service” which would basically be the mediator/orchestrator for deciding which micro-service should be executed next – based on the type of the job. The motive is to make micro-services independent of the workflow; i.e. a micro-service implementation should be not be aware of which micro-service will be executed next and we should have a central control of deciding this pattern.
Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for job type Y, the pattern could be A -> C -> D; and so on.

An initial design with this idea looks like follows:
[cid:image002.png@01D2819C.C1A3BB60]


We would have a common messaging framework (implementation has not been decided yet). The database associated with the workflow micro-service could be a graph database (maybe?) – again the implementation/technology has not been decided yet.

This is just a proposed design, and I would love to hear your thoughts on this and any suggestions/comments if any. If there is anything that we are missing or should consider, please do let us know.

Thanks and Regards,
Gourav Shenoy

From: "Christie, Marcus Aaron" <ma...@iu.edu>>
Reply-To: "dev@airavata.apache.org<ma...@airavata.apache.org>" <de...@airavata.apache.org>>
Date: Friday, February 3, 2017 at 9:21 AM

To: "dev@airavata.apache.org<ma...@airavata.apache.org>" <de...@airavata.apache.org>>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Vidya,

I’m not sure how relevant it is, but it occurs to me that a microservice that executes jobs on a cloud requires very little in terms of resources to submit and monitor that job on the cloud. It doesn’t really matter if the job is a “big” or a “small” job.  So I’m not sure what heuristic makes sense regarding distributing work to these job execution microservices.  Maybe a simple round robin approach would be sufficient.

I think a job scheduling algorithm does make sense, however, for a higher level component, some sort of metascheduler that understands what resources are available on the cloud resources on which the jobs will be running.  The metascheduler could create work for the job exection microservices to run on particular cloud resources in a way that optimizes for some metric (e.g., throughput).

Thanks,

Marcus

On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu>> wrote:

Ajinkya,

My scenario is for workload distribution among multiple instances of the same microservice.

If a message broker needs to distribute the available jobs among multiple workers, the common approach would be to use round robin or a similar algorithm. This approach works best when all the workers are similar and the jobs are equal.

So I think that a genetic or heuristic job scheduling algorithm, which is also aware of each of the worker's current state (CPU, RAM, No of Jobs processing) can more efficiently distribute the jobs. The workers can periodically ping the message broker with their current state info.

The other advantage of using a customized algorithm is that it can be tweaked to use embedded routing, priority or other information in the job metadata to resolve all of the concerns raised by Amrutha viz message grouping, ordering, repeated messages, etc.

We can even ensure data privacy, i.e if the workers are spread across multiple compute clusters say AWS and IU Big Red and we want to restrict certain sensitive jobs to be run only on Big Red.

Some distributed job scheduling algorithms for cloud computing.

  *   http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
  *   https://arxiv.org/pdf/1404.5528.pdf


Regards
Vidya Sagar

On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <ar...@indiana.edu>> wrote:
Hello all,

Adding more information to the message based approach. Messaging is a key strategy employed in many distributed environments. Message queuing is ideally suited to performing asynchronous operations. A sender can post a message to a queue, but it does not have to wait while the message is retrieved and processed. A sender and receiver do not even have to be running concurrently.

With message queuing there can be 2 possible scenarios:

  1.  ​Sending and receiving messages using a single message queue.
  2.  ​Sharing a message queue between many senders and receivers
​When a message is retrieved, it is removed from the queue. A message queue may also support message peeking. This mechanism can be useful if several receivers are retrieving messages from the same queue, but each receiver only wishes to handle specific messages. The receiver can examine the message it has peeked, and decide whether to retrieve the message (which removes it from the queue) or leave it on the queue for another receiver to handle.

A few basic message queuing patterns are:

  1.  One-way messaging: The sender simply posts a message to the queue in the expectation that a receiver will retrieve it and process it at some point.
  2.  Request/response messaging: In this pattern a sender posts a message to a queue and expects a response from the receiver. The sender can resend if the message is not delivered. This pattern typically requires some form of correlation to enable the sender to determine which response message corresponds to which request sent to the receiver.
  3.  Broadcast messaging: In this pattern a sender posts a message to a queue, and multiple receivers can read a copy of the message. This pattern depends on the message queue being able to disseminate the same message to multiple receivers. There is a queue to which the senders can post messages that include metadata in the form of attributes. Each receiver can create a subscription to the queue, specifying a filter that examines the values of message attributes. Any messages posted to the queue with attribute values that match the filter are automatically forwarded to that subscription.
A solution based on asynchronous messaging might need to address a number of concerns:

Message ordering, Message grouping: Process messages either in the order they are posted or in a specific order based on priority. Also, there may be occasions when it is difficult to eliminate dependencies, and it may be necessary to group messages together so that they are all handled by the same receiver.
Idempotency: Ideally the message processing logic in a receiver should be idempotent so that, if the work performed is repeated, this repetition does not change the state of the system.
Repeated messages: Some message queuing systems implement duplicate message detection and removal based on message IDs
Poison messages: A poison message is a message that cannot be handled, often because it is malformed or contains unexpected information.
Message expiration: A message might have a limited lifetime, and if it is not processed within this period it might no longer be relevant and should be discarded.
Message scheduling: A message might be temporarily embargoed and should not be processed until a specific date and time. The message should not be available to a receiver until this time.


Thanks
Amruta Kamat
________________________________
From: Shenoy, Gourav Ganesh <go...@indiana.edu>>
Sent: Thursday, February 2, 2017 7:57 PM
To: dev@airavata.apache.org<ma...@airavata.apache.org>

Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Amila, Sagar, thank you for the response and raising those concerns; and apologies because my email resonated the topic of workload management in terms of how micro-services communicate. As Ajinkya rightly mentioned, there exists some sort of correlation between micro-services communication and it’s impact on how that micro-service performs the work under those circumstances. The goal is to make sure we have maximum independence between micro-services, and investigate the workflow pattern in which these micro-services will operate such that we can find the right balance between availability & consistency. Again, from our preliminary analysis we can assert that these solutions may not be generic and the specific use-case will have a big decisive role.

For starters, we are focusing on the following example – and I think this will clarify the doubts on what we are exactly trying to investigate about.

Our test example
Say we have the following 4 micro-services, which each perform a specific task as mentioned in the box.

<image001.png>


A stateful pattern to distribute work
<image002.png>

Here, each communication between micro-services could be via RPC or messaging (e.g., RabbitMQ). The obvious disadvantage is that if any micro-service is down, the availability of the whole system is at stake. In this test example, Microservice-A coordinates the work and maintains the state information.

A stateless pattern to distribute work

<image003.png>

Another, purely asynchronous approach would be to associate a message queue with each micro-service: each micro-service performs its task, submits a request (a message on the bus) to the next micro-service, and continues to process more requests. This improves availability, although we would need to handle failure corner cases such as the message broker being down or messages being lost.
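
A minimal sketch of this stateless pattern, assuming an in-memory deque per service in place of a real message broker. The service names A to D, the `NEXT` table, and the helpers `handle`, `step`, and `run` are hypothetical and only illustrate the idea that all state travels inside the message:

```python
from collections import deque

# Hypothetical in-memory stand-in for a message bus: one FIFO queue per service.
queues = {name: deque() for name in ("A", "B", "C", "D")}

# Hypothetical fixed pipeline: each service only knows its own successor.
NEXT = {"A": "B", "B": "C", "C": "D", "D": None}


def handle(service, message):
    """Do this service's part of the work and return the updated message."""
    # All state travels inside the message, so the service itself keeps none.
    return {**message, "trace": message["trace"] + [service]}


def step(service, finished):
    """Process one message from the service's queue; report whether work was done."""
    if not queues[service]:
        return False
    result = handle(service, queues[service].popleft())
    successor = NEXT[service]
    if successor is None:
        finished.append(result)
    else:
        queues[successor].append(result)
    return True


def run(jobs):
    """Submit jobs to service A and poll the services until every queue is empty."""
    finished = []
    for job_id in jobs:
        queues["A"].append({"job": job_id, "trace": []})
    progressed = True
    while progressed:
        progressed = False
        for service in queues:
            if step(service, finished):
                progressed = True
    return finished
```

Calling `run(["j1", "j2"])` pushes both jobs through A, B, C, and D in turn. A real deployment would replace the deques with durable broker queues, which is where the broker-down and message-loss cases come in.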

As mentioned, these are just a few proposals that we plan to investigate via a prototype project, in which we will inject corner cases/failures and try to find ways to handle them. I would love to hear more thoughts/questions/suggestions.

Thanks and Regards,
Gourav Shenoy

From: Ajinkya Dhamnaskar <ad...@umail.iu.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, February 2, 2017 at 2:22 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Just a heads up. Here the name Distributed workload management does not necessarily mean having different instances of a microservice and then distributing work among these instances.

Apparently, the problem is how to make each microservice work independently on top of a concrete distributed communication infrastructure. Think of it as a workflow in which each microservice does its part of the work and communicates its output (how is yet to be decided). The next microservice in line picks up that output and takes it further towards the final outcome. The crux is that none of the microservices needs to worry about the other microservices in the pipeline.

Vidya Sagar,
I completely second your opinion on having stateless microservices; in fact, that is the key. With stateless microservices it is difficult to guarantee consistency in a system, but it solves the availability problem to some extent. I would be interested to understand what you mean by "an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information".

On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com> wrote:
Hi Gourav,

Sorry, I did not understand your question. Specifically, I am having trouble relating "workload management" to the options you suggest (RPC, message based, etc.).
So what exactly do you mean by "workload management"?
What is "work" in this context?

Also, I did not understand what you meant by "the most efficient way". Efficient in terms of what? Are you looking at speed?

As per your suggestions, it seems you are trying to find a way to communicate between micro-services. RPC might be troublesome if you need to communicate with processes separated by a firewall.

Thanks
-Thejaka


On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hello dev, arch,

As part of this Spring’17 Advanced Science Gateway Architecture course, we are working on trying to debate and find possible solutions to the issue of managing distributed workloads in Apache Airavata. This leads to the discussion of finding the most efficient way that different Airavata micro-services should communicate and distribute work, in such a way that:

1.       We maintain the ability to scale these micro-services whenever needed (autoscale perhaps?).

2.       Achieve fault tolerance.

3.       We can deploy these micro-services independently, or better in a containerized manner – keeping in mind the ability to use devops for deployment.

As of now the options we are exploring are:

1.       RPC based communication

2.       Message based – either master-worker, or work-queue, etc

3.       A combination of both these approaches

I am more inclined towards exploring the message based approach, but again there arises the possibility of handling limitations/corner cases of message broker such as downtimes (may be more). In my opinion, having asynchronous communication will help us achieve most of the above-mentioned points. Another debatable issue is making the micro-services implementation stateless, such that we do not have to pass the state information between micro-services.

I would love to hear any thoughts/suggestions/comments on this topic and open up a discussion via this mail thread. If there is anything that I have missed which is relevant to this issue, please let me know.

Thanks and Regards,
Gourav Shenoy


Hi Gourav,

Correct me if I'm wrong, but I think this is a case of the job shop scheduling problem, as we may have 'n' jobs of varying processing times and memory requirements, and we have 'm' microservices with possibly different computing and memory capacities, and we are trying to minimize the makespan<https://en.wikipedia.org/wiki/Makespan>.
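
To make "makespan" concrete, here is a small sketch under a strong simplifying assumption: all workers are identical and each job has a single known processing time. General job shop scheduling is much harder than this, so the greedy longest-job-first heuristic below (the function name `greedy_makespan` is hypothetical) only illustrates the quantity being minimized:

```python
import heapq


def greedy_makespan(processing_times, num_workers):
    """Assign each job (longest first) to the currently least-loaded worker."""
    # Heap of (current load, worker index); all workers start idle.
    loads = [(0, w) for w in range(num_workers)]
    heapq.heapify(loads)
    assignment = {w: [] for w in range(num_workers)}
    for job, duration in sorted(processing_times.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(loads)
        assignment[worker].append(job)
        heapq.heappush(loads, (load + duration, worker))
    # The makespan is the finishing time of the busiest worker.
    makespan = max(load for load, _ in loads)
    return makespan, assignment
```

For jobs `{"a": 4, "b": 3, "c": 3, "d": 2}` on two workers, this heuristic puts "a" and "d" on one worker and "b" and "c" on the other, giving a makespan of 6.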

For this use-case, I'm in favor of a highly available and consistent message broker with an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information.

As for the stateful vs stateless implementation, I think that question depends on the functionality of a particular microservice. In a broad sense, the stateless implementation should be preferred as it will scale better horizontally.


Regards,
Vidya Sagar


--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002 | vkalvaku@iu.edu



--
Thanks and regards,

Ajinkya Dhamnaskar
Student ID : 0003469679
Masters (CS)
+1 (812) 369-5416







--
Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa




Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.
Supun,

Thank you for this excellent explanation. I see that the architecture you describe covers most of the concerns we discussed in this thread and in class. I have one clarifying question, though: what does “worker” signify here? Is it a generic task execution framework that runs the DAG, or is it more like a platform on which the DAG runs (and if so, how)?

Apart from that, I am looking at Storm’s architecture to see if we can get some clues, as they are tackling a similar problem. I shall post an update once I have some concrete answers.

Thanks and Regards,
Gourav Shenoy

From: Supun Nakandala <su...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Tuesday, February 7, 2017 at 5:47 PM
To: dev <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hi Gourav,

I agree with your idea of using one “workflow micro-service”, which would basically be the mediator/orchestrator that decides which micro-service should be executed next. But I think these components do not necessarily have to be micro-services; rather, they conform to the master-worker paradigm in some sense. The trick is how to implement a scalable, fault-tolerant system for distributed workload management, and which CAP theorem property we are going to compromise.

I think you are heading in the right direction. But I would like to add more details to your solution. Please note that I haven't evaluated these ideas 100%. Perhaps we can talk more about this in the next class.

As you have done, I think we should centralize the state information in one component (the orchestrator in our case). From my experience, it is very hard to achieve consistency with distributed state in the event of failures.

Second, to maintain generalizability in Airavata, I think we should treat each application/use case as a DAG of execution. For example, an HPC job and a cloud job will have two different DAGs, each consisting of tasks (data staging, job submission, output staging, etc.). These tasks should be short and should have roughly the same execution time. Idempotent tasks are preferable.

The orchestrator is responsible for executing the DAG and assigning tasks to the workers (how is covered below) based on the control dependencies between the DAG tasks. In addition to the dependencies between tasks, I see that there can be other dependencies on things like monitoring and scheduling, which the orchestrator has to take into account when executing the DAG.

The next question is how we distribute jobs from the orchestrator to the workers. I think here it is OK to compromise availability in favor of consistency. I suggest that we use the request/response messaging pattern with a persistent message broker (a critical service). In this architecture, we can safely allow the orchestrator or workers to fail without losing consistency (because of the persistent queue). But if the orchestrator fails, availability will go down. One way to overcome this would be to come up with an orchestrator quorum. The attached figure summarizes my idea.

I think we can also evaluate this solution against the concerns that Shameera pointed out, such as whether we can support cancel. Once again, this is just my idea and it is open to argument and debate.



[Inline image 2]

Thanks
-Supun



On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hi Supun,

I agree, but maybe for the example I mentioned, multiple micro-services might not sound necessary. I was trying to generalize towards a scenario where we have multiple independent micro-services (not necessarily for task execution). Again, I am not certain this is the right architecture, but your (and others’) input will definitely help us narrow down the scenarios we need to focus on. Do let me know if this makes sense.

Thanks and Regards,
Gourav Shenoy

From: Supun Nakandala <su...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, February 6, 2017 at 12:15 PM
To: dev <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hi Gourav,

It is my belief that we don't need a separate microservice for each task. I favor a single microservice that can execute all tasks (in other words, a generic task execution microservice). Of course, we can run many instances of it when we want to scale. WDYT?

On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hi dev,

We were brainstorming some potential designs that might help us with this problem. One possible option would be to have a “workflow micro-service”, which would basically be the mediator/orchestrator that decides which micro-service should be executed next, based on the type of the job. The motive is to make micro-services independent of the workflow; i.e., a micro-service implementation should not be aware of which micro-service will be executed next, and we should have central control over this pattern.
E.g., for job type X, the pattern could be A -> B -> C -> D, whereas for job type Y, the pattern could be A -> C -> D; and so on.
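
As a rough sketch of how the workflow micro-service could hold these patterns centrally, assuming a plain in-memory table (the names `PATTERNS` and `next_service` are hypothetical; the storage technology is still undecided):

```python
# Hypothetical routing table kept by the workflow micro-service.
# Individual services never see it; they only report "done" back.
PATTERNS = {
    "X": ["A", "B", "C", "D"],
    "Y": ["A", "C", "D"],
}


def next_service(job_type, current=None):
    """Return the service to run after `current`, or None when the job is done."""
    pattern = PATTERNS[job_type]
    if current is None:
        return pattern[0]  # job has not started yet
    index = pattern.index(current)
    return pattern[index + 1] if index + 1 < len(pattern) else None
```

With this table, `next_service("Y", "A")` returns "C", while `next_service("X", "A")` returns "B", so changing a workflow means editing the table rather than any micro-service.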

An initial design based on this idea looks as follows:
[cid:image002.png@01D2819C.C1A3BB60]


We would have a common messaging framework (implementation has not been decided yet). The database associated with the workflow micro-service could be a graph database (maybe?) – again the implementation/technology has not been decided yet.

This is just a proposed design, and I would love to hear your thoughts and any suggestions/comments. If there is anything that we are missing or should consider, please do let us know.

Thanks and Regards,
Gourav Shenoy

From: "Christie, Marcus Aaron" <ma...@iu.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Friday, February 3, 2017 at 9:21 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Vidya,

I’m not sure how relevant it is, but it occurs to me that a microservice that executes jobs on a cloud requires very little in terms of resources to submit and monitor that job on the cloud. It doesn’t really matter if the job is a “big” or a “small” job.  So I’m not sure what heuristic makes sense regarding distributing work to these job execution microservices.  Maybe a simple round robin approach would be sufficient.

I think a job scheduling algorithm does make sense, however, for a higher-level component: some sort of metascheduler that understands what resources are available on the cloud resources on which the jobs will be running.  The metascheduler could create work for the job execution microservices to run on particular cloud resources in a way that optimizes for some metric (e.g., throughput).

Thanks,

Marcus

On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

Ajinkya,

My scenario is for workload distribution among multiple instances of the same microservice.

If a message broker needs to distribute the available jobs among multiple workers, the common approach would be to use round robin or a similar algorithm. This approach works best when all the workers are similar and the jobs are equal.

So I think that a genetic or heuristic job scheduling algorithm that is also aware of each worker's current state (CPU, RAM, number of jobs being processed) can distribute the jobs more efficiently. The workers can periodically ping the message broker with their current state info.
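
A simple illustration of state-aware dispatch, as opposed to round robin. The `WorkerState` fields mirror the state mentioned above; the equal-weight `load_score` and the `max_jobs` cap are assumptions made only for this sketch, not a proposed scheduling algorithm:

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    """Latest heartbeat reported by a worker (hypothetical structure)."""
    name: str
    cpu: float        # fraction of CPU in use, 0.0 to 1.0
    ram: float        # fraction of RAM in use, 0.0 to 1.0
    active_jobs: int  # number of jobs currently being processed


def load_score(state, max_jobs=10):
    """Combine the three signals into one number; lower means less loaded."""
    # Equal weights and the max_jobs cap are arbitrary choices for this sketch.
    job_fraction = min(state.active_jobs / max_jobs, 1.0)
    return (state.cpu + state.ram + job_fraction) / 3


def pick_worker(states):
    """Return the name of the least-loaded worker."""
    return min(states, key=load_score).name
```

Given one worker at 90% CPU with 4 jobs and another at 20% CPU with 1 job, `pick_worker` chooses the second, whereas round robin would alternate between them regardless of load.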

The other advantage of using a customized algorithm is that it can be tweaked to use embedded routing, priority, or other information in the job metadata to resolve all of the concerns raised by Amruta, viz. message grouping, ordering, repeated messages, etc.

We can even enforce data privacy: for example, if the workers are spread across multiple compute clusters, say AWS and IU Big Red, we can restrict certain sensitive jobs to run only on Big Red.

Some distributed job scheduling algorithms for cloud computing:

  *   http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
  *   https://arxiv.org/pdf/1404.5528.pdf


Regards
Vidya Sagar

On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <ar...@indiana.edu> wrote:
Hello all,

Adding more information to the message based approach. Messaging is a key strategy employed in many distributed environments. Message queuing is ideally suited to performing asynchronous operations. A sender can post a message to a queue, but it does not have to wait while the message is retrieved and processed. A sender and receiver do not even have to be running concurrently.

With message queuing there can be 2 possible scenarios:

  1.  Sending and receiving messages using a single message queue.
  2.  Sharing a message queue between many senders and receivers.

When a message is retrieved, it is removed from the queue. A message queue may also support message peeking. This mechanism can be useful if several receivers are retrieving messages from the same queue, but each receiver only wishes to handle specific messages. The receiver can examine the message it has peeked, and decide whether to retrieve the message (which removes it from the queue) or leave it on the queue for another receiver to handle.

A few basic message queuing patterns are:

  1.  One-way messaging: The sender simply posts a message to the queue in the expectation that a receiver will retrieve it and process it at some point.
  2.  Request/response messaging: In this pattern a sender posts a message to a queue and expects a response from the receiver. The sender can resend if the message is not delivered. This pattern typically requires some form of correlation to enable the sender to determine which response message corresponds to which request sent to the receiver.
  3.  Broadcast messaging: In this pattern a sender posts a message to a queue, and multiple receivers can read a copy of the message. This pattern depends on the message queue being able to disseminate the same message to multiple receivers. There is a queue to which the senders can post messages that include metadata in the form of attributes. Each receiver can create a subscription to the queue, specifying a filter that examines the values of message attributes. Any messages posted to the queue with attribute values that match the filter are automatically forwarded to that subscription.
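
The correlation step in the request/response pattern can be sketched as follows, assuming two in-memory deques in place of broker queues. All names here (`send_request`, `receiver_poll`, `collect_responses`) are hypothetical, and the uppercase transformation is only a placeholder for real work:

```python
import uuid
from collections import deque

request_queue = deque()   # sender -> receiver
response_queue = deque()  # receiver -> sender
pending = {}              # correlation id -> original request payload


def send_request(payload):
    """Post a request tagged with a fresh correlation id."""
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = payload
    request_queue.append({"correlation_id": correlation_id, "body": payload})
    return correlation_id


def receiver_poll():
    """Receiver side: handle one request and reply with the same correlation id."""
    if request_queue:
        request = request_queue.popleft()
        response_queue.append({
            "correlation_id": request["correlation_id"],
            "body": request["body"].upper(),  # placeholder for real work
        })


def collect_responses():
    """Sender side: match each response to the request that caused it."""
    matched = {}
    while response_queue:
        response = response_queue.popleft()
        request = pending.pop(response["correlation_id"], None)
        if request is not None:  # skip replies to unknown or already-answered requests
            matched[request] = response["body"]
    return matched
```

Any request still left in `pending` after a timeout is a candidate for resending, which is the retry behaviour described for this pattern.
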
A solution based on asynchronous messaging might need to address a number of concerns:

Message ordering, Message grouping: Process messages either in the order they are posted or in a specific order based on priority. Also, there may be occasions when it is difficult to eliminate dependencies, and it may be necessary to group messages together so that they are all handled by the same receiver.
Idempotency: Ideally the message processing logic in a receiver should be idempotent so that, if the work performed is repeated, this repetition does not change the state of the system.
Repeated messages: Some message queuing systems implement duplicate message detection and removal based on message IDs.
Poison messages: A poison message is a message that cannot be handled, often because it is malformed or contains unexpected information.
Message expiration: A message might have a limited lifetime, and if it is not processed within this period it might no longer be relevant and should be discarded.
Message scheduling: A message might be temporarily embargoed and should not be processed until a specific date and time. The message should not be available to a receiver until this time.
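
Several of these concerns can be handled on the receiver side. The sketch below assumes each message is a dict with an `id`, a `body`, and an optional `expires_at` timestamp; those field names, the `consume` function, and the in-memory `processed_ids` and `dead_letters` stores are hypothetical choices for illustration:

```python
import time

processed_ids = set()  # ids of messages already handled (duplicate detection)
dead_letters = []      # poison messages set aside for inspection


def consume(message, handler, now=None):
    """Apply the handler at most once per message id; classify the outcome."""
    now = time.time() if now is None else now
    if message["id"] in processed_ids:
        return "duplicate"  # repeated message: skip so the work is not redone
    expires_at = message.get("expires_at")
    if expires_at is not None and now > expires_at:
        return "expired"    # no longer relevant, discard
    try:
        handler(message["body"])
    except Exception:
        dead_letters.append(message)  # poison message: park it instead of retrying forever
        return "poison"
    processed_ids.add(message["id"])
    return "processed"
```

Delivering the same message twice to `consume` runs the handler once and returns "duplicate" the second time, which gives idempotent behaviour even if the handler itself is not idempotent.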


Thanks
Amruta Kamat

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Supun Nakandala <su...@gmail.com>.
Hi Gourav,

I agree with your idea of using one “workflow micro-service”, which would
basically be the mediator/orchestrator that decides which micro-service
should be executed next. But I think these components do not necessarily
have to be micro-services; rather, they conform to the master-worker
paradigm in some sense. The trick is how to implement a scalable,
fault-tolerant system for distributed workload management, and which CAP
theorem property we are going to compromise.

I think you are heading in the right direction. But I would like to add
more details to your solution. Please note that I haven't evaluated these
ideas 100%. Perhaps we can talk more about this in the next class.

As you have done, I think we should centralize the state information in
one component (the orchestrator in our case). From my experience, it is very
hard to achieve consistency with distributed state in the event of
failures.

Second, to maintain generalizability in Airavata, I think we should treat
each application/use case as a DAG of execution. For example, an HPC job and
a cloud job will have two different DAGs, each consisting of tasks (data
staging, job submission, output staging, etc.). These tasks should be short
and should have roughly the same execution time. Idempotent tasks are
preferable.

The orchestrator is responsible for executing the DAG and assigning tasks to
the workers (how is covered below) based on the control dependencies between
the DAG tasks. In addition to the dependencies between tasks, I see that
there can be other dependencies on things like monitoring and scheduling,
which the orchestrator has to take into account when executing the DAG.
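
A minimal sketch of dependency-driven execution, assuming Python 3.9+ for the standard-library `graphlib` module. The `HPC_DAG` contents follow the task names above; `execute_dag` and the `run_task` callback are hypothetical, and tasks run one at a time here rather than being handed to workers:

```python
from graphlib import TopologicalSorter

# Hypothetical DAG for an HPC job: each task maps to the tasks it depends on.
HPC_DAG = {
    "data_staging": set(),
    "job_submission": {"data_staging"},
    "output_staging": {"job_submission"},
}


def execute_dag(dag, run_task):
    """Run every task after all of its dependencies; return the order used."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    order = []
    while sorter.is_active():
        # get_ready() returns the tasks whose dependencies are all done;
        # a real orchestrator could hand these to workers in parallel.
        for task in sorter.get_ready():
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order
```

A cloud job would simply be a different dictionary passed to the same `execute_dag`, which is what keeps the orchestrator generic.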

The next question is how we distribute jobs from the orchestrator to the
workers. I think here it is OK to compromise availability in favor of
consistency. I suggest that we use the request/response messaging pattern
with a persistent message broker (a critical service). In this architecture,
we can safely allow the orchestrator or workers to fail without losing
consistency (because of the persistent queue). But if the orchestrator
fails, availability will go down. One way to overcome this would be to come
up with an orchestrator quorum. The attached figure summarizes my idea.
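
To show why a queue with acknowledgements lets a worker fail without losing a task, here is an in-memory stand-in. The `AckQueue` class and its methods are hypothetical and do not model any particular broker's API; a real persistent broker would also write the queue to disk:

```python
from collections import deque


class AckQueue:
    """In-memory stand-in for a broker queue with acknowledgements."""

    def __init__(self):
        self._ready = deque()  # tasks waiting for a worker
        self._unacked = {}     # delivery tag -> task handed out but not confirmed
        self._next_tag = 0

    def publish(self, task):
        self._ready.append(task)

    def get(self):
        """Hand out one task; it stays tracked until it is acknowledged."""
        if not self._ready:
            return None
        task = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = task
        return self._next_tag, task

    def ack(self, tag):
        """Worker confirms completion; only now is the task forgotten."""
        del self._unacked[tag]

    def requeue_unacked(self):
        """Called when a worker is presumed dead: put its tasks back on the queue."""
        for tag in sorted(self._unacked):
            self._ready.append(self._unacked.pop(tag))
```

If a worker takes "data_staging" and crashes before calling `ack`, `requeue_unacked` makes the task available to the next worker. Redelivery like this is one reason idempotent tasks are preferable.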

I think we can also evaluate this solution against the concerns that
Shameera pointed out, such as whether we can support cancel. Once again,
this is just my idea and it is open to argument and debate.



[image: Inline image 2]

Thanks
-Supun



On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <goshenoy@indiana.edu
> wrote:

> Hi Supun,
>
>
>
> I agree, but may be for the example I mentioned, multiple micro-services
> might not sound necessary. I was trying to generalize towards a scenario
> where we have multiple independent micro-services (not necessarily for task
> execution). Again, I am not certain if this is the right architecture but
> yours (and other’s) inputs, will definitely help us narrow down on the
> different scenarios we need to exactly focus on. Do let me know if I make
> sense.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Supun Nakandala <su...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Monday, February 6, 2017 at 12:15 PM
> *To: *dev <de...@airavata.apache.org>
>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hi Gourav,
>
>
>
> It is my belief that we don't need a separate microservice to each task. I
> favor a single micro service which can execute all tasks (or in other words
> a generic task execution micro service). Of course, we can have many of
> them when we want to scale. WDYT?
>
>
>
> On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
> Hi dev,
>
>
>
> We were brainstorming some potential designs that might help us with this
> problem. One possible option would be to have a “workflow micro-service”
> which would basically be the mediator/orchestrator for deciding which
> micro-service should be executed next – based on the type of the job. The
> motive is to make micro-services independent of the workflow; i.e. a
> micro-service implementation should be not be aware of which micro-service
> will be executed next and we should have a central control of deciding this
> pattern.
>
> Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for job
> type Y, the pattern could be A -> C -> D; and so on.
>
>
>
> An initial design with this idea looks like follows:
>
>
>
>
>
> We would have a common messaging framework (implementation has not been
> decided yet). The database associated with the workflow micro-service could
> be a graph database (maybe?) – again the implementation/technology has not
> been decided yet.
>
>
>
> This is just a proposed design, and I would love to hear your thoughts on
> this and any suggestions/comments if any. If there is anything that we are
> missing or should consider, please do let us know.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Christie, Marcus Aaron" <ma...@iu.edu>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Friday, February 3, 2017 at 9:21 AM
>
>
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Vidya,
>
>
>
> I’m not sure how relevant it is, but it occurs to me that a microservice
> that executes jobs on a cloud requires very little in terms of resources to
> submit and monitor that job on the cloud. It doesn’t really matter if the
> job is a “big” or a “small” job.  So I’m not sure what heuristic makes
> sense regarding distributing work to these job execution microservices.
> Maybe a simple round robin approach would be sufficient.
>
>
>
> I think a job scheduling algorithm does make sense, however, for a higher
> level component, some sort of metascheduler that understands what resources
> are available on the cloud resources on which the jobs will be running.
> The metascheduler could create work for the job execution microservices to
> run on particular cloud resources in a way that optimizes for some metric
> (e.g., throughput).
>
>
>
> Thanks,
>
>
>
> Marcus
>
>
>
> On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu>
> wrote:
>
>
>
> Ajinkya,
>
>
>
> My scenario is for workload distribution among multiple instances of the
> same microservice.
>
>
>
> If a message broker needs to distribute the available jobs among multiple
> workers, the common approach would be to use round robin or a similar
> algorithm. This approach works best when all the workers are similar and
> the jobs are equal.
>
>
>
> So I think that a genetic or heuristic job scheduling algorithm, which is
> also aware of each of the worker's current state (CPU, RAM, No of Jobs
> processing) can more efficiently distribute the jobs. The workers can
> periodically ping the message broker with their current state info.
>
>
>
> The other advantage of using a customized algorithm is that it can
> be tweaked to use embedded routing, priority or other information in the
> job metadata to resolve all of the concerns raised by Amruta, viz. message
> grouping, ordering, repeated messages, etc.
>
>
>
> We can even ensure data privacy, e.g. if the workers are spread across
> multiple compute clusters, say AWS and IU Big Red, we can restrict
> certain sensitive jobs to run only on Big Red.
>
>
>
> Some distributed job scheduling algorithms for cloud computing.
>
>    - http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
>    - https://arxiv.org/pdf/1404.5528.pdf
>
>
>
>
>
> Regards
>
> Vidya Sagar
>
>
>
> On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <
> arkamat@indiana.edu> wrote:
>
> Hello all,
>
>
>
> Adding more information to the message based approach. Messaging is a key
> strategy employed in many distributed environments. Message queuing is
> ideally suited to performing asynchronous operations. A sender can post a
> message to a queue, but it does not have to wait while the message is
> retrieved and processed. A sender and receiver do not even have to be
> running concurrently.
>
>
>
> With message queuing there can be 2 possible scenarios:
>
>    1. Sending and receiving messages using a *single message queue*.
>    2. *Sharing a message queue* between many senders and receivers.
>
> When a message is retrieved, it is removed from the queue. A message
> queue may also support message peeking. This mechanism can be useful if
> several receivers are retrieving messages from the same queue, but each
> receiver only wishes to handle specific messages. The receiver can examine
> the message it has peeked, and decide whether to retrieve the message
> (which removes it from the queue) or leave it on the queue for another
> receiver to handle.
>
>
>
> A few basic message queuing patterns are:
>
>    1. *One-way messaging*: The sender simply posts a message to the queue
>    in the expectation that a receiver will retrieve it and process it at some
>    point.
>    2. *Request/response messaging*: In this pattern a sender posts a
>    message to a queue and expects a response from the receiver. The sender can
>    resend if the message is not delivered. This pattern typically requires
>    some form of correlation to enable the sender to determine which response
>    message corresponds to which request sent to the receiver.
>    3. *Broadcast messaging*: In this pattern a sender posts a message to
>    a queue, and multiple receivers can read a copy of the message. This
>    pattern depends on the message queue being able to disseminate the same
>    message to multiple receivers. There is a queue to which the senders can
>    post messages that include metadata in the form of attributes. Each
>    receiver can create a subscription to the queue, specifying a filter that
>    examines the values of message attributes. Any messages posted to the
>    queue with attribute values that match the filter are automatically
>    forwarded to that subscription.
>
> A solution based on asynchronous messaging might need to address a number
> of concerns:
>
>
>
> *Message ordering, Message grouping: *Process messages either in the
> order they are posted or in a specific order based on priority. Also, there
> may be occasions when it is difficult to eliminate dependencies, and it may
> be necessary to group messages together so that they are all handled by the
> same receiver.
> *Idempotency: *Ideally the message processing logic in a receiver should
> be idempotent so that, if the work performed is repeated, this repetition
> does not change the state of the system.
> *Repeated messages: *Some message queuing systems implement duplicate
> message detection and removal based on message IDs.
> *Poison messages: *A poison message is a message that cannot be handled,
> often because it is malformed or contains unexpected information.
> *Message expiration: *A message might have a limited lifetime, and if it
> is not processed within this period it might no longer be relevant and
> should be discarded.
> *Message scheduling: *A message might be temporarily embargoed and should
> not be processed until a specific date and time. The message should not be
> available to a receiver until this time.
>
>
> Thanks
>
> Amruta Kamat
>
> ------------------------------
>
> *From:* Shenoy, Gourav Ganesh <go...@indiana.edu>
> *Sent:* Thursday, February 2, 2017 7:57 PM
> *To:* dev@airavata.apache.org
>
>
> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hello all,
>
>
>
> Amila, Sagar, thank you for the response and for raising those concerns; and
> apologies because my email framed the topic of workload management in
> terms of how micro-services communicate. As Ajinkya rightly mentioned,
> there exists some sort of correlation between micro-service communication
> and its impact on how that micro-service performs the work under those
> circumstances. The goal is to make sure we have maximum independence
> between micro-services, and investigate the workflow pattern in which these
> micro-services will operate such that we can find the right balance between
> availability & consistency. Again, from our preliminary analysis we can
> assert that these solutions may not be generic and the specific use-case
> will have a big decisive role.
>
>
>
> For starters, we are focusing on the following example, which I think
> will clarify exactly what we are trying to investigate.
>
>
>
> *Our test example *
>
> Say we have the following 4 micro-services, which each perform a specific
> task as mentioned in the box.
>
>
>
> <image001.png>
>
>
>
>
>
> *A stateful pattern to distribute work*
>
> <image002.png>
>
>
>
> Here each communication between micro-services could be via RPC or
> messaging (e.g. RabbitMQ). The obvious disadvantage is that if any micro-service
> is down, then the system availability is at stake. In this test example, we
> can see that Microservice-A coordinates the work and maintains the state
> information.
>
>
>
> *A stateless pattern to distribute work*
>
>
>
> <image003.png>
>
>
>
> Another, purely asynchronous approach would be to associate a message queue
> with each micro-service, where each micro-service performs its task,
> submits a request (a message on the bus) to the next micro-service, and
> continues to process more requests. This improves availability, though we
> would still need to handle failure cases such as the message broker going
> down, message loss, etc.
>
>
>
> As mentioned, these are just a few proposals that we are planning to
> investigate via a prototype project, in which we will inject corner
> cases/failures and try to find ways to handle them. I would love to hear
> more thoughts/questions/suggestions.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Ajinkya Dhamnaskar <ad...@umail.iu.edu>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Thursday, February 2, 2017 at 2:22 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hello all,
>
>
>
> Just a heads up. Here the name Distributed workload management does not
> necessarily mean having different instances of a microservice and then
> distributing work among these instances.
>
>
>
> Apparently, the problem is how to make each microservice work
> independently on top of a concrete distributed communication
> infrastructure. So, think of it as a workflow where each microservice does
> its part of the work and communicates its output (how is yet to be
> decided). The next microservice identifies and picks up that output and
> takes it further towards the final outcome. The crux here is that none of
> the microservices needs to worry about the other microservices in the
> pipeline.
>
>
>
> Vidya Sagar,
>
> I completely second your opinion of having stateless microservices; in
> fact, that is the key. With stateless microservices it is difficult to
> guarantee consistency in a system, but it solves the availability problem to
> some extent. I would be interested to understand what you mean by "an
> intelligent job scheduling algorithm, which receives real-time updates from
> the microservices with their current state information".
>
>
>
> On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <
> vkalvaku@umail.iu.edu> wrote:
>
>
>
> On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com>
> wrote:
>
> Hi Gourav,
>
>
>
> Sorry, I did not understand your question. Specifically, I am having
> trouble relating "work load management" to the options you suggest (RPC,
> message based, etc.).
>
> So what exactly do you mean by "workload management"?
>
> What is "work" in this context?
>
>
>
> Also, I did not understand what you meant by "the most efficient way".
> Efficient in terms of what? Are you looking at speed?
>
>
>
> As per your suggestions, it seems you are trying to find a way to
> communicate between micro services. RPC might be troublesome if you need to
> communicate with processes separated by a firewall.
>
>
>
> Thanks
>
> -Thejaka
>
>
>
>
>
> On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
> Hello dev, arch,
>
>
>
> As part of this Spring’17 Advanced Science Gateway Architecture course, we
> are working on trying to debate and find possible solutions to the issue of
> managing distributed workloads in Apache Airavata. This leads to the
> discussion of finding the most efficient way that different Airavata
> micro-services should communicate and distribute work, in such a way that:
>
> 1.       We maintain the ability to scale these micro-services whenever
> needed (autoscale perhaps?).
>
> 2.       Achieve fault tolerance.
>
> 3.       We can deploy these micro-services independently, or better in a
> containerized manner – keeping in mind the ability to use devops for
> deployment.
>
>
>
> As of now the options we are exploring are:
>
> 1.       RPC based communication
>
> 2.       Message based – either master-worker, or work-queue, etc
>
> 3.       A combination of both these approaches
>
>
>
> I am more inclined towards exploring the message based approach, but again
> there arises the possibility of handling limitations/corner cases of
> message broker such as downtimes (may be more). In my opinion, having
> asynchronous communication will help us achieve most of the above-mentioned
> points. Another debatable issue is making the micro-services implementation
> stateless, such that we do not have to pass the state information between
> micro-services.
>
>
>
> I would love to hear any thoughts/suggestions/comments on this topic and
> open up a discussion via this mail thread. If there is anything that I have
> missed which is relevant to this issue, please let me know.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
>
>
> Hi Gourav,
>
>
>
> Correct me if I'm wrong, but I think this is a case of the job shop
> scheduling problem, as we may have 'n' jobs of varying processing times
> and memory requirements, and we have 'm' microservices with possibly
> different computing and memory capacities, and we are trying to minimize
> the makespan <https://en.wikipedia.org/wiki/Makespan>.
>
>
>
> For this use-case, I'm in favor of a highly available and consistent message
> broker with an intelligent job scheduling algorithm, which receives
> real-time updates from the microservices with their current state
> information.
>
>
>
> As for the state vs stateless implementation, I think that question
> depends on the functionality of a particular microservice. In a broad
> sense, the stateless implementation should be preferred as it will scale
> better horizontally.
>
>
>
>
>
> Regards,
>
> Vidya Sagar
>
>
>
>
> --
>
> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics
> and Computing | Indiana University Bloomington | (812) 691-5002
> <8126915002> | vkalvaku@iu.edu
>
>
>
>
>
> --
>
> Thanks and regards,
>
>
>
> Ajinkya Dhamnaskar
>
> Student ID : 0003469679
>
> Masters (CS)
>
> +1 (812) 369- 5416 <(812)%20369-5416>
>
>
>
>
>
> --
>
> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics
> and Computing | Indiana University Bloomington | (812) 691-5002
> <8126915002> | vkalvaku@iu.edu
>
>
>
>
>
>
>
> --
>
> Thank you
> Supun Nakandala
> Dept. Computer Science and Engineering
> University of Moratuwa
>



-- 
Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.
Hi Supun,

I agree, but maybe multiple micro-services do not sound necessary for the example I mentioned. I was trying to generalize towards a scenario where we have multiple independent micro-services (not necessarily for task execution). Again, I am not certain if this is the right architecture, but your (and others’) inputs will definitely help us narrow down the scenarios we need to focus on. Do let me know if this makes sense.

Thanks and Regards,
Gourav Shenoy

From: Supun Nakandala <su...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, February 6, 2017 at 12:15 PM
To: dev <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hi Gourav,

It is my belief that we don't need a separate microservice for each task. I favor a single micro service which can execute all tasks (or in other words a generic task execution micro service). Of course, we can have many of them when we want to scale. WDYT?

On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <go...@indiana.edu>> wrote:
Hi dev,

We were brainstorming some potential designs that might help us with this problem. One possible option would be to have a “workflow micro-service” which would basically be the mediator/orchestrator for deciding which micro-service should be executed next, based on the type of the job. The motive is to make micro-services independent of the workflow; i.e. a micro-service implementation should not be aware of which micro-service will be executed next, and we should have central control over deciding this pattern.
Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for job type Y, the pattern could be A -> C -> D; and so on.
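
As a rough illustration of the central pattern lookup described above, here is a minimal Python sketch. The WORKFLOWS table and the next_service function are hypothetical names used only for this example (they are not part of Airavata), and a real workflow micro-service would presumably load the patterns from its database rather than hard-code them.

```python
# Hypothetical pattern table kept by the workflow micro-service:
# job type -> ordered list of micro-service names (assumed names A-D).
WORKFLOWS = {
    "X": ["A", "B", "C", "D"],
    "Y": ["A", "C", "D"],
}


def next_service(job_type, completed=None):
    """Return the micro-service to run next, or None when the job is done.

    `completed` is the name of the micro-service that just finished,
    or None if the job has not started yet.
    """
    steps = WORKFLOWS[job_type]
    if completed is None:
        return steps[0]
    position = steps.index(completed)
    if position + 1 < len(steps):
        return steps[position + 1]
    return None
```

With this shape, the individual micro-services only report "I finished step A for job type Y", and the lookup decides that C comes next; none of them needs to know the pattern.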

An initial design based on this idea looks as follows:
[cid:image001.png@01D28130.8F7A0E80]


We would have a common messaging framework (implementation has not been decided yet). The database associated with the workflow micro-service could be a graph database (maybe?) – again the implementation/technology has not been decided yet.

This is just a proposed design, and I would love to hear your thoughts on this and any suggestions/comments if any. If there is anything that we are missing or should consider, please do let us know.

Thanks and Regards,
Gourav Shenoy

From: "Christie, Marcus Aaron" <ma...@iu.edu>>
Reply-To: "dev@airavata.apache.org<ma...@airavata.apache.org>" <de...@airavata.apache.org>>
Date: Friday, February 3, 2017 at 9:21 AM

To: "dev@airavata.apache.org<ma...@airavata.apache.org>" <de...@airavata.apache.org>>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Vidya,

I’m not sure how relevant it is, but it occurs to me that a microservice that executes jobs on a cloud requires very little in terms of resources to submit and monitor that job on the cloud. It doesn’t really matter if the job is a “big” or a “small” job.  So I’m not sure what heuristic makes sense regarding distributing work to these job execution microservices.  Maybe a simple round robin approach would be sufficient.

I think a job scheduling algorithm does make sense, however, for a higher level component, some sort of metascheduler that understands what resources are available on the cloud resources on which the jobs will be running.  The metascheduler could create work for the job execution microservices to run on particular cloud resources in a way that optimizes for some metric (e.g., throughput).

Thanks,

Marcus

On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu>> wrote:

Ajinkya,

My scenario is for workload distribution among multiple instances of the same microservice.

If a message broker needs to distribute the available jobs among multiple workers, the common approach would be to use round robin or a similar algorithm. This approach works best when all the workers are similar and the jobs are equal.

So I think that a genetic or heuristic job scheduling algorithm, which is also aware of each of the worker's current state (CPU, RAM, No of Jobs processing) can more efficiently distribute the jobs. The workers can periodically ping the message broker with their current state info.

The other advantage of using a customized algorithm is that it can be tweaked to use embedded routing, priority or other information in the job metadata to resolve all of the concerns raised by Amruta, viz. message grouping, ordering, repeated messages, etc.

We can even ensure data privacy, e.g. if the workers are spread across multiple compute clusters, say AWS and IU Big Red, we can restrict certain sensitive jobs to run only on Big Red.
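
To make the contrast with round robin concrete, here is a small Python sketch. It is only a simple least-busy heuristic, not a genetic algorithm, and the Worker fields, the cluster labels and the pick_worker function are assumptions made up for this illustration. It assumes each worker has already reported its state to whatever component does the dispatching.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Worker:
    name: str
    cluster: str       # e.g. "aws" or "bigred" (labels assumed for illustration)
    cpu_load: float    # last reported CPU utilisation, 0.0 to 1.0
    active_jobs: int   # number of jobs the worker reported as in progress


def round_robin(workers):
    """Baseline: hand out workers in a fixed rotation, ignoring their state."""
    return cycle(workers)


def pick_worker(workers, allowed_clusters=None):
    """Pick the least busy worker, optionally restricted to certain clusters."""
    candidates = [
        w for w in workers
        if allowed_clusters is None or w.cluster in allowed_clusters
    ]
    if not candidates:
        raise LookupError("no eligible worker for this job")
    # Fewest active jobs first; CPU load breaks ties.
    return min(candidates, key=lambda w: (w.active_jobs, w.cpu_load))


workers = [
    Worker("w1", "aws", cpu_load=0.20, active_jobs=1),
    Worker("w2", "aws", cpu_load=0.90, active_jobs=4),
    Worker("w3", "bigred", cpu_load=0.50, active_jobs=2),
]
```

A sensitive job could then be dispatched with pick_worker(workers, allowed_clusters={"bigred"}), while ordinary jobs omit the restriction.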

Some distributed job scheduling algorithms for cloud computing.

  *   http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
  *   https://arxiv.org/pdf/1404.5528.pdf


Regards
Vidya Sagar

On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <ar...@indiana.edu>> wrote:
Hello all,

Adding more information to the message based approach. Messaging is a key strategy employed in many distributed environments. Message queuing is ideally suited to performing asynchronous operations. A sender can post a message to a queue, but it does not have to wait while the message is retrieved and processed. A sender and receiver do not even have to be running concurrently.

With message queuing there can be 2 possible scenarios:

  1.  Sending and receiving messages using a single message queue.
  2.  Sharing a message queue between many senders and receivers.

When a message is retrieved, it is removed from the queue. A message queue may also support message peeking. This mechanism can be useful if several receivers are retrieving messages from the same queue, but each receiver only wishes to handle specific messages. The receiver can examine the message it has peeked, and decide whether to retrieve the message (which removes it from the queue) or leave it on the queue for another receiver to handle.

A few basic message queuing patterns are:

  1.  One-way messaging: The sender simply posts a message to the queue in the expectation that a receiver will retrieve it and process it at some point.
  2.  Request/response messaging: In this pattern a sender posts a message to a queue and expects a response from the receiver. The sender can resend if the message is not delivered. This pattern typically requires some form of correlation to enable the sender to determine which response message corresponds to which request sent to the receiver.
  3.  Broadcast messaging: In this pattern a sender posts a message to a queue, and multiple receivers can read a copy of the message. This pattern depends on the message queue being able to disseminate the same message to multiple receivers. There is a queue to which the senders can post messages that include metadata in the form of attributes. Each receiver can create a subscription to the queue, specifying a filter that examines the values of message attributes. Any messages posted to the queue with attribute values that match the filter are automatically forwarded to that subscription.
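
The correlation step in the request/response pattern above can be sketched in a few lines of Python. This uses in-process queues from the standard library as a stand-in for a real broker, and the function names (send_request, serve_one, collect) and the upper-casing "work" are purely illustrative assumptions.

```python
import queue
import uuid

requests = queue.Queue()   # sender -> receiver
responses = queue.Queue()  # receiver -> sender


def send_request(payload):
    """Post a request and return the correlation id to wait for."""
    correlation_id = str(uuid.uuid4())
    requests.put({"correlation_id": correlation_id, "payload": payload})
    return correlation_id


def serve_one():
    """Receiver side: handle one request and echo its correlation id back."""
    msg = requests.get()
    responses.put({
        "correlation_id": msg["correlation_id"],
        "result": msg["payload"].upper(),  # placeholder for real work
    })


def collect(pending):
    """Drain available responses and match them to pending correlation ids."""
    results = {}
    while not responses.empty():
        reply = responses.get()
        if reply["correlation_id"] in pending:
            results[reply["correlation_id"]] = reply["result"]
    return results
```

The sender never blocks on a particular receiver; it only needs to remember which correlation ids are still outstanding, and can resend a request whose id has not been answered in time.
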
A solution based on asynchronous messaging might need to address a number of concerns:

Message ordering, Message grouping: Process messages either in the order they are posted or in a specific order based on priority. Also, there may be occasions when it is difficult to eliminate dependencies, and it may be necessary to group messages together so that they are all handled by the same receiver.
Idempotency: Ideally the message processing logic in a receiver should be idempotent so that, if the work performed is repeated, this repetition does not change the state of the system.
Repeated messages: Some message queuing systems implement duplicate message detection and removal based on message IDs.
Poison messages: A poison message is a message that cannot be handled, often because it is malformed or contains unexpected information.
Message expiration: A message might have a limited lifetime, and if it is not processed within this period it might no longer be relevant and should be discarded.
Message scheduling: A message might be temporarily embargoed and should not be processed until a specific date and time. The message should not be available to a receiver until this time.
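
Several of the concerns listed above (repeated messages, poison messages, expiration and scheduling) can be handled on the receiver side. The following Python sketch shows one possible shape; the Receiver class, the message fields (id, body, expires_at, not_before) and the retry limit are assumptions for illustration, and many brokers offer some of these features natively.

```python
import time


class Receiver:
    """Wraps a handler with duplicate, expiry, embargo and poison checks."""

    def __init__(self, handler, max_attempts=3):
        self.handler = handler
        self.max_attempts = max_attempts
        self.seen_ids = set()     # ids already handled (or given up on)
        self.attempts = {}        # id -> number of failed attempts
        self.dead_letters = []    # poison messages parked for inspection

    def receive(self, msg, now=None):
        now = time.time() if now is None else now
        msg_id = msg["id"]
        if msg_id in self.seen_ids:
            return "duplicate"    # repeated message: skip, keeps handling idempotent
        if msg.get("expires_at") is not None and now > msg["expires_at"]:
            return "expired"      # message expiration: discard
        if msg.get("not_before") is not None and now < msg["not_before"]:
            return "deferred"     # message scheduling: still embargoed
        try:
            self.handler(msg["body"])
        except Exception:
            count = self.attempts.get(msg_id, 0) + 1
            self.attempts[msg_id] = count
            if count >= self.max_attempts:
                # Poison message: stop retrying and set it aside.
                self.dead_letters.append(msg)
                self.seen_ids.add(msg_id)
                return "dead-lettered"
            return "retry"
        self.seen_ids.add(msg_id)
        return "processed"
```

Note that the in-memory seen_ids set only works for a single receiver instance; with several instances of the same micro-service the set would have to live in shared storage.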


Thanks
Amruta Kamat

________________________________
From: Shenoy, Gourav Ganesh <go...@indiana.edu>>
Sent: Thursday, February 2, 2017 7:57 PM
To: dev@airavata.apache.org<ma...@airavata.apache.org>

Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Amila, Sagar, thank you for the response and for raising those concerns; and apologies because my email framed the topic of workload management in terms of how micro-services communicate. As Ajinkya rightly mentioned, there exists some sort of correlation between micro-service communication and its impact on how that micro-service performs the work under those circumstances. The goal is to make sure we have maximum independence between micro-services, and investigate the workflow pattern in which these micro-services will operate such that we can find the right balance between availability & consistency. Again, from our preliminary analysis we can assert that these solutions may not be generic and the specific use-case will have a big decisive role.

For starters, we are focusing on the following example, which I think will clarify exactly what we are trying to investigate.

Our test example
Say we have the following 4 micro-services, which each perform a specific task as mentioned in the box.

<image001.png>


A stateful pattern to distribute work
<image002.png>

Here each communication between micro-services could be via RPC or messaging (e.g. RabbitMQ). The obvious disadvantage is that if any micro-service is down, then the system availability is at stake. In this test example, we can see that Microservice-A coordinates the work and maintains the state information.

A stateless pattern to distribute work

<image003.png>

Another, purely asynchronous approach would be to associate a message queue with each micro-service, where each micro-service performs its task, submits a request (a message on the bus) to the next micro-service, and continues to process more requests. This improves availability, though we would still need to handle failure cases such as the message broker going down, message loss, etc.
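
A toy version of this queue-per-micro-service pattern might look like the Python sketch below. The four stage names, the NEXT table and the run_once function are hypothetical, and in-process queues stand in for a real message bus. The point it tries to show is that all job state travels inside the message, so a stage keeps nothing between requests.

```python
import queue

# One inbox per micro-service (names A-D assumed, as in the test example).
queues = {name: queue.Queue() for name in "ABCD"}

# Each stage only knows where to forward its own output.
NEXT = {"A": "B", "B": "C", "C": "D", "D": None}


def run_once(name):
    """Process one message from `name`'s inbox and forward it downstream."""
    msg = queues[name].get_nowait()
    # Placeholder for the stage's real task: record that this stage ran.
    msg = {**msg, "history": msg["history"] + [name]}
    downstream = NEXT[name]
    if downstream is not None:
        queues[downstream].put(msg)
    return msg
```

Because a stage returns to its inbox as soon as it has forwarded the message, a slow or temporarily stopped downstream stage only causes its queue to grow instead of blocking the stages before it.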

As mentioned, these are just a few proposals that we are planning to investigate via a prototype project, in which we will inject corner cases/failures and try to find ways to handle them. I would love to hear more thoughts/questions/suggestions.

Thanks and Regards,
Gourav Shenoy

From: Ajinkya Dhamnaskar <ad...@umail.iu.edu>>
Reply-To: "dev@airavata.apache.org<ma...@airavata.apache.org>" <de...@airavata.apache.org>>
Date: Thursday, February 2, 2017 at 2:22 AM
To: "dev@airavata.apache.org<ma...@airavata.apache.org>" <de...@airavata.apache.org>>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Just a heads up. Here the name Distributed workload management does not necessarily mean having different instances of a microservice and then distributing work among these instances.

Apparently, the problem is how to make each microservice work independently on top of a concrete distributed communication infrastructure. So, think of it as a workflow where each microservice does its part of the work and communicates its output (how is yet to be decided). The next microservice identifies and picks up that output and takes it further towards the final outcome. The crux here is that none of the microservices needs to worry about the other microservices in the pipeline.

Vidya Sagar,
I completely second your opinion of having stateless microservices; in fact, that is the key. With stateless microservices it is difficult to guarantee consistency in a system, but it solves the availability problem to some extent. I would be interested to understand what you mean by "an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information".

On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu>> wrote:

On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com>> wrote:
Hi Gourav,

Sorry, I did not understand your question. Specifically, I am having trouble relating "work load management" to the options you suggest (RPC, message based, etc.).
So what exactly do you mean by "workload management"?
What is "work" in this context?

Also, I did not understand what you meant by "the most efficient way". Efficient in terms of what? Are you looking at speed?

As per your suggestions, it seems you are trying to find a way to communicate between micro services. RPC might be troublesome if you need to communicate with processes separated by a firewall.

Thanks
-Thejaka


On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <go...@indiana.edu>> wrote:
Hello dev, arch,

As part of this Spring’17 Advanced Science Gateway Architecture course, we are working on trying to debate and find possible solutions to the issue of managing distributed workloads in Apache Airavata. This leads to the discussion of finding the most efficient way that different Airavata micro-services should communicate and distribute work, in such a way that:

1.       We maintain the ability to scale these micro-services whenever needed (autoscale perhaps?).

2.       Achieve fault tolerance.

3.       We can deploy these micro-services independently, or better in a containerized manner – keeping in mind the ability to use devops for deployment.

As of now the options we are exploring are:

1.       RPC based communication

2.       Message based – either master-worker, or work-queue, etc

3.       A combination of both these approaches

I am more inclined towards exploring the message based approach, but again there arises the possibility of handling limitations/corner cases of message broker such as downtimes (may be more). In my opinion, having asynchronous communication will help us achieve most of the above-mentioned points. Another debatable issue is making the micro-services implementation stateless, such that we do not have to pass the state information between micro-services.

I would love to hear any thoughts/suggestions/comments on this topic and open up a discussion via this mail thread. If there is anything that I have missed which is relevant to this issue, please let me know.

Thanks and Regards,
Gourav Shenoy


Hi Gourav,

Correct me if I'm wrong, but I think this is a case of the job shop scheduling problem, as we may have 'n' jobs of varying processing times and memory requirements, and we have 'm' microservices with possibly different computing and memory capacities, and we are trying to minimize the makespan<https://en.wikipedia.org/wiki/Makespan>.

For this use-case, I'm in favor of a highly available and consistent message broker with an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information.

As for the state vs stateless implementation, I think that question depends on the functionality of a particular microservice. In a broad sense, the stateless implementation should be preferred as it will scale better horizontally.


Regards,
Vidya Sagar


--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002<tel:8126915002> | vkalvaku@iu.edu<ma...@iu.edu>



--
Thanks and regards,

Ajinkya Dhamnaskar
Student ID : 0003469679
Masters (CS)
+1 (812) 369- 5416<tel:(812)%20369-5416>



--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002<tel:8126915002> | vkalvaku@iu.edu<ma...@iu.edu>




--
Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Supun Nakandala <su...@gmail.com>.
Hi Gourav,

It is my belief that we don't need a separate microservice for each task. I
favor a single micro service which can execute all tasks (or in other words
a generic task execution micro service). Of course, we can have many of
them when we want to scale. WDYT?

On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <go...@indiana.edu>
wrote:

> Hi dev,
>
>
>
> We were brainstorming some potential designs that might help us with this
> problem. One possible option would be to have a “workflow micro-service”
> which would basically be the mediator/orchestrator for deciding which
> micro-service should be executed next – based on the type of the job. The
> motive is to make micro-services independent of the workflow; i.e. a
> micro-service implementation should not be aware of which micro-service
> will be executed next, and we should have central control over deciding this
> pattern.
>
> Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for job
> type Y, the pattern could be A -> C -> D; and so on.
>
>
>
> An initial design based on this idea looks as follows:
>
>
>
>
>
> We would have a common messaging framework (implementation has not been
> decided yet). The database associated with the workflow micro-service could
> be a graph database (maybe?) – again the implementation/technology has not
> been decided yet.
>
>
>
> This is just a proposed design, and I would love to hear your thoughts on
> this and any suggestions/comments if any. If there is anything that we are
> missing or should consider, please do let us know.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Christie, Marcus Aaron" <ma...@iu.edu>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Friday, February 3, 2017 at 9:21 AM
>
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Vidya,
>
>
>
> I’m not sure how relevant it is, but it occurs to me that a microservice
> that executes jobs on a cloud requires very little in terms of resources to
> submit and monitor that job on the cloud. It doesn’t really matter if the
> job is a “big” or a “small” job.  So I’m not sure what heuristic makes
> sense regarding distributing work to these job execution microservices.
> Maybe a simple round robin approach would be sufficient.
>
>
>
> I think a job scheduling algorithm does make sense, however, for a higher
> level component, some sort of metascheduler that understands what resources
> are available on the cloud resources on which the jobs will be running.
> The metascheduler could create work for the job execution microservices to
> run on particular cloud resources in a way that optimizes for some metric
> (e.g., throughput).
>
>
>
> Thanks,
>
>
>
> Marcus
>
>
>
> On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu>
> wrote:
>
>
>
> Ajinkya,
>
>
>
> My scenario is for workload distribution among multiple instances of the
> same microservice.
>
>
>
> If a message broker needs to distribute the available jobs among multiple
> workers, the common approach would be to use round robin or a similar
> algorithm. This approach works best when all the workers are similar and
> the jobs are equal.
>
>
>
> So I think that a genetic or heuristic job scheduling algorithm, which is
> also aware of each of the worker's current state (CPU, RAM, No of Jobs
> processing) can more efficiently distribute the jobs. The workers can
> periodically ping the message broker with their current state info.
>
>
>
> The other advantage of using a customized algorithm is that it can
> be tweaked to use embedded routing, priority or other information in the
> job metadata to resolve all of the concerns raised by Amrutha viz message
> grouping, ordering, repeated messages, etc.
>
>
>
> We can even ensure data privacy, i.e if the workers are spread across
> multiple compute clusters say AWS and IU Big Red and we want to restrict
> certain sensitive jobs to be run only on Big Red.
>
>
>
> Some distributed job scheduling algorithms for cloud computing.
>
>    - http://www.ijimai.org/journal/sites/default/files/files/
>    2013/03/ijimai20132_18_pdf_62825.pdf
>    <http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf>
>    - https://arxiv.org/pdf/1404.5528.pdf
>
>
>
>
>
> Regards
>
> Vidya Sagar
>
>
>
> On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <
> arkamat@indiana.edu> wrote:
>
> Hello all,
>
>
>
> Adding more information to the message based approach. Messaging is a key
> strategy employed in many distributed environments. Message queuing is
> ideally suited to performing asynchronous operations. A sender can post a
> message to a queue, but it does not have to wait while the message is
> retrieved and processed. A sender and receiver do not even have to be
> running concurrently.
>
>
>
> With message queuing there can be 2 possible scenarios:
>
>    1. ​Sending and receiving messages using a * single message queue.*
>    2. ​*Sharing a message queue* between many senders and receivers
>
> ​When a message is retrieved, it is removed from the queue. A message
> queue may also support message peeking. This mechanism can be useful if
> several receivers are retrieving messages from the same queue, but each
> receiver only wishes to handle specific messages. The receiver can examine
> the message it has peeked, and decide whether to retrieve the message
> (which removes it from the queue) or leave it on the queue for another
> receiver to handle.
>
>
>
> A few basic message queuing patterns are:
>
>    1. *One-way messaging*: The sender simply posts a message to the queue
>    in the expectation that a receiver will retrieve it and process it at some
>    point.
>    2. *Request/response messaging*: In this pattern a sender posts a
>    message to a queue and expects a response from the receiver. The sender can
>    resend if the message is not delivered. This pattern typically requires
>    some form of correlation to enable the sender to determine which response
>    message corresponds to which request sent to the receiver.
>    3. *Broadcast messaging*: In this pattern a sender posts a message to
>    a queue, and multiple receivers can read a copy of the message. This
>    pattern depends on the message queue being able to disseminate the same
>    message to multiple receivers. There is a queue to which the senders can
>    post messages that include metadata in the form of attributes. Each
>    receiver can create a subscription to the queue, specifying a filter that
>    examines the values of message attributes. Any messages posted to the
>    queue with attribute values that match the filter are automatically
>    forwarded to that subscription.
>
> A solution based on asynchronous messaging might need to address a number
> of concerns:
>
>
>
> *Message ordering, Message grouping: *Process messages either in the
> order they are posted or in a specific order based on priority. Also, there
> may be occasions when it is difficult to eliminate dependencies, and it may
> be necessary to group messages together so that they are all handled by the
> same receiver.
> *Idempotency: *Ideally the message processing logic in a receiver should
> be idempotent so that, if the work performed is repeated, this repetition
> does not change the state of the system.
> *Repeated messages: *Some message queuing systems implement duplicate
> message detection and removal based on message IDs
> *Poison messages: *A poison message is a message that cannot be handled,
> often because it is malformed or contains unexpected information.
> *Message expiration: *A message might have a limited lifetime, and if it
> is not processed within this period it might no longer be relevant and
> should be discarded.
> *Message scheduling: *A message might be temporarily embargoed and should
> not be processed until a specific date and time. The message should not be
> available to a receiver until this time.
>
>
> Thanks
>
> Amruta Kamat
>
>
> ------------------------------
>
> *From:* Shenoy, Gourav Ganesh <go...@indiana.edu>
> *Sent:* Thursday, February 2, 2017 7:57 PM
> *To:* dev@airavata.apache.org
>
>
> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hello all,
>
>
>
> Amila, Sagar, thank you for the response and raising those concerns; and
> apologies because my email resonated the topic of workload management in
> terms of how micro-services communicate. As Ajinkya rightly mentioned,
> there exists some sort of correlation between micro-services communication
> and its impact on how that micro-service performs the work under those
> circumstances. The goal is to make sure we have maximum independence
> between micro-services, and investigate the workflow pattern in which these
> micro-services will operate such that we can find the right balance between
> availability & consistency. Again, from our preliminary analysis we can
> assert that these solutions may not be generic and the specific use-case
> will have a big decisive role.
>
>
>
> For starters, we are focusing on the following example – and I think this
> will clarify the doubts on what we are exactly trying to investigate about.
>
>
>
> *Our test example *
>
> Say we have the following 4 micro-services, which each perform a specific
> task as mentioned in the box.
>
>
>
> <image001.png>
>
>
>
>
>
> *A state-full pattern to distribute work*
>
> <image002.png>
>
>
>
> Here each communication between micro-services could be via RPC or
> Messaging (eg: RabbitMQ). Obvious disadvantage is that if any micro-service
> is down, then the system availability is at stake. In this test example, we
> can see that Microservice-A coordinates the work and maintains the state
> information.
>
>
>
> *A state-less pattern to distribute work*
>
>
>
> <image003.png>
>
>
>
> Another purely asynchronous approach would be to associate message-queues
> with each micro-service, where each micro-service performs its task,
> submits a request (message on bus) to the next micro-service, and continues
> to process more requests. This ensures more availability, and perhaps we
> might need to handle corner cases for failures such as message broker down,
> or message loss, etc.
>
>
>
> As mentioned, these are just a few proposals that we are planning to
> investigate via a prototype project. Inject corner cases/failures and try
> and find ways to handle these cases. I would love to hear more
> thoughts/questions/suggestions.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Ajinkya Dhamnaskar <ad...@umail.iu.edu>
> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Date: *Thursday, February 2, 2017 at 2:22 AM
> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hello all,
>
>
>
> Just a heads up. Here the name Distributed workload management does not
> necessarily mean having different instances of a microservice and then
> distributing work among these instances.
>
>
>
> Apparently, the problem is how to make each microservice work
> independently with concrete distributed communication infrastructure. So,
> think of it as a workflow where each microservice does its part of work and
> communicates (how? yet to be decided) output. The next underlying
> microservice identifies and picks up that output and takes it further
> towards the final outcome, having said that, the crux here is, none of the
> microservices need to worry about other microservices in a pipeline.
>
>
>
> Vidya Sagar,
>
> I completely second your opinion of having stateless microservices, in
> fact that is the key. With stateless microservices it is difficult to
> guarantee consistency in a system but it solves the availability problem to
> some extent. I would be interested to understand what do you mean by "an
> intelligent job scheduling algorithm, which receives real-time updates from
> the microservices with their current state information".
>
>
>
> On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <
> vkalvaku@umail.iu.edu> wrote:
>
>
>
> On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com>
> wrote:
>
> Hi Gourav,
>
>
>
> Sorry, I did not understand your question. Specifically I am having
> trouble relating "work load management" to options you suggest (RPC,
> message based etc.).
>
> So what exactly you mean by "workload management" ?
>
> What is work in this context ?
>
>
>
> Also, I did not understand what you meant by "the most efficient way".
> Efficient in terms of what? Are you looking at speed?
>
>
>
> As per your suggestions, it seems you are trying to find a way to
> communicate between micro services. RPC might be troublesome if you need to
> communicate with processes separated from a firewall.
>
>
>
> Thanks
>
> -Thejaka
>
>
>
>
>
> On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
> Hello dev, arch,
>
>
>
> As part of this Spring’17 Advanced Science Gateway Architecture course, we
> are working on trying to debate and find possible solutions to the issue of
> managing distributed workloads in Apache Airavata. This leads to the
> discussion of finding the most efficient way that different Airavata
> micro-services should communicate and distribute work, in such a way that:
>
> 1.       We maintain the ability to scale these micro-services whenever
> needed (autoscale perhaps?).
>
> 2.       Achieve fault tolerance.
>
> 3.       We can deploy these micro-services independently, or better in a
> containerized manner – keeping in mind the ability to use devops for
> deployment.
>
>
>
> As of now the options we are exploring are:
>
> 1.       RPC based communication
>
> 2.       Message based – either master-worker, or work-queue, etc
>
> 3.       A combination of both these approaches
>
>
>
> I am more inclined towards exploring the message based approach, but again
> there arises the possibility of handling limitations/corner cases of
> message broker such as downtimes (may be more). In my opinion, having
> asynchronous communication will help us achieve most of the above-mentioned
> points. Another debatable issue is making the micro-services implementation
> stateless, such that we do not have to pass the state information between
> micro-services.
>
>
>
> I would love to hear any thoughts/suggestions/comments on this topic and
> open up a discussion via this mail thread. If there is anything that I have
> missed which is relevant to this issue, please let me know.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
>
>
> Hi Gourav,
>
>
>
> Correct me if I'm wrong, but I think this is a case of the job shop
> scheduling problem, as we may have 'n' jobs of varying processing times
> and memory requirements, and we have 'm' microservices with possibly
> different computing and memory capacities, and we are trying to minimize
> the makespan <https://en.wikipedia.org/wiki/Makespan>.
>
>
>
> For this use-case, I'm in favor of a highly available and consistent message
> broker with an intelligent job scheduling algorithm, which receives
> real-time updates from the microservices with their current state
> information.
>
>
>
> As for the state vs stateless implementation, I think that question
> depends on the functionality of a particular microservice. In a broad
> sense, the stateless implementation should be preferred as it will scale
> better horizontally.
>
>
>
>
>
> Regards,
>
> Vidya Sagar
>
>
>
>
> --
>
> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics
> and Computing | Indiana University Bloomington | (812) 691-5002
> <8126915002> | vkalvaku@iu.edu
>
>
>
>
>
> --
>
> Thanks and regards,
>
>
>
> Ajinkya Dhamnaskar
>
> Student ID : 0003469679
>
> Masters (CS)
>
> +1 (812) 369- 5416 <(812)%20369-5416>
>
>
>
>
>
> --
>
> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics
> and Computing | Indiana University Bloomington | (812) 691-5002
> <8126915002> | vkalvaku@iu.edu
>
>
>



-- 
Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.
Hi dev,

We were brainstorming some potential designs that might help us with this problem. One possible option would be to have a “workflow micro-service” which would basically be the mediator/orchestrator for deciding which micro-service should be executed next – based on the type of the job. The motive is to make micro-services independent of the workflow; i.e. a micro-service implementation should not be aware of which micro-service will be executed next, and we should have central control over deciding this pattern.
Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for job type Y, the pattern could be A -> C -> D; and so on.
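
As a rough illustration of the routing decision such a workflow micro-service would make, here is a small Python sketch. The two patterns are the ones from the example above; the `WORKFLOWS` table and the `next_service` function are hypothetical names, and a real implementation might keep the patterns in a database rather than in memory.

```python
from typing import Dict, List, Optional

# Patterns taken from the example above; service names A-D are placeholders.
WORKFLOWS: Dict[str, List[str]] = {
    "X": ["A", "B", "C", "D"],
    "Y": ["A", "C", "D"],
}

def next_service(job_type: str, completed: Optional[str]) -> Optional[str]:
    """Return the service to run after `completed`, or None when the job is done.

    Pass completed=None to get the first service for a new job.
    """
    pattern = WORKFLOWS[job_type]
    if completed is None:
        return pattern[0]
    idx = pattern.index(completed)
    return pattern[idx + 1] if idx + 1 < len(pattern) else None
```

With this shape, micro-service A never needs to know whether B or C comes next; it only reports completion and the orchestrator looks up the next step.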

An initial design based on this idea looks as follows:
[image001.png: diagram of the initial design, not included in this plain-text version]


We would have a common messaging framework (implementation has not been decided yet). The database associated with the workflow micro-service could be a graph database (maybe?) – again the implementation/technology has not been decided yet.

This is just a proposed design, and I would love to hear your thoughts on this and any suggestions/comments if any. If there is anything that we are missing or should consider, please do let us know.

Thanks and Regards,
Gourav Shenoy

From: "Christie, Marcus Aaron" <ma...@iu.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Friday, February 3, 2017 at 9:21 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Vidya,

I’m not sure how relevant it is, but it occurs to me that a microservice that executes jobs on a cloud requires very little in terms of resources to submit and monitor that job on the cloud. It doesn’t really matter if the job is a “big” or a “small” job.  So I’m not sure what heuristic makes sense regarding distributing work to these job execution microservices.  Maybe a simple round robin approach would be sufficient.

I think a job scheduling algorithm does make sense, however, for a higher level component, some sort of metascheduler that understands what resources are available on the cloud resources on which the jobs will be running.  The metascheduler could create work for the job execution microservices to run on particular cloud resources in a way that optimizes for some metric (e.g., throughput).

Thanks,

Marcus

On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

Ajinkya,

My scenario is for workload distribution among multiple instances of the same microservice.

If a message broker needs to distribute the available jobs among multiple workers, the common approach would be to use round robin or a similar algorithm. This approach works best when all the workers are similar and the jobs are equal.

So I think that a genetic or heuristic job scheduling algorithm, which is also aware of each of the worker's current state (CPU, RAM, No of Jobs processing) can more efficiently distribute the jobs. The workers can periodically ping the message broker with their current state info.
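
A very simple version of such a state-aware choice could look like the Python sketch below. This is only a heuristic illustration, not a genetic algorithm: the `WorkerState` fields mirror the signals listed above (CPU, RAM, number of jobs), while the equal weighting and the `max_jobs` cap are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class WorkerState:
    name: str
    cpu: float   # fraction of CPU in use, 0.0 to 1.0
    ram: float   # fraction of RAM in use, 0.0 to 1.0
    jobs: int    # number of jobs currently processing

def load_score(w: WorkerState, max_jobs: int = 10) -> float:
    """Assumed heuristic: equal-weight average of the three signals."""
    return (w.cpu + w.ram + min(w.jobs / max_jobs, 1.0)) / 3.0

def pick_worker(workers: Iterable[WorkerState]) -> WorkerState:
    """Choose the worker with the lowest load score."""
    return min(workers, key=load_score)
```

The state reported in each periodic ping would simply replace the stored `WorkerState` for that worker before the next call to `pick_worker`.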

The other advantage of using a customized algorithm is that it can be tweaked to use embedded routing, priority or other information in the job metadata to address the concerns raised by Amruta, viz. message grouping, ordering, repeated messages, etc.

We can even enforce data privacy: for example, if the workers are spread across multiple compute clusters, say AWS and IU Big Red, we can restrict certain sensitive jobs to run only on Big Red.

Some distributed job scheduling algorithms for cloud computing.

  *   http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
  *   https://arxiv.org/pdf/1404.5528.pdf


Regards
Vidya Sagar

On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <ar...@indiana.edu> wrote:
Hello all,

Adding more information to the message based approach. Messaging is a key strategy employed in many distributed environments. Message queuing is ideally suited to performing asynchronous operations. A sender can post a message to a queue, but it does not have to wait while the message is retrieved and processed. A sender and receiver do not even have to be running concurrently.

With message queuing there can be 2 possible scenarios:

  1.  Sending and receiving messages using a single message queue.
  2.  Sharing a message queue between many senders and receivers.

When a message is retrieved, it is removed from the queue. A message queue may also support message peeking. This mechanism can be useful if several receivers are retrieving messages from the same queue, but each receiver only wishes to handle specific messages. The receiver can examine the message it has peeked, and decide whether to retrieve the message (which removes it from the queue) or leave it on the queue for another receiver to handle.

A few basic message queuing patterns are:

  1.  One-way messaging: The sender simply posts a message to the queue in the expectation that a receiver will retrieve it and process it at some point.
  2.  Request/response messaging: In this pattern a sender posts a message to a queue and expects a response from the receiver. The sender can resend if the message is not delivered. This pattern typically requires some form of correlation to enable the sender to determine which response message corresponds to which request sent to the receiver.
  3.  Broadcast messaging: In this pattern a sender posts a message to a queue, and multiple receivers can read a copy of the message. This pattern depends on the message queue being able to disseminate the same message to multiple receivers. There is a queue to which the senders can post messages that include metadata in the form of attributes. Each receiver can create a subscription to the queue, specifying a filter that examines the values of message attributes. Any messages posted to the queue with attribute values that match the filter are automatically forwarded to that subscription.
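
The correlation step in the request/response pattern can be sketched in Python as follows. This uses in-process `queue.Queue` objects as a stand-in for a real broker, and the names `send_request`, `serve_one` and `match_response` are hypothetical; the point is only that the receiver echoes back the ID the sender attached.

```python
import uuid
from queue import Queue
from typing import Dict, Tuple

request_queue: Queue = Queue()
response_queue: Queue = Queue()

def send_request(body: str, pending: Dict[str, str]) -> str:
    """Post a request tagged with a fresh correlation ID and remember it."""
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = body
    request_queue.put({"correlation_id": correlation_id, "body": body})
    return correlation_id

def serve_one() -> None:
    """Receiver: handle one request and echo the correlation ID back."""
    msg = request_queue.get()
    response_queue.put({
        "correlation_id": msg["correlation_id"],
        "body": msg["body"].upper(),   # stand-in for real work
    })

def match_response(pending: Dict[str, str]) -> Tuple[str, str]:
    """Sender: pair one response with the request that caused it."""
    resp = response_queue.get()
    request_body = pending.pop(resp["correlation_id"])
    return request_body, resp["body"]
```

Anything still left in `pending` after a timeout is a request with no response, which is what the sender would resend.
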
A solution based on asynchronous messaging might need to address a number of concerns:

Message ordering, Message grouping: Process messages either in the order they are posted or in a specific order based on priority. Also, there may be occasions when it is difficult to eliminate dependencies, and it may be necessary to group messages together so that they are all handled by the same receiver.
Idempotency: Ideally the message processing logic in a receiver should be idempotent so that, if the work performed is repeated, this repetition does not change the state of the system.
Repeated messages: Some message queuing systems implement duplicate message detection and removal based on message IDs.
Poison messages: A poison message is a message that cannot be handled, often because it is malformed or contains unexpected information.
Message expiration: A message might have a limited lifetime, and if it is not processed within this period it might no longer be relevant and should be discarded.
Message scheduling: A message might be temporarily embargoed and should not be processed until a specific date and time. The message should not be available to a receiver until this time.
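
Two of these concerns, idempotency and repeated messages, can be handled together on the receiver side. The Python sketch below is one possible approach under the assumption that every message carries a unique `id`; the class name, the message fields and the in-memory `seen` set are all illustrative (a real receiver would need durable storage for the seen IDs).

```python
from typing import Dict, Set

class IdempotentReceiver:
    """Applies each message at most once, keyed by its message ID."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.totals: Dict[str, int] = {}   # stand-in for system state

    def handle(self, message: dict) -> bool:
        """Return True if the message was applied, False if it was a repeat."""
        msg_id = message["id"]
        if msg_id in self.seen:
            return False                   # duplicate: state is left unchanged
        key = message["key"]
        self.totals[key] = self.totals.get(key, 0) + message["amount"]
        self.seen.add(msg_id)
        return True
```

Delivering the same message twice then leaves the state exactly as it was after the first delivery.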


Thanks
Amruta Kamat


________________________________
From: Shenoy, Gourav Ganesh <go...@indiana.edu>
Sent: Thursday, February 2, 2017 7:57 PM
To: dev@airavata.apache.org

Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Amila, Sagar, thank you for the responses and for raising those concerns; and apologies that my email framed the topic of workload management only in terms of how micro-services communicate. As Ajinkya rightly mentioned, there is a correlation between how micro-services communicate and its impact on how each micro-service performs its work under those circumstances. The goal is to make sure we have maximum independence between micro-services, and to investigate the workflow pattern in which these micro-services will operate so that we can find the right balance between availability & consistency. Again, from our preliminary analysis, these solutions may not be generic, and the specific use-case will play a decisive role.

For starters, we are focusing on the following example – and I think this will clarify exactly what we are trying to investigate.

Our test example
Say we have the following 4 micro-services, which each perform a specific task as mentioned in the box.

<image001.png>


A stateful pattern to distribute work
<image002.png>

Here each communication between micro-services could be via RPC or messaging (e.g., RabbitMQ). The obvious disadvantage is that if any micro-service is down, the availability of the whole system is at stake. In this test example, we can see that Microservice-A coordinates the work and maintains the state information.

A stateless pattern to distribute work

<image003.png>

Another, purely asynchronous approach would be to associate a message queue with each micro-service, where each micro-service performs its task, submits a request (a message on the bus) to the next micro-service, and continues to process more requests. This gives better availability, although we would need to handle corner cases for failures such as the message broker being down, message loss, etc.
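
A toy Python version of this stateless pattern is sketched below. In-process `queue.Queue` objects stand in for broker queues, the service names A to C are placeholders, and `make_stage` is a hypothetical helper. All job state travels inside the message, so no stage keeps anything between requests.

```python
from queue import Queue
from typing import Callable, Dict, List, Optional

# One inbound queue per micro-service; names A-C are placeholders.
queues: Dict[str, Queue] = {name: Queue() for name in ("A", "B", "C")}
done: List[dict] = []

def make_stage(name: str, next_name: Optional[str]) -> Callable[[], None]:
    """Build a stage that takes one message, does its work, and forwards it."""
    def step() -> None:
        msg = queues[name].get()
        msg = {**msg, "trace": msg["trace"] + [name]}   # stand-in for real work
        if next_name is None:
            done.append(msg)                # last stage: job finished
        else:
            queues[next_name].put(msg)      # hand off, then take more work
    return step

stage_a = make_stage("A", "B")
stage_b = make_stage("B", "C")
stage_c = make_stage("C", None)
```

Because a stage only reads from its own queue, any number of identical copies of it could consume from that queue in parallel.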

As mentioned, these are just a few proposals that we are planning to investigate via a prototype project. We will inject corner cases/failures and try to find ways to handle them. I would love to hear more thoughts/questions/suggestions.

Thanks and Regards,
Gourav Shenoy

From: Ajinkya Dhamnaskar <ad...@umail.iu.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, February 2, 2017 at 2:22 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Just a heads up. Here the name Distributed workload management does not necessarily mean having different instances of a microservice and then distributing work among these instances.

Rather, the problem is how to make each microservice work independently on top of a concrete distributed communication infrastructure. So, think of it as a workflow where each microservice does its part of the work and communicates its output (how is yet to be decided). The next microservice in line picks up that output and takes it further towards the final outcome. The crux here is that none of the microservices needs to worry about the other microservices in the pipeline.

Vidya Sagar,
I completely second your opinion on having stateless microservices; in fact, that is the key. With stateless microservices it is difficult to guarantee consistency in a system, but it solves the availability problem to some extent. I would be interested to understand what you mean by "an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information".

On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com> wrote:
Hi Gourav,

Sorry, I did not understand your question. Specifically, I am having trouble relating "workload management" to the options you suggest (RPC, message based, etc.).
So what exactly do you mean by "workload management"?
What is "work" in this context?

Also, I did not understand what you meant by "the most efficient way". Efficient in terms of what? Are you looking at speed?

As per your suggestions, it seems you are trying to find a way to communicate between micro-services. RPC might be troublesome if you need to communicate with processes separated by a firewall.

Thanks
-Thejaka


On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hello dev, arch,

As part of this Spring’17 Advanced Science Gateway Architecture course, we are working on trying to debate and find possible solutions to the issue of managing distributed workloads in Apache Airavata. This leads to the discussion of finding the most efficient way that different Airavata micro-services should communicate and distribute work, in such a way that:

1.       We maintain the ability to scale these micro-services whenever needed (autoscale perhaps?).

2.       Achieve fault tolerance.

3.       We can deploy these micro-services independently, or better in a containerized manner – keeping in mind the ability to use devops for deployment.

As of now the options we are exploring are:

1.       RPC based communication

2.       Message based – either master-worker, or work-queue, etc

3.       A combination of both these approaches

I am more inclined towards exploring the message based approach, but again there arises the possibility of handling limitations/corner cases of message broker such as downtimes (may be more). In my opinion, having asynchronous communication will help us achieve most of the above-mentioned points. Another debatable issue is making the micro-services implementation stateless, such that we do not have to pass the state information between micro-services.

I would love to hear any thoughts/suggestions/comments on this topic and open up a discussion via this mail thread. If there is anything that I have missed which is relevant to this issue, please let me know.

Thanks and Regards,
Gourav Shenoy


Hi Gourav,

Correct me if I'm wrong, but I think this is a case of the job shop scheduling problem, as we may have 'n' jobs of varying processing times and memory requirements, and we have 'm' microservices with possibly different computing and memory capacities, and we are trying to minimize the makespan<https://en.wikipedia.org/wiki/Makespan>.
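
For a feel of what minimizing the makespan means, here is a Python sketch of the classic longest-processing-time-first (LPT) greedy heuristic. It is a deliberate simplification of the problem described above: it assumes identical workers and known job times, and ignores memory, so it illustrates the objective rather than solving the full scheduling problem. The function name `lpt_schedule` is hypothetical.

```python
import heapq
from typing import List, Tuple

def lpt_schedule(job_times: List[float], workers: int) -> Tuple[float, List[List[float]]]:
    """Longest-processing-time-first: give each job to the least loaded worker."""
    heap = [(0.0, i) for i in range(workers)]   # (current load, worker index)
    heapq.heapify(heap)
    assignment: List[List[float]] = [[] for _ in range(workers)]
    for t in sorted(job_times, reverse=True):
        load, i = heapq.heappop(heap)           # least loaded worker so far
        assignment[i].append(t)
        heapq.heappush(heap, (load + t, i))
    makespan = max(load for load, _ in heap)    # finish time of the busiest worker
    return makespan, assignment
```

For job times 7, 5, 4, 3, 1 on two workers this yields loads of 10 and 10, i.e. a makespan of 10.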

For this use-case, I'm in favor of a highly available and consistent message broker with an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information.

As for the state vs stateless implementation, I think that question depends on the functionality of a particular microservice. In a broad sense, the stateless implementation should be preferred as it will scale better horizontally.


Regards,
Vidya Sagar


--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002 | vkalvaku@iu.edu



--
Thanks and regards,

Ajinkya Dhamnaskar
Student ID : 0003469679
Masters (CS)
+1 (812) 369-5416



--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002 | vkalvaku@iu.edu


Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by "Christie, Marcus Aaron" <ma...@iu.edu>.
Vidya,

I’m not sure how relevant it is, but it occurs to me that a microservice that executes jobs on a cloud requires very little in terms of resources to submit and monitor that job on the cloud. It doesn’t really matter if the job is a “big” or a “small” job.  So I’m not sure what heuristic makes sense regarding distributing work to these job execution microservices.  Maybe a simple round robin approach would be sufficient.
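
For reference, a round robin dispatcher is only a few lines of Python. This is a generic sketch, not Airavata code; `round_robin` and the job/worker names in the usage note are made up for the example.

```python
from itertools import cycle
from typing import Dict, Iterable, List

def round_robin(jobs: Iterable[str], workers: List[str]) -> Dict[str, List[str]]:
    """Assign jobs to workers in strict rotation, ignoring job size."""
    assignment: Dict[str, List[str]] = {w: [] for w in workers}
    for job, worker in zip(jobs, cycle(workers)):
        assignment[worker].append(job)
    return assignment
```

Calling `round_robin(["j1", "j2", "j3", "j4", "j5"], ["w1", "w2"])` gives j1, j3 and j5 to w1 and j2 and j4 to w2, regardless of how big each job is.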

I think a job scheduling algorithm does make sense, however, for a higher level component, some sort of metascheduler that understands what resources are available on the cloud resources on which the jobs will be running.  The metascheduler could create work for the job execution microservices to run on particular cloud resources in a way that optimizes for some metric (e.g., throughput).

Thanks,

Marcus

On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

Ajinkya,

My scenario is for workload distribution among multiple instances of the same microservice.

If a message broker needs to distribute the available jobs among multiple workers, the common approach would be to use round robin or a similar algorithm. This approach works best when all the workers are similar and the jobs are equal.

So I think that a genetic or heuristic job scheduling algorithm, which is also aware of each of the worker's current state (CPU, RAM, No of Jobs processing) can more efficiently distribute the jobs. The workers can periodically ping the message broker with their current state info.

The other advantage of using a customized algorithm is that it can be tweaked to use embedded routing, priority or other information in the job metadata to address the concerns raised by Amruta, viz. message grouping, ordering, repeated messages, etc.

We can even enforce data privacy: for example, if the workers are spread across multiple compute clusters, say AWS and IU Big Red, we can restrict certain sensitive jobs to run only on Big Red.

Some distributed job scheduling algorithms for cloud computing.

  *   http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
  *   https://arxiv.org/pdf/1404.5528.pdf


Regards
Vidya Sagar

On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <ar...@indiana.edu> wrote:

Hello all,


Adding more information to the message based approach. Messaging is a key strategy employed in many distributed environments. Message queuing is ideally suited to performing asynchronous operations. A sender can post a message to a queue, but it does not have to wait while the message is retrieved and processed. A sender and receiver do not even have to be running concurrently.


With message queuing there can be 2 possible scenarios:

  1.  Sending and receiving messages using a single message queue.
  2.  Sharing a message queue between many senders and receivers.

When a message is retrieved, it is removed from the queue. A message queue may also support message peeking. This mechanism can be useful if several receivers are retrieving messages from the same queue, but each receiver only wishes to handle specific messages. The receiver can examine the message it has peeked, and decide whether to retrieve the message (which removes it from the queue) or leave it on the queue for another receiver to handle.


A few basic message queuing patterns are:

  1.  One-way messaging: The sender simply posts a message to the queue in the expectation that a receiver will retrieve it and process it at some point.
  2.  Request/response messaging: In this pattern a sender posts a message to a queue and expects a response from the receiver. The sender can resend if the message is not delivered. This pattern typically requires some form of correlation to enable the sender to determine which response message corresponds to which request sent to the receiver.
  3.  Broadcast messaging: In this pattern a sender posts a message to a queue, and multiple receivers can read a copy of the message. This pattern depends on the message queue being able to disseminate the same message to multiple receivers. There is a queue to which the senders can post messages that include metadata in the form of attributes. Each receiver can create a subscription to the queue, specifying a filter that examines the values of message attributes. Any messages posted to the queue with attribute values that match the filter are automatically forwarded to that subscription.

A solution based on asynchronous messaging might need to address a number of concerns:


Message ordering, Message grouping: Process messages either in the order they are posted or in a specific order based on priority. Also, there may be occasions when it is difficult to eliminate dependencies, and it may be necessary to group messages together so that they are all handled by the same receiver.
Idempotency: Ideally the message processing logic in a receiver should be idempotent so that, if the work performed is repeated, this repetition does not change the state of the system.
Repeated messages: Some message queuing systems implement duplicate message detection and removal based on message IDs.
Poison messages: A poison message is a message that cannot be handled, often because it is malformed or contains unexpected information.
Message expiration: A message might have a limited lifetime, and if it is not processed within this period it might no longer be relevant and should be discarded.
Message scheduling: A message might be temporarily embargoed and should not be processed until a specific date and time. The message should not be available to a receiver until this time.


Thanks

Amruta Kamat




________________________________
From: Shenoy, Gourav Ganesh <go...@indiana.edu>
Sent: Thursday, February 2, 2017 7:57 PM
To: dev@airavata.apache.org

Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Amila, Sagar, thank you for the responses and for raising those concerns, and apologies that my email framed workload management mainly in terms of how micro-services communicate. As Ajinkya rightly mentioned, there is a correlation between how micro-services communicate and how each micro-service performs its work under those circumstances. The goal is to maximize independence between micro-services, and to investigate the workflow pattern in which these micro-services will operate so that we can find the right balance between availability and consistency. Again, our preliminary analysis suggests that these solutions may not be generic and that the specific use case will play a decisive role.

For starters, we are focusing on the following example, which I think will clarify exactly what we are trying to investigate.

Our test example
Say we have the following 4 micro-services, which each perform a specific task as mentioned in the box.

<image001.png>


A stateful pattern to distribute work
<image002.png>

Here each communication between micro-services could be via RPC or messaging (e.g., RabbitMQ). The obvious disadvantage is that if any micro-service is down, the system's availability is at stake. In this test example, we can see that Microservice-A coordinates the work and maintains the state information.

A stateless pattern to distribute work

<image003.png>

Another, purely asynchronous approach would be to associate a message queue with each micro-service: each micro-service performs its task, submits a request (a message on the bus) to the next micro-service, and continues to process more requests. This improves availability, though we would need to handle failure corner cases such as the message broker being down or messages being lost.
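
The queue-per-micro-service idea can be sketched in a few lines of Python. This is only an in-process illustration under stated assumptions: threads and `queue.Queue` stand in for real micro-services and a real broker such as RabbitMQ, and the stage names and message shape are hypothetical. Each message carries its own state, so no stage has to remember anything between messages.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def make_stage(name):
    """Build a hypothetical task that records its name on the message."""
    def task(message):
        return {**message, "steps": message["steps"] + [name]}
    return task


def run_stage(task, inbox, outbox):
    """Consume from inbox, do the work, hand the result to the next queue."""
    while True:
        message = inbox.get()
        if message is STOP:
            outbox.put(STOP)  # propagate shutdown downstream
            return
        outbox.put(task(message))


def run_pipeline(tasks, messages):
    """Chain the tasks with one queue between each pair of stages."""
    queues = [queue.Queue() for _ in range(len(tasks) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(task, queues[i], queues[i + 1]))
        for i, task in enumerate(tasks)
    ]
    for thread in threads:
        thread.start()
    for message in messages:
        queues[0].put(message)
    queues[0].put(STOP)
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

The sketch deliberately omits the failure cases mentioned above (broker downtime, message loss); a task that raises would stall the pipeline here.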

As mentioned, these are just a few proposals that we plan to investigate via a prototype project, in which we will inject corner cases/failures and try to find ways to handle them. I would love to hear more thoughts/questions/suggestions.

Thanks and Regards,
Gourav Shenoy

From: Ajinkya Dhamnaskar <ad...@umail.iu.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, February 2, 2017 at 2:22 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Just a heads up: here the name "distributed workload management" does not necessarily mean having multiple instances of a microservice and distributing work among those instances.

Rather, the problem is how to make each microservice work independently on top of a concrete distributed communication infrastructure. So, think of it as a workflow where each microservice does its part of the work and communicates its output (how is yet to be decided). The next microservice in line identifies and picks up that output and takes it further towards the final outcome. The crux is that none of the microservices needs to worry about the other microservices in the pipeline.

Vidya Sagar,
I completely second your opinion on having stateless microservices; in fact, that is the key. With stateless microservices it is difficult to guarantee consistency in a system, but it solves the availability problem to some extent. I would be interested to understand what you mean by "an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information".

On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com> wrote:
Hi Gourav,

Sorry, I did not understand your question. Specifically, I am having trouble relating "workload management" to the options you suggest (RPC, message based, etc.).
So what exactly do you mean by "workload management"?
What is "work" in this context?

Also, I did not understand what you meant by "the most efficient way". Efficient in terms of what? Are you looking at speed?

As per your suggestions, it seems you are trying to find a way to communicate between micro-services. RPC might be troublesome if you need to communicate with processes separated by a firewall.

Thanks
-Thejaka


On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <go...@indiana.edu> wrote:
Hello dev, arch,

As part of this Spring’17 Advanced Science Gateway Architecture course, we are working on trying to debate and find possible solutions to the issue of managing distributed workloads in Apache Airavata. This leads to the discussion of finding the most efficient way that different Airavata micro-services should communicate and distribute work, in such a way that:

1.       We maintain the ability to scale these micro-services whenever needed (autoscale perhaps?).

2.       Achieve fault tolerance.

3.       We can deploy these micro-services independently, or better in a containerized manner – keeping in mind the ability to use devops for deployment.


As of now the options we are exploring are:

1.       RPC based communication

2.       Message based – either master-worker, or work-queue, etc

3.       A combination of both these approaches


I am more inclined towards exploring the message based approach, but again there arises the possibility of handling limitations/corner cases of message broker such as downtimes (may be more). In my opinion, having asynchronous communication will help us achieve most of the above-mentioned points. Another debatable issue is making the micro-services implementation stateless, such that we do not have to pass the state information between micro-services.

I would love to hear any thoughts/suggestions/comments on this topic and open up a discussion via this mail thread. If there is anything that I have missed which is relevant to this issue, please let me know.

Thanks and Regards,
Gourav Shenoy


Hi Gourav,

Correct me if I'm wrong, but I think this is a case of the job shop scheduling problem, as we may have 'n' jobs of varying processing times and memory requirements, and we have 'm' microservices with possibly different computing and memory capacities, and we are trying to minimize the makespan <https://en.wikipedia.org/wiki/Makespan>.
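
As a rough illustration of makespan minimization, here is a sketch of the greedy longest-processing-time-first (LPT) heuristic in Python. It is a deliberate simplification of the problem described above: it assumes identical workers and known job durations, ignores memory requirements, and uses made-up job names. It is not an algorithm proposed in this thread.

```python
import heapq


def lpt_schedule(job_times, num_workers):
    """Give the longest remaining job to the currently least-loaded worker."""
    if num_workers < 1:
        raise ValueError("at least one worker is required")
    # Min-heap of (accumulated load, worker index).
    heap = [(0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    assignment = {worker: [] for worker in range(num_workers)}
    by_duration = sorted(job_times.items(), key=lambda item: item[1], reverse=True)
    for job, duration in by_duration:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(job)
        heapq.heappush(heap, (load + duration, worker))
    makespan = max(load for load, _ in heap)
    return assignment, makespan
```

LPT is a heuristic, so the makespan it returns is not guaranteed to be optimal for every input.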

For this use-case, I'm in favor of a highly available and consistent message broker with an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information.

As for the state vs stateless implementation, I think that question depends on the functionality of a particular microservice. In a broad sense, the stateless implementation should be preferred as it will scale better horizontally.


Regards,
Vidya Sagar


--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002 | vkalvaku@iu.edu



--
Thanks and regards,

Ajinkya Dhamnaskar
Student ID : 0003469679
Masters (CS)
+1 (812) 369-5416



--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002 | vkalvaku@iu.edu


Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Vidya Sagar Kalvakunta <vk...@umail.iu.edu>.
Ajinkya,

My scenario is for workload distribution among multiple instances of the
same microservice.

If a message broker needs to distribute the available jobs among multiple
workers, the common approach would be to use round robin or a similar
algorithm. This approach works best when all the workers are similar and
the jobs are equal.

So I think that a genetic or heuristic job scheduling algorithm that is
also aware of each worker's current state (CPU, RAM, number of jobs being
processed) can distribute the jobs more efficiently. The workers can
periodically ping the message broker with their current state info.
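
A state-aware dispatcher along these lines could be sketched as follows. This is
a simple heuristic rather than a genetic algorithm, and the class name, the load
weights, the cap of 10 jobs, and the staleness timeout are all assumptions made
for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class WorkerState:
    cpu: float          # fraction of CPU in use, 0.0 to 1.0
    ram: float          # fraction of RAM in use, 0.0 to 1.0
    active_jobs: int    # number of jobs currently being processed
    reported_at: float  # timestamp of the last heartbeat


class StateAwareScheduler:
    def __init__(self, stale_after=30.0):
        self.stale_after = stale_after  # seconds before a worker is ignored
        self.workers = {}

    def heartbeat(self, worker_id, cpu, ram, active_jobs, now=None):
        """Record the state a worker reports in its periodic ping."""
        now = time.time() if now is None else now
        self.workers[worker_id] = WorkerState(cpu, ram, active_jobs, now)

    @staticmethod
    def load(state):
        """Combine the metrics into one score (the weights are arbitrary)."""
        jobs = min(state.active_jobs / 10, 1.0)
        return 0.4 * state.cpu + 0.3 * state.ram + 0.3 * jobs

    def pick_worker(self, now=None):
        """Return the least-loaded worker that has pinged recently, if any."""
        now = time.time() if now is None else now
        live = {
            worker_id: state
            for worker_id, state in self.workers.items()
            if now - state.reported_at <= self.stale_after
        }
        if not live:
            return None
        return min(live, key=lambda worker_id: self.load(live[worker_id]))
```

Ignoring workers whose last ping is too old gives a crude form of failure
detection on top of the load balancing.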

The other advantage of using a customized algorithm is that it can
be tweaked to use embedded routing, priority, or other information in the
job metadata to resolve all of the concerns raised by Amruta, viz. message
grouping, ordering, repeated messages, etc.

We can even ensure data privacy: for example, if the workers are spread
across multiple compute clusters, say AWS and IU Big Red, we can restrict
certain sensitive jobs to run only on Big Red.
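
One way to express such a restriction is a filter over job metadata, sketched
below. The `allowed_clusters` and `cluster` field names are hypothetical; the
thread does not define a metadata schema.

```python
def eligible_workers(job, workers):
    """Keep only the workers whose cluster the job is allowed to run on."""
    allowed = job.get("allowed_clusters")
    if allowed is None:
        # No restriction in the job metadata: every worker qualifies.
        return list(workers)
    return [worker for worker in workers if worker["cluster"] in allowed]
```

A scheduler would apply this filter first and then choose among the remaining
workers by whatever load heuristic it uses.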

Some distributed job scheduling algorithms for cloud computing:

   - http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
   - https://arxiv.org/pdf/1404.5528.pdf



Regards
Vidya Sagar




-- 
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics
and Computing | Indiana University Bloomington | (812) 691-5002 <8126915002>
 | vkalvaku@iu.edu

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by "Kamat, Amruta Ravalnath" <ar...@indiana.edu>.
Hello all,


Adding more information to the message based approach. Messaging is a key strategy employed in many distributed environments. Message queuing is ideally suited to performing asynchronous operations. A sender can post a message to a queue, but it does not have to wait while the message is retrieved and processed. A sender and receiver do not even have to be running concurrently.


With message queuing there can be 2 possible scenarios:

  1.  Sending and receiving messages using a single message queue.
  2.  Sharing a message queue between many senders and receivers.

When a message is retrieved, it is removed from the queue. A message queue may also support message peeking. This mechanism can be useful if several receivers are retrieving messages from the same queue, but each receiver only wishes to handle specific messages. The receiver can examine the message it has peeked, and decide whether to retrieve the message (which removes it from the queue) or leave it on the queue for another receiver to handle.
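
A minimal in-memory sketch of peeking, in Python, might look like this. The `PeekableQueue` class and the `take_if` helper are hypothetical names; real brokers expose peeking through their own APIs, if at all.

```python
from collections import deque


class PeekableQueue:
    def __init__(self):
        self._messages = deque()

    def post(self, message):
        self._messages.append(message)

    def peek(self):
        """Return the message at the head without removing it."""
        return self._messages[0] if self._messages else None

    def retrieve(self):
        """Remove and return the message at the head."""
        return self._messages.popleft()

    def __len__(self):
        return len(self._messages)


def take_if(message_queue, wanted):
    """Retrieve the head message only if this receiver wants it."""
    head = message_queue.peek()
    if head is not None and wanted(head):
        return message_queue.retrieve()
    return None  # leave it on the queue for another receiver
```

Note that in this simple form a message nobody wants stays at the head and blocks everything behind it.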


A few basic message queuing patterns are:

  1.  One-way messaging: The sender simply posts a message to the queue in the expectation that a receiver will retrieve it and process it at some point.
  2.  Request/response messaging: In this pattern a sender posts a message to a queue and expects a response from the receiver. The sender can resend if the message is not delivered. This pattern typically requires some form of correlation to enable the sender to determine which response message corresponds to which request sent to the receiver.
  3.  Broadcast messaging: In this pattern a sender posts a message to a queue, and multiple receivers can read a copy of the message. This pattern depends on the message queue being able to disseminate the same message to multiple receivers. There is a queue to which the senders can post messages that include metadata in the form of attributes. Each receiver can create a subscription to the queue, specifying a filter that examines the values of message attributes. Any messages posted to the queue with attribute values that match the filter are automatically forwarded to that subscription.
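
The correlation step in the request/response pattern can be sketched in Python as below. Standard-library queues stand in for a broker, and the function names and message fields are assumptions made for illustration.

```python
import queue
import uuid


def send_request(request_queue, pending, payload):
    """Post a request tagged with a fresh correlation ID and remember it."""
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = payload
    request_queue.put({"correlation_id": correlation_id, "body": payload})
    return correlation_id


def serve_one(request_queue, response_queue, handler):
    """Receiver side: process one request and echo its correlation ID."""
    request = request_queue.get()
    response_queue.put({
        "correlation_id": request["correlation_id"],
        "body": handler(request["body"]),
    })


def match_response(response_queue, pending):
    """Sender side: pair the next response with the request that caused it."""
    response = response_queue.get()
    original = pending.pop(response["correlation_id"])
    return original, response["body"]
```

Requests still listed in `pending` after a timeout are the ones a sender would resend.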

A solution based on asynchronous messaging might need to address a number of concerns:


Message ordering, Message grouping: Process messages either in the order they are posted or in a specific order based on priority. Also, there may be occasions when it is difficult to eliminate dependencies, and it may be necessary to group messages together so that they are all handled by the same receiver.
Idempotency: Ideally the message processing logic in a receiver should be idempotent so that, if the work performed is repeated, this repetition does not change the state of the system.
Repeated messages: Some message queuing systems implement duplicate message detection and removal based on message IDs.
Poison messages: A poison message is a message that cannot be handled, often because it is malformed or contains unexpected information.
Message expiration: A message might have a limited lifetime, and if it is not processed within this period it might no longer be relevant and should be discarded.
Message scheduling: A message might be temporarily embargoed and should not be processed until a specific date and time. The message should not be available to a receiver until this time.
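
Several of these concerns can be handled in the receiver itself, as the following Python sketch suggests. The JSON message format and the `id`, `body`, `expires_at`, and `not_before` fields are assumptions made for illustration; ordering and grouping are not covered.

```python
import json
import time


class Consumer:
    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()    # IDs already processed (duplicate detection)
        self.dead_letters = []   # poison messages set aside for inspection

    def handle(self, raw, now=None):
        now = time.time() if now is None else now
        try:
            message = json.loads(raw)
            message_id = message["id"]
            body = message["body"]
        except (ValueError, KeyError, TypeError):
            # Malformed or unexpected content: park it instead of retrying.
            self.dead_letters.append(raw)
            return "poison"
        expires_at = message.get("expires_at")
        if expires_at is not None and now >= expires_at:
            return "expired"     # no longer relevant, discard
        not_before = message.get("not_before")
        if not_before is not None and now < not_before:
            return "deferred"    # embargoed, should be redelivered later
        if message_id in self.seen_ids:
            return "duplicate"   # skipping repeats keeps handling idempotent
        self.handler(body)
        self.seen_ids.add(message_id)
        return "processed"
```

In a real deployment the set of seen IDs would have to live in shared, durable storage so that it survives restarts and is visible to all receivers.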


Thanks

Amruta Kamat




________________________________
From: Shenoy, Gourav Ganesh <go...@indiana.edu>
Sent: Thursday, February 2, 2017 7:57 PM
To: dev@airavata.apache.org
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Amila, Sagar, thank you for the responses and for raising those concerns, and apologies that my email framed workload management mainly in terms of how micro-services communicate. As Ajinkya rightly mentioned, there is a correlation between how micro-services communicate and how each micro-service performs its work under those circumstances. The goal is to maximize independence between micro-services, and to investigate the workflow pattern in which these micro-services will operate so that we can find the right balance between availability and consistency. Again, our preliminary analysis suggests that these solutions may not be generic and that the specific use case will play a decisive role.

For starters, we are focusing on the following example, which I think will clarify exactly what we are trying to investigate.

Our test example
Say we have the following 4 micro-services, which each perform a specific task as mentioned in the box.

[cid:image001.png@01D27D8E.8B69A3F0]


A stateful pattern to distribute work
[cid:image002.png@01D27D8E.8B69A3F0]

Here each communication between micro-services could be via RPC or messaging (e.g., RabbitMQ). The obvious disadvantage is that if any micro-service is down, the system's availability is at stake. In this test example, we can see that Microservice-A coordinates the work and maintains the state information.

A stateless pattern to distribute work

[cid:image003.png@01D27D8E.8B69A3F0]

Another, purely asynchronous approach would be to associate a message queue with each micro-service: each micro-service performs its task, submits a request (a message on the bus) to the next micro-service, and continues to process more requests. This improves availability, though we would need to handle failure corner cases such as the message broker being down or messages being lost.

As mentioned, these are just a few proposals that we plan to investigate via a prototype project, in which we will inject corner cases/failures and try to find ways to handle them. I would love to hear more thoughts/questions/suggestions.

Thanks and Regards,
Gourav Shenoy

From: Ajinkya Dhamnaskar <ad...@umail.iu.edu>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Thursday, February 2, 2017 at 2:22 AM
To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Just a heads up: here the name "distributed workload management" does not necessarily mean having multiple instances of a microservice and distributing work among those instances.

Rather, the problem is how to make each microservice work independently on top of a concrete distributed communication infrastructure. So, think of it as a workflow where each microservice does its part of the work and communicates its output (how is yet to be decided). The next microservice in line identifies and picks up that output and takes it further towards the final outcome. The crux is that none of the microservices needs to worry about the other microservices in the pipeline.

Vidya Sagar,
I completely second your opinion on having stateless microservices; in fact, that is the key. With stateless microservices it is difficult to guarantee consistency in a system, but it solves the availability problem to some extent. I would be interested to understand what you mean by "an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information".

On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <vk...@umail.iu.edu> wrote:

On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <th...@gmail.com> wrote:
Hi Gourav,

Sorry, I did not understand your question. Specifically, I am having trouble relating "workload management" to the options you suggest (RPC, message based, etc.).
So what exactly do you mean by "workload management"?
What is "work" in this context?

Also, I did not understand what you meant by "the most efficient way". Efficient in terms of what? Are you looking at speed?

As per your suggestions, it seems you are trying to find a way to communicate between micro-services. RPC might be troublesome if you need to communicate with processes separated by a firewall.

Thanks
-Thejaka


On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <go...@indiana.edu>> wrote:
Hello dev, arch,

As part of this Spring’17 Advanced Science Gateway Architecture course, we are working on trying to debate and find possible solutions to the issue of managing distributed workloads in Apache Airavata. This leads to the discussion of finding the most efficient way that different Airavata micro-services should communicate and distribute work, in such a way that:

1.       We maintain the ability to scale these micro-services whenever needed (autoscale perhaps?).

2.       Achieve fault tolerance.

3.       We can deploy these micro-services independently, or better in a containerized manner – keeping in mind the ability to use devops for deployment.

As of now the options we are exploring are:

1.       RPC based communication

2.       Message based – either master-worker, or work-queue, etc

3.       A combination of both these approaches

I am more inclined towards exploring the message based approach, but again there arises the possibility of handling limitations/corner cases of message broker such as downtimes (may be more). In my opinion, having asynchronous communication will help us achieve most of the above-mentioned points. Another debatable issue is making the micro-services implementation stateless, such that we do not have to pass the state information between micro-services.

I would love to hear any thoughts/suggestions/comments on this topic and open up a discussion via this mail thread. If there is anything that I have missed which is relevant to this issue, please let me know.

Thanks and Regards,
Gourav Shenoy


Hi Gourav,

Correct me if I'm wrong, but I think this is a case of the job shop scheduling problem, as we may have 'n' jobs of varying processing times and memory requirements, and we have 'm' microservices with possibly different computing and memory capacities, and we are trying to minimize the makespan<https://en.wikipedia.org/wiki/Makespan>.

For this use-case, I'm in favor a highly available and consistent message broker with an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information.

As for the state vs stateless implementation, I think that question depends on the functionality of a particular microservice. In a broad sense, the stateless implementation should be preferred as it will scale better horizontally.


Regards,
Vidya Sagar


--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002<tel:8126915002> | vkalvaku@iu.edu<ma...@iu.edu>



--
Thanks and regards,

Ajinkya Dhamnaskar
Student ID : 0003469679
Masters (CS)
+1 (812) 369- 5416

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.
Hello all,

Amila, Sagar, thank you for responding and raising those concerns, and apologies that my email framed workload management mainly in terms of how micro-services communicate. As Ajinkya rightly mentioned, the way micro-services communicate affects how each micro-service performs its work. The goal is to keep the micro-services as independent as possible, and to investigate the workflow pattern in which they operate so that we can find the right balance between availability and consistency. From our preliminary analysis, these solutions may not be generic; the specific use-case will play a decisive role.

For starters, we are focusing on the following example, which I think will clarify what exactly we are trying to investigate.

Our test example
Say we have the following 4 micro-services, each of which performs a specific task as mentioned in the box.

[cid:image001.png@01D27D8E.8B69A3F0]


A stateful pattern to distribute work
[cid:image002.png@01D27D8E.8B69A3F0]

Here each communication between micro-services could be via RPC or messaging (e.g., RabbitMQ). The obvious disadvantage is that if any micro-service is down, the availability of the whole system is at stake. In this test example, Microservice-A coordinates the work and maintains the state information.
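
The coordinator-driven pattern described here can be sketched roughly as follows. This is only an illustration under stated assumptions: the three service functions and the CoordinatorA class are hypothetical stand-ins (the actual tasks are in the attached diagram), and plain function calls take the place of RPC or request/reply messaging.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for Microservice-B, -C and -D. In a real
# deployment each call would be an RPC or a request/reply message.
def service_b(payload: str) -> str:
    return payload + "->B"


def service_c(payload: str) -> str:
    return payload + "->C"


def service_d(payload: str) -> str:
    return payload + "->D"


class CoordinatorA:
    """Plays the role of Microservice-A: drives every step and keeps state."""

    def __init__(self, steps: List[Callable[[str], str]]) -> None:
        self.steps = steps
        # job id -> last completed step, or a failure note
        self.state: Dict[str, str] = {}

    def run(self, job_id: str, payload: str) -> str:
        for step in self.steps:
            try:
                payload = step(payload)
            except Exception as exc:
                # One unavailable downstream service stalls the whole job.
                self.state[job_id] = f"FAILED at {step.__name__}: {exc}"
                raise
            self.state[job_id] = f"done {step.__name__}"
        return payload


coordinator = CoordinatorA([service_b, service_c, service_d])
result = coordinator.run("job-1", "input")
print(result, "|", coordinator.state["job-1"])
```

Because all progress is recorded in the coordinator, losing the coordinator (or any service it is waiting on) blocks every in-flight job, which is the availability concern raised above.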

A stateless pattern to distribute work

[cid:image003.png@01D27D8E.8B69A3F0]

Another, purely asynchronous approach would be to associate a message queue with each micro-service: each micro-service performs its task, submits a request (a message on the bus) to the next micro-service, and continues to process more requests. This improves availability, though we would need to handle failure cases such as the message broker going down or messages being lost.
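
A minimal in-process sketch of this queue-per-service pipeline is shown below. It is an assumption-laden illustration, not a proposal for the actual implementation: Python's standard queue.Queue stands in for a real broker such as RabbitMQ, the four stage functions are hypothetical, and there is no handling of broker downtime or message loss.

```python
import queue
import threading
from typing import Callable, List

STOP = None  # sentinel message that tells a worker to shut down


def worker(task: Callable[[str], str],
           inbox: queue.Queue,
           outbox: queue.Queue) -> None:
    """Consume from inbox, do this stage's work, hand off to the next queue."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)  # propagate shutdown downstream
            break
        # Hand off the result and immediately go back for the next message;
        # the worker keeps no per-job state of its own.
        outbox.put(task(msg))


# Hypothetical stages standing in for Microservice-A .. Microservice-D.
stages: List[Callable[[str], str]] = [
    lambda m: m + "->A",
    lambda m: m + "->B",
    lambda m: m + "->C",
    lambda m: m + "->D",
]

# One inbox per stage, plus a final queue that collects finished jobs.
queues = [queue.Queue() for _ in range(len(stages) + 1)]
threads = [
    threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
    for i, stage in enumerate(stages)
]
for t in threads:
    t.start()

for job in ("job-1", "job-2"):
    queues[0].put(job)
queues[0].put(STOP)

for t in threads:
    t.join()

results = []
while True:
    item = queues[-1].get()
    if item is STOP:
        break
    results.append(item)
print(results)
```

With a real broker, durable queues and consumer acknowledgements would be the usual starting point for the failure cases mentioned above; that part is deliberately left out of this sketch.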

As mentioned, these are just a few proposals that we are planning to investigate via a prototype project, in which we will inject corner cases and failures and try to find ways to handle them. I would love to hear more thoughts/questions/suggestions.

Thanks and Regards,
Gourav Shenoy

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Ajinkya Dhamnaskar <ad...@umail.iu.edu>.
Hello all,

Just a heads up. Here the name "distributed workload management" does not
necessarily mean having different instances of a microservice and then
distributing work among these instances.

As I see it, the problem is how to make each microservice work independently
on top of a concrete distributed communication infrastructure. Think of it as
a workflow where each microservice does its part of the work and communicates
its output (how is yet to be decided). The next microservice in line
identifies and picks up that output and takes it further towards the final
outcome. The crux is that none of the microservices needs to worry about the
other microservices in the pipeline.

Vidya Sagar,
I completely second your opinion of having stateless microservices; in fact,
that is the key. With stateless microservices it is difficult to guarantee
consistency in a system, but it solves the availability problem to some
extent. I would be interested to understand what you mean by "an
intelligent job scheduling algorithm, which receives real-time updates from
the microservices with their current state information".



-- 
Thanks and regards,

Ajinkya Dhamnaskar
Student ID : 0003469679
Masters (CS)
+1 (812) 369- 5416

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Vidya Sagar Kalvakunta <vk...@umail.iu.edu>.
Hi Gourav,

Correct me if I'm wrong, but I think this is a case of the job shop
scheduling problem, as we may have 'n' jobs of varying processing times and
memory requirements, and we have 'm' microservices with possibly different
computing and memory capacities, and we are trying to minimize the makespan
<https://en.wikipedia.org/wiki/Makespan>.

For this use-case, I'm in favor of a highly available and consistent message
broker with an intelligent job scheduling algorithm, which receives
real-time updates from the microservices with their current state
information.
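
To make the makespan objective concrete, here is a small sketch of one well-known greedy heuristic, longest-processing-time-first (LPT) list scheduling. It is a deliberate simplification and not the algorithm proposed in this thread: it assumes identical workers, processing times known in advance, and no memory constraints, and the function name and job names are made up for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def lpt_schedule(job_times: Dict[str, float],
                 workers: int) -> Tuple[float, List[List[str]]]:
    """Greedy LPT: give the longest remaining job to the least-loaded worker.

    Returns (makespan, per-worker job lists). This is a heuristic, so the
    makespan is not guaranteed to be optimal in general.
    """
    # Min-heap of (current load, worker index).
    heap = [(0.0, w) for w in range(workers)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(workers)]

    # Consider jobs from longest to shortest.
    for job, duration in sorted(job_times.items(),
                                key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        assignment[w].append(job)
        heapq.heappush(heap, (load + duration, w))

    makespan = max(load for load, _ in heap)
    return makespan, assignment


makespan, plan = lpt_schedule(
    {"j1": 7, "j2": 5, "j3": 4, "j4": 3, "j5": 1}, workers=2)
print(makespan, plan)
```

A scheduler fed with real-time load reports from the microservices could replace the static job_times input with observed loads, but that would be a design decision beyond this sketch.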

As for the stateful vs. stateless implementation, I think that question depends
on the functionality of a particular microservice. In a broad sense, the
stateless implementation should be preferred as it will scale better
horizontally.


Regards,
Vidya Sagar


-- 
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics
and Computing | Indiana University Bloomington | (812) 691-5002
 | vkalvaku@iu.edu

Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Posted by Amila Jayasekara <th...@gmail.com>.
Hi Gourav,

Sorry, I did not understand your question. Specifically, I am having trouble
relating "workload management" to the options you suggest (RPC, message based,
etc.).
So what exactly do you mean by "workload management"?
What is "work" in this context?

Also, I did not understand what you meant by "the most efficient way".
Efficient in terms of what? Are you looking at speed?

As per your suggestions, it seems you are trying to find a way to
communicate between micro-services. RPC might be troublesome if you need to
communicate with processes separated by a firewall.

Thanks
-Thejaka

