Posted to dev@griffin.apache.org by Preetam Shingavi <ps...@expediagroup.com.INVALID> on 2020/02/19 18:18:48 UTC

Use case for completion measure

Hello everyone,

I am trying to think of a way to create a measure that would help me correlate the following scenario:

Consider the workflow below, which has 4 microservices: SystemA, B, C, and D. Each system sends a transaction to the next, as shown with solid arrows, and also emits monitoring events (a.k.a. MEs), which include IDs that help with downstream correlation. Ignore Systems C and D for the rest of the description below.

I need to find a way to measure completion score for each system and for the whole workflow.

Completion score for SystemA: this is as simple as the count of unique combinations of IDs in the ITEM_PUBLISHED MEs.
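To make the SystemA measure concrete, here is a minimal plain-Python sketch of the intended metric. The field names (type, id1, id2) and the sample MEs are purely illustrative assumptions, not Griffin's actual schema:

```python
# Hypothetical monitoring events (MEs); field names are illustrative only.
mes = [
    {"type": "ITEM_PUBLISHED", "id1": "a", "id2": "x"},
    {"type": "ITEM_PUBLISHED", "id1": "a", "id2": "x"},  # duplicate combination
    {"type": "ITEM_PUBLISHED", "id1": "a", "id2": "y"},
    {"type": "EVENT_RECEIVED", "id1": "a", "id2": "x"},  # different ME type, ignored
]

def system_a_score(events):
    """Count unique (id1, id2) combinations among ITEM_PUBLISHED MEs."""
    return len({(e["id1"], e["id2"])
                for e in events if e["type"] == "ITEM_PUBLISHED"})

print(system_a_score(mes))  # 2
```

In Griffin this would map to a profiling-type measure; the set comprehension above is just the count(unique(id1, id2)) semantics spelled out.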

Completion score for SystemB: I need to correlate 1 EVENT_RECEIVED -> 1 EXP_OUTPUT (expectedCount=2) -> 2 unique ITEM_PUBLISHED MEs (correlated by a set of IDs). I am trying to find the optimal way to compute a completion score for this system, and then use that score to derive the overall workflow score. In other words, how can I feed the result of one measure into another to derive a second measure (in both batch and streaming modes)?
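A sketch of the SystemB correlation logic, again in plain Python. The correlation key (cid), field names, and the mean-of-ratios scoring rule are all assumptions for illustration; the real MEs would be joined on whatever ID set the systems share:

```python
from collections import defaultdict

# Hypothetical MEs correlated by a shared correlation id "cid".
mes = [
    {"type": "EVENT_RECEIVED", "cid": "t1"},
    {"type": "EXP_OUTPUT",     "cid": "t1", "expectedCount": 2},
    {"type": "ITEM_PUBLISHED", "cid": "t1", "itemId": "i1"},
    {"type": "ITEM_PUBLISHED", "cid": "t1", "itemId": "i2"},
    {"type": "EVENT_RECEIVED", "cid": "t2"},
    {"type": "EXP_OUTPUT",     "cid": "t2", "expectedCount": 2},
    {"type": "ITEM_PUBLISHED", "cid": "t2", "itemId": "i3"},  # 1 of 2 missing
]

def system_b_score(events):
    """Per transaction, compare the unique ITEM_PUBLISHED count against
    EXP_OUTPUT.expectedCount; return the mean completion ratio."""
    expected, published = {}, defaultdict(set)
    for e in events:
        if e["type"] == "EXP_OUTPUT":
            expected[e["cid"]] = e["expectedCount"]
        elif e["type"] == "ITEM_PUBLISHED":
            published[e["cid"]].add(e["itemId"])
    ratios = [min(len(published[cid]) / n, 1.0) for cid, n in expected.items()]
    return sum(ratios) / len(ratios) if ratios else 0.0

print(system_b_score(mes))  # 0.75 (t1 fully complete, t2 half complete)
```

In streaming mode the same logic would need windowing/state (late ITEM_PUBLISHED events), which is exactly where a custom measure comes in.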

[Attached image: workflow diagram of SystemA–D and their MEs; not rendered in this plain-text archive]

My approach was to:

  1.  Create a measure for System A as a profiling metric: count(unique(id1, id2)). In the sink, send the result to another Kafka topic (streaming) or a new HDFS location (batch).
  2.  Create a custom measure for System B that correlates EVENT_RECEIVED to EXP_OUTPUT, reads the expectedCount value, and matches it against the number of ITEM_PUBLISHED MEs. In the sink, send the result to another Kafka topic (streaming) or a new HDFS location (batch).
  3.  Do the same for the other systems.
  4.  Create a measure that reads the new Kafka stream / HDFS batch location and derives the overall workflow score.
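Step 4 above could be sketched as follows. The record shape (per-system "score" fields, as they might land in the sink topic/location) and the min() combination rule are assumptions; a product or weighted average would be equally valid depending on what "workflow completion" should mean:

```python
# Hypothetical per-system metric records read back from the sink
# (Kafka topic in streaming, HDFS location in batch).
system_metrics = [
    {"system": "SystemA", "score": 1.0},
    {"system": "SystemB", "score": 0.75},
    {"system": "SystemC", "score": 0.9},
    {"system": "SystemD", "score": 1.0},
]

def workflow_score(metrics):
    """Combine per-system completion scores into one workflow score.
    min() makes the workflow only as complete as its weakest system."""
    return min(m["score"] for m in metrics)

print(workflow_score(system_metrics))  # 0.75
```

The interesting part is not the arithmetic but the plumbing: each upstream measure's sink becomes the next measure's source, which is what the Kafka-topic / HDFS-location handoff in steps 1–3 provides.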

Any thoughts or inputs are highly appreciated. Thank you for reading through this.

NOTE: I am working with a small team within Expedia Group, trying to solve workflow completion, accuracy, and other DQ problems for a number of workflows. We have built a custom application today, but we see great value in Apache Griffin if we can make it work for our use cases. There are many features we'd like to build on top of the existing ones, covering all the use cases our custom application handles today, and my team would be happy to contribute to this project full-time if we can build a working prototype and convince our managers for all the right reasons 😊 (cost, scalability, availability, configurability, reprocessing, dashboards & reporting, etc.).

Thanks,
Preetam

Re: Use case for completion measure

Posted by Preetam Shingavi <ps...@expediagroup.com.INVALID>.
Not sure if the image shows in the original email; re-attaching it below and in the email attachments.

[Attached image: workflow diagram of SystemA–D and their MEs; not rendered in this plain-text archive]



