Posted to dev@devlake.apache.org by Klesh Wong <kl...@apache.org> on 2022/07/19 07:25:34 UTC

[DISCUSS] CI/CD Domain Layer Design & Development

Hi, all

This is an open discussion of the CI/CD Domain Layer tables. Feel free
to share your thoughts. Thanks!

## Where we are

This is how we currently support CI/CD data analysis:

1. We collect Gitlab Pipeline data into the Tool Layer as
`_tool_gitlab_pipeline`
2. We collect Jenkins data into both the Tool Layer and the Domain Layer
as `_tool_jenkins_jobs`, `_tool_jenkins_builds`, `jobs` and `builds`

In other words, we modeled the CI/CD Domain Layer tables on Jenkins,
and the Gitlab Pipeline data never gets into the Domain Layer because
the two are modeled quite differently. The same is true for other kinds
of CI/CD systems.


## Why we need it

CI/CD plays an important role in the R&D process, and metrics such as
**Build Success Rate** depend on it (@Startrekzky, please add more
examples).

## What is the problem

CI/CD modeling varies across platforms, and there is no de facto
standard model. This makes it hard to design a sensible model for the
CI/CD Domain Layer data.
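To make the mismatch concrete, here is a minimal sketch of what normalizing the two sources into one shape could look like. `PipelineRun`, its fields, and both mapping functions are hypothetical and are not a proposed DevLake schema. The input dicts are simplified stand-ins for the API payloads, assuming Jenkins reports a build `result` with `duration` in milliseconds under a job, while Gitlab reports a pipeline `status` with `duration` in seconds under a project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineRun:
    """Hypothetical unified record, for illustration only."""
    source: str                    # which tool the record came from
    name: str                      # Jenkins job name or Gitlab project path
    result: str                    # "SUCCESS", "FAILURE", "ABORTED" or "OTHER"
    duration_sec: Optional[float]  # None while the run has no duration yet


# Each tool has its own vocabulary; anything unmapped becomes "OTHER".
JENKINS_RESULTS = {"SUCCESS": "SUCCESS", "FAILURE": "FAILURE", "ABORTED": "ABORTED"}
GITLAB_STATUSES = {"success": "SUCCESS", "failed": "FAILURE", "canceled": "ABORTED"}


def from_jenkins_build(job_name: str, build: dict) -> PipelineRun:
    # Assumption: Jenkins build duration is given in milliseconds.
    duration_ms = build.get("duration")
    return PipelineRun(
        source="jenkins",
        name=job_name,
        result=JENKINS_RESULTS.get(build.get("result"), "OTHER"),
        duration_sec=duration_ms / 1000 if duration_ms is not None else None,
    )


def from_gitlab_pipeline(project_path: str, pipeline: dict) -> PipelineRun:
    # Assumption: Gitlab pipeline duration is given in seconds.
    return PipelineRun(
        source="gitlab",
        name=project_path,
        result=GITLAB_STATUSES.get(pipeline.get("status"), "OTHER"),
        duration_sec=pipeline.get("duration"),
    )
```

Even in this toy version the two sources disagree on the parent entity (job vs. project), the result vocabulary, and the duration unit, which is the kind of difference the design has to settle.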

## How we should approach it

I propose that we address the problem with the following steps:

1. Define a couple of target charts, based on CI/CD data, that we
would like to have (@Startrekzky)
2. Assign a veteran developer to investigate the modeling of the most
popular platforms, such as Github/Gitlab/Jenkins/ArgoCD, and propose a
CI/CD Domain Layer table design
3. The PPMC members evaluate the design within 3 workdays, during which
all Committers can share their opinions
4. A data-oriented PoC (a set of tables with data from different
platforms) is presented to the analysts (@Startrekzky @leglar
@hezyin) for evaluation
5. Then we implement it

### Use cases

1. We can create standard CI/CD charts based on the unified Domain Layer
tables
2. Users can use the unified CI/CD Domain Layer tables to create
customized charts


Regards
Klesh Wong

Re: [DISCUSS] CI/CD Domain Layer Design & Development

Posted by Kaiyun Zhang <ka...@merico.dev.INVALID>.
Hi Klesh,
I think there are two major aspects that should be taken into consideration.


Firstly, the schema should support the following metrics and analysis dimensions:

Metrics:

  1.  No. of builds (similar to Deployment Frequency <https://linearb.io/blog/dora-engineering-metrics/#deployment-frequency>, one of the DORA metrics, which can be interpreted as the total No. of builds that meet a specific condition)
  2.  Build success rate (%)
  3.  Build result distribution (%)
  4.  Build duration in hours
  5.  Mean Lead Time for Changes <https://linearb.io/blog/dora-engineering-metrics/#mean-lead-time-for-changes> (depending on the definition of a change, it may be a “build” that meets specific conditions)
  6.  Mean Time to Recovery (MTTR) <https://linearb.io/blog/dora-engineering-metrics/#mean-time-to-recovery-mttr> (depending on the definition of recovery, it may start when a “bug” is created and end when it is fixed)
  7.  Change Failure Rate <https://linearb.io/blog/dora-engineering-metrics/#change-failure-rate> (depending on the definition of a change and a failure, it may be the percentage of failing “builds” that meet specific conditions)

Analysis Dimensions:

  1.  time: e.g. measure the average build duration by calendar date
  2.  job: e.g. compare the average build duration across different jobs
  3.  repo (code base): e.g. compare the average build duration of builds triggered by different repos
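
As a rough illustration of what "supporting" these metrics and dimensions means, the sketch below computes the build count, success rate, result distribution and average duration over a list of build records, optionally filtered by a condition and grouped by a dimension. The record keys (`result`, `duration_sec`, and whichever dimension key is passed in) and both function names are assumptions made for this example, not an existing schema or API.

```python
from collections import Counter, defaultdict


def build_metrics(runs, condition=lambda run: True):
    """Metrics over the runs that meet `condition` (hypothetical record shape)."""
    selected = [r for r in runs if condition(r)]
    total = len(selected)
    counts = Counter(r["result"] for r in selected)
    durations = [r["duration_sec"] for r in selected if r["duration_sec"] is not None]
    return {
        # No. of builds that meet the condition
        "count": total,
        # Share of selected builds whose result is "SUCCESS"
        "success_rate": counts["SUCCESS"] / total if total else None,
        # Share of each result value among the selected builds
        "distribution": {result: n / total for result, n in counts.items()},
        # Mean duration in hours, ignoring builds without a duration
        "avg_duration_hours": (
            sum(durations) / len(durations) / 3600 if durations else None
        ),
    }


def metrics_by(runs, dimension, condition=lambda run: True):
    """Group runs by one dimension key (e.g. date, job, repo) and compute metrics."""
    groups = defaultdict(list)
    for r in runs:
        groups[r[dimension]].append(r)
    return {key: build_metrics(group, condition) for key, group in groups.items()}
```

For example, `metrics_by(runs, "repo")` compares repos, and passing `condition=lambda r: r["job"] == "deploy"` restricts the count to builds that are treated as deployments. The metrics that depend on how a "change" or a "recovery" is defined need additional links to commits and issues, which this sketch leaves out.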


Secondly, the naming of tables should avoid confusing users of any CI/CD tool, such as Jenkins, GitLabCI, GitHub Actions or CircleCI. For example, `job` refers to different entities in GitLabCI and Jenkins, so we should consider not using it in the domain layer.

Cheers,
Kaiyun Zhang(Louis)




On July 20, 2022, at 7:20 AM, Hezheng Yin <yi...@gmail.com> wrote:

Hi Klesh,

I'm on board with the proposal. Besides the tools already mentioned, maybe
we also want to look into Azure DevOps/CircleCI/Travis CI/AWS CodePipeline.

The ultimate test of the schema is whether it supports the following
analysis/metric calculations that DevLake users want. So I agree some more
research into use cases would help with the design as well.

Best,
Hezheng




Re: [DISCUSS] CI/CD Domain Layer Design & Development

Posted by Hezheng Yin <yi...@gmail.com>.
Hi Klesh,

I'm on board with the proposal. Besides the tools already mentioned, maybe
we also want to look into Azure DevOps/CircleCI/Travis CI/AWS CodePipeline.

The ultimate test of the schema is whether it supports the following
analysis/metric calculations that DevLake users want. So I agree some more
research into use cases would help with the design as well.

Best,
Hezheng
