Posted to dev@couchdb.apache.org by Ilya Khlopotov <ii...@apache.org> on 2019/06/24 14:01:05 UTC
State of the CI and developers' woes

Hello,

These are mostly my ramblings at this point. I don't have anything concrete to propose at the moment.

It seems that our current CI setup is no longer adequate and is currently broken. In this email I want to highlight the problems in order to start a discussion about what we can do to fix them. I also want to cover aspects related to building and testing the FDB Layer.

# Problems

- I have to wait a few hours after a code push before Travis starts the build
  - this means it is nearly impossible to merge any code
- There is no CI for Windows
  - I am afraid to touch the build configuration because Windows is different and there is no easy way to test a change
- We use rebar, which is an outdated and deprecated tool
  - disabling some tests is tricky
  - increasing timeouts for tests is tricky
  - it is slow
  - its config is not declarative
- We use reltool, an outdated tool that now has better alternatives
  - it is slow
  - currently it is not possible to control dependencies without modifying the source (which makes an integrator's life harder)
  - it is overkill since we no longer support live code updates
- Our tests are flaky in the context of CI (slow machines)
  - eunit doesn't allow us to increase timeout easily
- The current CI is very basic and doesn't allow us to compose a complex set of services
  - with the FDB Layer we need the ability to connect to an external FDB cluster
  - implementing a performance testing suite is problematic since we don't have a separate machine to generate the load
  - implementing fault injection is almost impossible
  - it is unclear how to do testing if we decide to convert some of the components into microservices (where it makes sense and would not increase the footprint)
  - it is hard to introduce an installable integration test suite
     - a testing application which is developed inside the couchdb repo but can be released separately and installed on any machine to test an existing CouchDB cluster
- The environment in CI is quite different from the local development environment, which leads to hard-to-debug issues
- Introducing new development tools is problematic since we need to add them in all CIs we have:
  - Travis
  - Jenkins
  - couchdb-ci
  - Windows
  - local docker containers
  - integrator's CI
- IBM (as an integrator company) has to duplicate the CI to produce custom builds
  - the configurability of a build is not sufficient for custom builds
    - additional applications
    - additional tests
    - disabled applications
    - disabled tests
    - custom build flags
    - hook additional tools for testing
    - dependencies (we specify them in a JSON file instead of hardcoding them in rebar)
  - there is no way to notify an integrator company about new changes upstream, since the only supported notification mechanism is the configuration of webhooks on the repo
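
To make the timeout point above more concrete, here is a minimal sketch of a timeout that is scaled by an environment variable, so that slow CI machines can get more time without touching each test. It is written in Python only for illustration and is not how eunit or rebar work; the `TEST_TIMEOUT_FACTOR` variable and the `scaled_timeout` helper are hypothetical names.

```python
import os


def scaled_timeout(base_seconds, environ=None):
    """Return base_seconds multiplied by an optional scaling factor.

    TEST_TIMEOUT_FACTOR is a hypothetical variable used only in this sketch.
    When it is unset, the base timeout is returned unchanged.
    """
    environ = os.environ if environ is None else environ
    factor = float(environ.get("TEST_TIMEOUT_FACTOR", "1"))
    if factor <= 0:
        raise ValueError("TEST_TIMEOUT_FACTOR must be positive")
    return base_seconds * factor
```

A slow CI worker could then export `TEST_TIMEOUT_FACTOR=4` while developers keep the default locally.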

# Things that we might want to think about

- what we can do to enable developers to test their code on all supported platforms on every change
- how developers can propose new tools to improve testing and static analysis
- how to reduce wait time
- how to test a service which consists of multiple components
- how to introduce a performance testing suite
- produce release artifacts on every CI run (maybe???)
- how to reduce the gap between:
  - CouchDB CI
  - developer environment
  - Integrator company's CI (any company which runs custom builds of CouchDB)
- how to modernize Windows builds and release tools


# Approaches

## 1. Moving to docker compose

We could move to docker compose. It has nice ways to orchestrate multiple containers, and there are ways to do fault injection. However, it doesn't work reliably on MacOS and Windows. Configuring it in the context of CI, whether Jenkins or Travis, is problematic as well.

## 2. Use Packer or Vagrant to provision Jenkins CI and a set of dependent services locally

We could provision Jenkins and all of the build infrastructure locally in VMs. This would simplify local testing and remove some load from the shared CI. However, this is not easy.

## 3. Use Packer or Vagrant to provision a set of dependent services

We could provision all required services inside one or multiple VMs.

## 4. Move CI and development environment into Kubernetes 

Kubernetes is a modern approach to orchestrating containerized applications. It seems to be a good fit for scalable deployments and a performance testing suite. It is less desirable for local development or embedded uses of CouchDB. Therefore it is not clear whether it would be a good fit for CI. It doesn't seem to solve the problem of testing Windows builds either.

# Open questions 

- how to unify (if at all possible) or reduce the gap between multiple build environments 
  - MacOS
    - HyperKit docker
    - HyperKit VM
    - VirtualBox VM
    - native dependencies
  - Windows
    - Hyper-V docker
    - cygwin
    - Linux subsystem
    - Hyper-V VM
  - Linux
    - libvirt 
    - kvm
    - docker
    - native dependencies
  - Jenkins
  - Travis
- if we go with docker, how to unify or reduce the gap between the multiple options (the best approach depends on the OS and the developer):
  - everything in docker including editor
    - pros: 
      - it circumvents slow disk shares on MacOS
      - avoids some networking issues on MacOS
    - cons:
      - console only
      - git has to be configured inside docker
      - needs a backup solution
      - communication with services running on the host is harder to configure
  - bind the current directory on the host into the docker container ("-v `pwd`:/home/couchdb/")
    - pros:
      - easiest to configure
      - smaller footprint of the container
    - cons:
      - there are file synchronization issues on MacOS and Windows
        - very slow on Mac
        - updates on the host are missed in the docker container
  - pass the source code from the host to the docker builder in the docker context
    - pros:
      - it is the only option that works in some restricted environments, such as a Docker-in-Docker setup
- Support for Windows is the biggest problem:
  - it limits the variety of tools we can use
  - it limits the virtualization/containerization technologies
  - it forces us to have a separate toolchain and CI
  - it would be way easier if we could find a way to cross-compile to a Windows target inside docker, at least in the context of CI

# Conclusion

In my opinion, the approach where every developer had his/her own development environment worked fine in the past. However, with the introduction of FDB there is a need for orchestration of multiple services. It is very hard to implement proper testing of CouchDB on top of FDB without introducing some orchestration and additional tools in the CI (and in developers' environments). Developers might still choose to maintain their own configuration. In that case we might need a mechanism to skip tests which require missing capabilities.
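
To illustrate what skipping tests on missing capabilities could look like, here is a rough sketch in Python (chosen only for illustration, since our tests are not written in it). Everything named here is an assumption: the capability names, the `COUCHDB_TEST_FDB_CLUSTER_FILE` variable, and the `detect_capabilities` and `requires` helpers are all hypothetical.

```python
import os
import shutil
import unittest


def detect_capabilities(environ=None):
    """Return the set of capability names available in this environment.

    The capability names and the COUCHDB_TEST_FDB_CLUSTER_FILE variable are
    hypothetical and exist only for this sketch.
    """
    environ = os.environ if environ is None else environ
    capabilities = set()
    # Treat a configured cluster file as "an external FDB cluster is available".
    if environ.get("COUCHDB_TEST_FDB_CLUSTER_FILE"):
        capabilities.add("fdb")
    # Treat a docker executable on PATH as "docker is available".
    if shutil.which("docker") is not None:
        capabilities.add("docker")
    return capabilities


def requires(capability, available=None):
    """Skip the decorated test unless the given capability is available."""
    available = detect_capabilities() if available is None else available
    return unittest.skipUnless(
        capability in available, f"missing capability: {capability}"
    )


class LayerTests(unittest.TestCase):
    @requires("fdb")
    def test_needs_external_fdb_cluster(self):
        pass  # a real test would talk to the FDB cluster here

    def test_runs_everywhere(self):
        self.assertEqual(1 + 1, 2)
```

With something like this, a developer without an FDB cluster would see the first test reported as skipped rather than failed, while a fully provisioned CI would run both.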

Best regards,
iilyak