Posted to commits@cassandra.apache.org by "Jarek Potiuk (Jira)" <ji...@apache.org> on 2022/05/06 10:57:00 UTC

[jira] [Updated] (CASSANDRA-17612) Cassandra latest (3.0.26) image fails to start with health check

     [ https://issues.apache.org/jira/browse/CASSANDRA-17612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarek Potiuk updated CASSANDRA-17612:
-------------------------------------
    Description: 
Today our CI jobs at Apache Airflow started to fail. When we investigated, the root cause seemed to be that the Cassandra 3.0 image in our CI jobs failed to start and pass its health checks. One of our tests brings up a number of images via docker compose, and we use the "cassandra:3.0" image for that.

The whole test run fails because the cassandra container is unhealthy:

[https://github.com/apache/airflow/runs/6320170343?check_suite_focus=true#step:10:6651]
[https://github.com/apache/airflow/runs/6319805534?check_suite_focus=true#step:10:12629]
[https://github.com/apache/airflow/runs/6319710486?check_suite_focus=true#step:10:6759]

{{ERROR: for airflow Container "3bd115315ba7" is unhealthy.}}
{{Encountered errors while bringing up the project.}}
{{3bd115315ba7 cassandra:3.0 "docker-entrypoint.s…" 5 minutes ago Up 5 minutes (unhealthy) 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp airflow-integration-postgres_cassandra_1}}

 

The logs from the cassandra container do not show anything suspicious:

{{INFO  08:45:22 Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]}}
{{INFO  08:45:22 Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...}}
{{INFO  08:45:23 Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it}}
{{INFO  08:45:23 Startup complete}}
{{INFO  08:45:24 Created default superuser role ‘cassandra’}}

 

Our docker-compose entry is here:

[https://github.com/apache/airflow/blob/main/scripts/ci/docker-compose/integration-cassandra.yml]

Basically, we run a healthcheck that checks whether cassandra is up. This health check worked fine before, but seems to fail now. Either we are using the wrong healthcheck, or there is a bug in the command:

{{    healthcheck:}}
{{      test: "[ $$(nodetool statusgossip) = running ]"}}
{{      interval: 5s}}
{{      timeout: 30s}}
{{      retries: 50}}
{{    restart: always}}
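
To make the behaviour of this test expression easier to reason about, here is a minimal shell sketch of how it reacts to different command output. It rests on assumptions: `fake_statusgossip` and `FAKE_OUTPUT` are hypothetical stand-ins for `nodetool statusgossip`, the real command is never run, and the sample outputs are invented for illustration, not taken from the 3.0.26 image. The `$$` in the compose file is compose escaping for a single `$`, so plain shell uses `$(...)`.

```shell
#!/bin/sh
# Hypothetical stand-in for `nodetool statusgossip` (assumption: the real
# tool is not invoked here). FAKE_OUTPUT simulates what it prints.
fake_statusgossip() {
    printf '%s\n' "$FAKE_OUTPUT"
}

# Unquoted form, as written in the compose file. If the output is empty or
# has several words, `[` receives a malformed expression and fails.
check_unquoted() {
    [ $(fake_statusgossip) = running ] 2>/dev/null
}

# Quoted form: the whole output is compared as a single string.
check_quoted() {
    [ "$(fake_statusgossip)" = running ]
}

# Invented sample outputs; only the first one should count as healthy.
for FAKE_OUTPUT in "running" "not running" "" "error: connection refused"; do
    check_unquoted; u=$?
    check_quoted; q=$?
    printf 'output=%-28s unquoted_exit=%s quoted_exit=%s\n' "\"$FAKE_OUTPUT\"" "$u" "$q"
done
```

Both forms succeed only when the output is exactly `running`. Any other output, including an error message printed by the tool itself, yields a non-zero exit and so an unhealthy container. Running the healthcheck command by hand inside the container should therefore show what it actually prints.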

We mitigated it by switching to 3.0.25 temporarily [https://github.com/apache/airflow/pull/23522]

Is this an error in cassandra? Or should we maybe change our health-check command?

  was:
Today our CI jobs at Apache Airflow started to fail. When we investigated, the root cause seemed to be that the Cassandra 3.0 image in our CI jobs failed to start and pass its health checks. One of our tests brings up a number of images via docker compose, and we use the "cassandra:3.0" image for that.

The whole test run fails because the cassandra container is unhealthy:

[https://github.com/apache/airflow/runs/6320170343?check_suite_focus=true#step:10:6651]
[https://github.com/apache/airflow/runs/6319805534?check_suite_focus=true#step:10:12629]
[https://github.com/apache/airflow/runs/6319710486?check_suite_focus=true#step:10:6759]

ERROR: for airflow Container "3bd115315ba7" is unhealthy.
Encountered errors while bringing up the project.
3bd115315ba7 cassandra:3.0 "docker-entrypoint.s…" 5 minutes ago Up 5 minutes (unhealthy) 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp airflow-integration-postgres_cassandra_1

Our docker-compose entry is here:

[https://github.com/apache/airflow/blob/main/scripts/ci/docker-compose/integration-cassandra.yml]

Basically, we run a healthcheck that checks whether cassandra is up. This health check worked fine before, but seems to fail now. Either we are using the wrong healthcheck, or there is a bug in the command:

{{    healthcheck:}}
{{      test: "[ $$(nodetool statusgossip) = running ]"}}
{{      interval: 5s}}
{{      timeout: 30s}}
{{      retries: 50}}
{{    restart: always}}

We mitigated it by switching to 3.0.25 temporarily [https://github.com/apache/airflow/pull/23522]

Is this an error in cassandra? Or should we maybe change our health-check command?


> Cassandra latest (3.0.26) image fails to start with health check
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-17612
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17612
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CI, Packaging
>            Reporter: Jarek Potiuk
>            Priority: Normal



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
