Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/17 14:47:51 UTC

[GitHub] [airflow] jnunezgts opened a new issue #11617: Docker Swarm operator is broken: json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64

jnunezgts opened a new issue #11617:
URL: https://github.com/apache/airflow/issues/11617


   
   **Apache Airflow version**:
   1.10.12, using SQLite as the backend
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   N/A. Using Docker Swarm 19.03.8
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**:
   No cloud, bare-metal server:
   ```
   HP ProLiant DL560 Gen8, BIOS P77 12/20/2013, 64 cpus
   ```
   
   - **OS** (e.g. from /etc/os-release):
   ```
   Fedora release 29 (Twenty Nine)
   ```
   
   - **Kernel** (e.g. `uname -a`):
   ```
   Linux server.company.com 4.19.82-1300.fc29.x86_64 #1 SMP Fri Nov 8 10:49:58 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
   ```
   
   - **Install tools**:
   ```
   pip
   ```
   - **Others**:
   ```
   Python 3.7.2 (default, Jan 16 2019, 19:49:22)
   [GCC 8.2.1 20181215 (Red Hat 8.2.1-6)] on linux
   ```
   
   Docker info:
   ```
   Client:
    Debug Mode: false
   
   Server:
    Containers: 21
     Running: 0
     Paused: 0
     Stopped: 21
    Images: 12
    Server Version: 19.03.8
    Storage Driver: overlay2
     Backing Filesystem: <unknown>
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: systemd
    Plugins:
     Volume: local
     Network: bridge host ipvlan macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
    Swarm: active
     NodeID: j0pl320hoxuqcaaa14z2znvgo
     Is Manager: true
     ClusterID: kpgz783mpw8aapdxchtwdu2ff
     Managers: 1
     Nodes: 4
     Default Address Pool: 10.0.0.0/8
     SubnetSize: 24
     Data Path Port: 4789
     Orchestration:
      Task History Retention Limit: 5
     Raft:
      Snapshot Interval: 10000
      Number of Old Snapshots to Retain: 0
      Heartbeat Tick: 1
      Election Tick: 10
     Dispatcher:
      Heartbeat Period: 5 seconds
     CA Configuration:
      Expiry Duration: 3 months
      Force Rotate: 0
     Autolock Managers: false
     Root Rotation In Progress: false
     Node Address: 172.29.248.55
     Manager Addresses:
      172.29.248.55:2377
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
    runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
    init version: fec3683
    Security Options:
     seccomp
      Profile: default
    Kernel Version: 4.19.82-1300.fc29.x86_64
    Operating System: Fedora 29 (Twenty Nine)
    OSType: linux
    Architecture: x86_64
    CPUs: 64
    Total Memory: 125.9GiB
    Name: server.company.com
    ID: 7ESU:O253:JGNS:YJIY:XXX:CYTI:WFQC:6L5C:XXXX:62IO:VH23:XXXX
    Docker Root Dir: /opt/docker
    Debug Mode: false
    HTTP Proxy: http://proxy.company.com:8080/
    HTTPS Proxy: http://proxy.company:8080/
    No Proxy: localhost,127.0.0.1,server.company.com,.company.com
    Registry: https://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
     privatereg.company.com:5000
     localhost:5000
     server.company.com:5000
     127.0.0.0/8
    Live Restore Enabled: false
   ```
   
   **What happened**:
   
   I created the following DAG to schedule a one-shot job:
   ```
   from datetime import datetime, timedelta

   from airflow import DAG
   from airflow.contrib.operators.docker_swarm_operator import DockerSwarmOperator

   # Retry once after five minutes; send email on failure only.
   DEFAULT_ARGS = {
       'retry_delay': timedelta(minutes=5),
       'retries': 1,
       'email_on_failure': True,
       'email_on_retry': False,
       'email': ['myemail@company.com']
   }

   with DAG(
       '24_7_box',
       description='24 x 7. With retries',
       default_args=DEFAULT_ARGS,
       schedule_interval='0 * * * Mon-Sun',
       start_date=datetime(2019, 7, 23),
       max_active_runs=1,
       catchup=False,
   ) as twenty_four_by_seven_dag:
       # See:
       # https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/docker_swarm_operator/index.html
       # https://airflow.apache.org/docs/stable/_modules/airflow/contrib/operators/docker_swarm_operator.html
       SLEEP_TASK = DockerSwarmOperator(
           task_id="SLEEP_TASK",
           image="fedora:29",
           api_version="auto",
           command="/bin/sleep 60",
           docker_url="unix://var/run/docker-sysavtbuild.sock",
           force_pull=False,
           mem_limit="500m",  # string form, as allowed by the operator's docstring
           auto_remove=True,
       )
   ```
   
   **What you expected to happen**:
   
   I expected the container to be created, stay alive for 60 seconds, and then exit with code 0 and no output.
   
   Others have reported success in the past using the [Docker Swarm Operator](https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/docker_swarm_operator/).
   
   <!-- What do you think went wrong? -->
   
   Not sure. The Airflow log shows the following:
   ```
   [2020-10-17 09:46:58,475] {taskinstance.py:1150} ERROR - 400 Client Error: Bad Request ("json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64")
   ```
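
The message suggests that the `mem_limit` string reached the Docker daemon unchanged, in a field the daemon decodes as a 64-bit integer. A minimal Python illustration of the two JSON encodings follows; the nesting is a simplified assumption based only on the field name in the error, not a copy of the real request body:

```python
import json

# Simplified, assumed shape: only the field named in the error is shown.
as_string = {"Resources": {"MemoryBytes": "500m"}}              # what seems to be sent
as_int = {"Resources": {"MemoryBytes": 500 * 1024 * 1024}}      # integer number of bytes

as_string_json = json.dumps(as_string)
as_int_json = json.dumps(as_int)

print(as_string_json)  # the value is a quoted JSON string
print(as_int_json)     # the value is a bare JSON number
```

A Go decoder targeting an `int64` field would be expected to reject the first form, which matches the error text.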
   
   I can run the same image and command from the Docker CLI as follows:
   ```
   [user@server dags]$ docker run --rm --detach fedora:29 /bin/sleep 45
   29912c34f43e2dfa20d417cb80113059a183518b99215609c0aa7b37874c27db
   [user@server dags]$ docker ps
   CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
   29912c34f43e        fedora:29           "/bin/sleep 45"     7 seconds ago       Up 6 seconds                            gifted_pare
   ```
   
   **How to reproduce it**:
   
   1. Copy the DAG provided into ~/airflow/dags
   2. Turn on (unpause) the DAG
   3. Trigger the DAG or let the scheduler run it. The error shows up in the task log
   
   **Anything else we need to know**:
   
   <details><summary>Airflow.log</summary>
   *** Reading local file: /home/user/airflow/logs/avt_24_7_box/SLEEP_TASK/2020-10-17T13:24:26.101897+00:00/2.log
   [2020-10-17 09:46:58,312] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: avt_24_7_box.SLEEP_TASK 2020-10-17T13:24:26.101897+00:00 [queued]>
   [2020-10-17 09:46:58,321] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: avt_24_7_box.SLEEP_TASK 2020-10-17T13:24:26.101897+00:00 [queued]>
   [2020-10-17 09:46:58,321] {taskinstance.py:880} INFO - 
   --------------------------------------------------------------------------------
   [2020-10-17 09:46:58,321] {taskinstance.py:881} INFO - Starting attempt 2 of 2
   [2020-10-17 09:46:58,321] {taskinstance.py:882} INFO - 
   --------------------------------------------------------------------------------
   [2020-10-17 09:46:58,328] {taskinstance.py:901} INFO - Executing <Task(DockerSwarmOperator): SLEEP_TASK> on 2020-10-17T13:24:26.101897+00:00
   [2020-10-17 09:46:58,335] {standard_task_runner.py:54} INFO - Started process 35637 to run task
   [2020-10-17 09:46:58,371] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', '24_7_box', 'SLEEP_TASK', '2020-10-17T13:24:26.101897+00:00', '--job_id', '55', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/avt_test3.py', '--cfg_path', '/tmp/tmpaivrdhuu']
   [2020-10-17 09:46:58,372] {standard_task_runner.py:78} INFO - Job 55: Subtask SLEEP_TASK
   [2020-10-17 09:46:58,398] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: avt_24_7_box.SLEEP_TASK 2020-10-17T13:24:26.101897+00:00 [running]> server.company.com
   [2020-10-17 09:46:58,467] {docker_swarm_operator.py:105} INFO - Starting docker service from image fedora:29
   [2020-10-17 09:46:58,475] {taskinstance.py:1150} ERROR - 400 Client Error: Bad Request ("json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64")
   Traceback (most recent call last):
     File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/client.py", line 259, in _raise_for_status
       response.raise_for_status()
     File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
       raise HTTPError(http_error_msg, response=self)
   requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http+docker://localhost/v1.40/services/create
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/airflow/operators/docker_operator.py", line 277, in execute
       return self._run_image()
     File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/airflow/contrib/operators/docker_swarm_operator.py", line 119, in _run_image
       labels={'name': 'airflow__%s__%s' % (self.dag_id, self.task_id)}
     File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/utils/decorators.py", line 34, in wrapper
       return f(self, *args, **kwargs)
     File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/service.py", line 190, in create_service
       self._post_json(url, data=data, headers=headers), True
     File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/client.py", line 265, in _result
       self._raise_for_status(response)
     File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/client.py", line 261, in _raise_for_status
       raise create_api_error_from_http_exception(e)
     File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
       raise cls(e, response=response, explanation=explanation)
   docker.errors.APIError: 400 Client Error: Bad Request ("json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64")
   [2020-10-17 09:46:58,481] {taskinstance.py:1194} INFO - Marking task as FAILED. dag_id=24_7_box, task_id=SLEEP_TASK, execution_date=20201017T132426, start_date=20201017T134658, end_date=20201017T134658
   [2020-10-17 09:46:58,509] {configuration.py:373} WARNING - section/key [smtp/smtp_user] not found in config
   [2020-10-17 09:46:58,583] {email.py:132} INFO - Sent an alert email to ['user@company.com']
   [2020-10-17 09:47:03,312] {local_task_job.py:102} INFO - Task exited with return code 1
   </details>
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #11617: Docker Swarm operator is broken: json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #11617:
URL: https://github.com/apache/airflow/issues/11617#issuecomment-711023327


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] mik-laj commented on issue #11617: Docker Swarm operator is broken: json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #11617:
URL: https://github.com/apache/airflow/issues/11617#issuecomment-712134943


   @jnunezgts Can I assign you to this report? I am happy to help with the review.
   CC: @nullhack 





[GitHub] [airflow] mik-laj commented on issue #11617: Docker Swarm operator is broken: json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #11617:
URL: https://github.com/apache/airflow/issues/11617#issuecomment-712149329


   Don't worry. Be happy. We will definitely help you. We also have a special channel for new contributors: #airflow-how-to-pr ([![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://s.apache.org/airflow-slack)).





[GitHub] [airflow] jnunezgts commented on issue #11617: Docker Swarm operator is broken: json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64

Posted by GitBox <gi...@apache.org>.
jnunezgts commented on issue #11617:
URL: https://github.com/apache/airflow/issues/11617#issuecomment-711142545


   Hello,
   I think I found the bug. If I completely remove the memory limit parameter (```mem_limit="500m"```) from the DAG, the task runs without a problem. The documentation says you can pass either a float value in bytes or a string that gets parsed (as in the Docker operator), but that is not the case here:
   
   ```
       :param mem_limit: Maximum amount of memory the container can use.
           Either a float value, which represents the limit in bytes,
           or a string like ``128m`` or ``1g``.
       :type mem_limit: float or str
   ```
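
As a possible stopgap, the string could be converted to bytes before it reaches the operator. This is only a sketch: `mem_limit_to_bytes` is a hypothetical helper, not part of Airflow, and it assumes binary multipliers (1k = 1024 bytes) for the suffixes shown in the docstring.

```python
import re

# Hypothetical helper, not part of Airflow. Assumes binary multipliers.
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def mem_limit_to_bytes(value):
    """Convert 500, 500.0, '500m' or '1g' to an integer number of bytes."""
    if isinstance(value, (int, float)):
        return int(value)
    # Accept forms such as '128m', '128mb', '1g' or a plain '1024'.
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([bkmg]?)b?\s*", value.lower())
    if match is None:
        raise ValueError(f"cannot parse memory limit: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


print(mem_limit_to_bytes("500m"))
```

In the DAG, `mem_limit=mem_limit_to_bytes("500m")` would then pass an integer instead of a string. Whether the operator accepts that end to end has not been verified here.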
   





[GitHub] [airflow] jnunezgts commented on issue #11617: Docker Swarm operator is broken: json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64

Posted by GitBox <gi...@apache.org>.
jnunezgts commented on issue #11617:
URL: https://github.com/apache/airflow/issues/11617#issuecomment-712140669


   > @jnunezgts Can I assign you to this report? I am happy to help with the review.
   > CC: @nullhack
   
   Hello,
   That works. You may need to do a bit of hand-holding, as I haven't submitted a patch to an open source project in a while. I also need to debug this to see why the DockerOperator is happy with the flag but the DockerSwarmOperator is not (I suspect it is how the values are passed between them, but that is a wild guess at this point).

