Posted to dev@liminal.apache.org by jb...@apache.org on 2020/07/20 06:24:40 UTC

[incubator-liminal] branch master created (now 88174a6)

This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git.


      at 88174a6  Add NOTICE and DISCLAIMER

This branch includes the following new commits:

     new 87da6bd  first commit
     new 2887b19  First code commit
     new aa63e16  rainbow_dags dag creation + python task
     new 9e646df  Tasks stubs
     new 1c204b6  Add 'build' package
     new c74af6c  Add run_tests script
     new 79c9031  Add LICENSE
     new a32d4eb  Add build module
     new 45ceeac  Fix requirements.txt
     new 9292bb8  Refactor PythonTask
     new 0817e97  Refactor build
     new eed1cd0  Elaborate build tests
     new f4bdfac  Add cli
     new ae36a5e  Remove TODOs with GitHub issues
     new 77139de  Fix rainbow_dags python task
     new 326a042  Change pythontask config to input/output enhancement
     new 14716fc  Add python_server service
     new 38152de  Class registries
     new da4022c  Fix import bug
     new 2ef72ec  Performance improvement for class_util
     new c1d77f2  Make paths in tests relative to script location
     new 7596566  Add job_start and job_end tasks
     new 93d9fa2  Update README
     new 2d34195  Update README with yml example
     new 779e86e  Format yaml in README
     new 2ce98f4  Add example repository structure to README
     new 44d9c3d  Update README fix task description
     new e1767df  Missing requirements
     new be9e513  TODO
     new c21bfdf  Fix missing tasks/dags bug
     new 253a15a  Add pipeline configuration as default arguments
     new 8df40b2  Use user pip conf in docker build
     new 54fb987  Add architecture diagrams
     new 3245e47  Add short architecture description
     new d80d9c0  Upgrade the quality of the diagram
     new 269459e  Upgrade the quality of the diagram
     new 6da38b8  Rainbow local mode
     new 07aad66  Local mode improvements
     new 324f717  perform pip upgrade when building python images
     new fac89af  fix jobEndStatus tasks state check
     new 0c8f8dc  Rename project to Liminal
     new 249aa74  fix split list function
     new 88174a6  Add NOTICE and DISCLAIMER

The 43 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[incubator-liminal] 43/43: Add NOTICE and DISCLAIMER

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 88174a6fe519f9a6052f6e5d366a37a88a915ee4
Author: jbonofre <jb...@apache.org>
AuthorDate: Mon Jul 20 08:22:45 2020 +0200

    Add NOTICE and DISCLAIMER
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 3623e76..8d1fd47 100644
--- a/README.md
+++ b/README.md
@@ -17,9 +17,9 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Liminal
+# Apache Liminal
 
-Liminal is an end-to-end platform for data engineers & scientists, allowing them to build,
+Apache Liminal is an end-to-end platform for data engineers & scientists, allowing them to build,
 train and deploy machine learning models in a robust and agile way.
 
 The platform provides the abstractions and declarative capabilities for


[incubator-liminal] 08/43: Add build module

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit a32d4eb722511db7b2f29228fae8e08ca5de8e81
Author: aviemzur <av...@gmail.com>
AuthorDate: Thu Mar 12 10:08:43 2020 +0200

    Add build module
---
 README.md                                          |  4 ++
 rainbow/docker/__init__.py                         |  1 -
 rainbow/docker/python/Dockerfile                   | 19 +++++++
 rainbow/docker/{ => python}/__init__.py            |  1 -
 rainbow/docker/python/python_image.py              | 61 ++++++++++++++++++++++
 rainbow/{ => runners/airflow}/build/__init__.py    |  0
 .../rainbow_dags.py => build/build_rainbow.py}     | 18 +++----
 .../airflow/build/python/container-setup.sh        |  9 ++++
 .../airflow/build/python/container-teardown.sh     |  6 +++
 rainbow/runners/airflow/dag/rainbow_dags.py        | 10 ++--
 rainbow/runners/airflow/model/task.py              |  4 +-
 .../airflow/tasks/create_cloudformation_stack.py   |  2 +-
 .../airflow/tasks/delete_cloudformation_stack.py   |  2 +-
 rainbow/runners/airflow/tasks/job_end.py           |  2 +-
 rainbow/runners/airflow/tasks/job_start.py         |  2 +-
 rainbow/runners/airflow/tasks/python.py            | 13 +++--
 rainbow/runners/airflow/tasks/spark.py             |  2 +-
 rainbow/runners/airflow/tasks/sql.py               |  2 +-
 requirements.txt                                   |  3 ++
 tests/runners/airflow/dag/test_rainbow_dags.py     |  5 ++
 .../runners/airflow/tasks/hello_world}/__init__.py |  1 -
 .../airflow/tasks/hello_world/hello_world.py       |  2 +-
 tests/runners/airflow/tasks/test_python.py         | 45 +++++++++++++---
 23 files changed, 175 insertions(+), 39 deletions(-)

diff --git a/README.md b/README.md
index 7168564..d8b9a23 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,5 @@
 # rainbow
+
+```
+ln -s "/Applications/Docker.app/Contents//Resources/bin/docker-credential-desktop" "/usr/local/bin/docker-credential-desktop"
+```
\ No newline at end of file
diff --git a/rainbow/docker/__init__.py b/rainbow/docker/__init__.py
index 8bb1ec2..217e5db 100644
--- a/rainbow/docker/__init__.py
+++ b/rainbow/docker/__init__.py
@@ -15,4 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: docker
diff --git a/rainbow/docker/python/Dockerfile b/rainbow/docker/python/Dockerfile
new file mode 100644
index 0000000..d4e3ed2
--- /dev/null
+++ b/rainbow/docker/python/Dockerfile
@@ -0,0 +1,19 @@
+# Use an official Python runtime as a parent image
+FROM python:3.7-slim
+
+# Install aptitude build-essential
+#RUN apt-get install -y --reinstall build-essential
+
+# Set the working directory to /app
+WORKDIR /app
+
+# Order of operations is important here for docker's caching & incremental build performance.    !
+# Be careful when changing this code.                                                            !
+
+# Install any needed packages specified in requirements.txt
+COPY ./requirements.txt /app
+RUN pip install -r requirements.txt
+
+# Copy the current directory contents into the container at /app
+RUN echo "Copying source code.."
+COPY . /app
diff --git a/rainbow/docker/__init__.py b/rainbow/docker/python/__init__.py
similarity index 98%
copy from rainbow/docker/__init__.py
copy to rainbow/docker/python/__init__.py
index 8bb1ec2..217e5db 100644
--- a/rainbow/docker/__init__.py
+++ b/rainbow/docker/python/__init__.py
@@ -15,4 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: docker
diff --git a/rainbow/docker/python/python_image.py b/rainbow/docker/python/python_image.py
new file mode 100644
index 0000000..d66dfbe
--- /dev/null
+++ b/rainbow/docker/python/python_image.py
@@ -0,0 +1,61 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import os
+import shutil
+import tempfile
+import docker
+
+
+def build(source_path, tag, extra_files=None):
+    if extra_files is None:
+        extra_files = []
+
+    print(f'Building image {tag}')
+
+    temp_dir = tempfile.mkdtemp()
+    # Delete dir for shutil.copytree to work
+    os.rmdir(temp_dir)
+
+    __copy_source(source_path, temp_dir)
+
+    requirements_file_path = os.path.join(temp_dir, 'requirements.txt')
+    if not os.path.exists(requirements_file_path):
+        with open(requirements_file_path, 'w'):
+            pass
+
+    dockerfile_path = os.path.join(os.path.dirname(__file__), 'Dockerfile')
+
+    for file in extra_files + [dockerfile_path]:
+        __copy_file(file, temp_dir)
+
+    print(temp_dir, os.listdir(temp_dir))
+
+    docker_client = docker.from_env()
+    docker_client.images.build(path=temp_dir, tag=tag)
+
+    docker_client.close()
+
+    shutil.rmtree(temp_dir)
+
+
+def __copy_source(source_path, destination_path):
+    shutil.copytree(source_path, destination_path)
+
+
+def __copy_file(source_file_path, destination_file_path):
+    shutil.copy2(source_file_path, destination_file_path)
diff --git a/rainbow/build/__init__.py b/rainbow/runners/airflow/build/__init__.py
similarity index 100%
rename from rainbow/build/__init__.py
rename to rainbow/runners/airflow/build/__init__.py
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/build/build_rainbow.py
similarity index 86%
copy from rainbow/runners/airflow/dag/rainbow_dags.py
copy to rainbow/runners/airflow/build/build_rainbow.py
index 6bdf66b..222ea5f 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/build/build_rainbow.py
@@ -26,7 +26,10 @@ from airflow import DAG
 from rainbow.runners.airflow.tasks.python import PythonTask
 
 
-def register_dags(path):
+def build_rainbow(path):
+    """
+    TODO: doc for build_rainbow
+    """
     files = []
     for r, d, f in os.walk(path):
         for file in f:
@@ -38,7 +41,7 @@ def register_dags(path):
     dags = []
 
     for config_file in files:
-        print(f'Registering DAG for file: f{config_file}')
+        print(f'Building artifacts file: f{config_file}')
 
         with open(config_file) as stream:
             # TODO: validate config
@@ -64,12 +67,7 @@ def register_dags(path):
                     task_instance = get_task_class(task_type)(
                         dag, pipeline['pipeline'], parent if parent else None, task, 'all_success'
                     )
-                    parent = task_instance.apply_task_to_dag()
-
-                    print(f'{parent}{{{task_type}}}')
-
-                dags.append(dag)
-    return dags
+                    parent = task_instance.build()
 
 
 # TODO: task class registry
@@ -83,6 +81,4 @@ def get_task_class(task_type):
 
 
 if __name__ == '__main__':
-    # TODO: configurable yaml dir
-    path = 'tests/runners/airflow/dag/rainbow'
-    register_dags(path)
+    register_dags('')
diff --git a/rainbow/runners/airflow/build/python/container-setup.sh b/rainbow/runners/airflow/build/python/container-setup.sh
new file mode 100755
index 0000000..6e8d242
--- /dev/null
+++ b/rainbow/runners/airflow/build/python/container-setup.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+
+echo """$RAINBOW_INPUT""" > rainbow_input.json
+
+AIRFLOW_RETURN_FILE=/airflow/xcom/return.json
+
+mkdir -p /airflow/xcom/
+
+echo {} > $AIRFLOW_RETURN_FILE
diff --git a/rainbow/runners/airflow/build/python/container-teardown.sh b/rainbow/runners/airflow/build/python/container-teardown.sh
new file mode 100755
index 0000000..1219407
--- /dev/null
+++ b/rainbow/runners/airflow/build/python/container-teardown.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+USER_CONFIG_OUTPUT_FILE=$1
+if [ "$USER_CONFIG_OUTPUT_FILE" != "" ]; then
+    cp ${USER_CONFIG_OUTPUT_FILE} /airflow/xcom/return.json
+fi
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index 6bdf66b..c564737 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -23,10 +23,13 @@ from datetime import datetime
 import yaml
 from airflow import DAG
 
-from rainbow.runners.airflow.tasks.python import PythonTask
+from rainbow.runners.airflow.build import build_rainbow
 
 
 def register_dags(path):
+    """
+    TODO: doc for register_dags
+    """
     files = []
     for r, d, f in os.walk(path):
         for file in f:
@@ -72,10 +75,7 @@ def register_dags(path):
     return dags
 
 
-# TODO: task class registry
-task_classes = {
-    'python': PythonTask
-}
+task_classes = build_rainbow.task_classes
 
 
 def get_task_class(task_type):
diff --git a/rainbow/runners/airflow/model/task.py b/rainbow/runners/airflow/model/task.py
index 2650aa1..25656ee 100644
--- a/rainbow/runners/airflow/model/task.py
+++ b/rainbow/runners/airflow/model/task.py
@@ -32,9 +32,9 @@ class Task:
         self.config = config
         self.trigger_rule = trigger_rule
 
-    def setup(self):
+    def build(self):
         """
-        Setup method for task.
+        Build task's artifacts.
         """
         raise NotImplementedError()
 
diff --git a/rainbow/runners/airflow/tasks/create_cloudformation_stack.py b/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
index 9304167..c478dc7 100644
--- a/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
+++ b/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
@@ -27,7 +27,7 @@ class CreateCloudFormationStackTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def setup(self):
+    def build(self):
         pass
 
     def apply_task_to_dag(self):
diff --git a/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py b/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
index 66d5783..d172284 100644
--- a/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
+++ b/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
@@ -27,7 +27,7 @@ class DeleteCloudFormationStackTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def setup(self):
+    def build(self):
         pass
 
     def apply_task_to_dag(self):
diff --git a/rainbow/runners/airflow/tasks/job_end.py b/rainbow/runners/airflow/tasks/job_end.py
index b3244c4..a6c5ef2 100644
--- a/rainbow/runners/airflow/tasks/job_end.py
+++ b/rainbow/runners/airflow/tasks/job_end.py
@@ -27,7 +27,7 @@ class JobEndTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def setup(self):
+    def build(self):
         pass
 
     def apply_task_to_dag(self):
diff --git a/rainbow/runners/airflow/tasks/job_start.py b/rainbow/runners/airflow/tasks/job_start.py
index f794e09..7338363 100644
--- a/rainbow/runners/airflow/tasks/job_start.py
+++ b/rainbow/runners/airflow/tasks/job_start.py
@@ -27,7 +27,7 @@ class JobStartTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def setup(self):
+    def build(self):
         pass
 
     def apply_task_to_dag(self):
diff --git a/rainbow/runners/airflow/tasks/python.py b/rainbow/runners/airflow/tasks/python.py
index 983ce0c..8317854 100644
--- a/rainbow/runners/airflow/tasks/python.py
+++ b/rainbow/runners/airflow/tasks/python.py
@@ -16,10 +16,12 @@
 # specific language governing permissions and limitations
 # under the License.
 import json
+import os
 
 from airflow.models import Variable
 from airflow.operators.dummy_operator import DummyOperator
 
+from rainbow.docker.python import python_image
 from rainbow.runners.airflow.model import task
 from rainbow.runners.airflow.operators.kubernetes_pod_operator import \
     ConfigurableKubernetesPodOperator, \
@@ -45,9 +47,14 @@ class PythonTask(task.Task):
         self.config_task_id = self.task_name + '_input'
         self.executors = self.__executors()
 
-    def setup(self):
-        # TODO: build docker image if needed.
-        pass
+    def build(self):
+        if 'source' in self.config:
+            script_dir = os.path.dirname(__file__)
+
+            python_image.build(self.config['source'], self.image, [
+                os.path.join(script_dir, '../build/python/container-setup.sh'),
+                os.path.join(script_dir, '../build/python/container-teardown.sh')
+            ])
 
     def apply_task_to_dag(self):
 
diff --git a/rainbow/runners/airflow/tasks/spark.py b/rainbow/runners/airflow/tasks/spark.py
index ebae64e..8846f97 100644
--- a/rainbow/runners/airflow/tasks/spark.py
+++ b/rainbow/runners/airflow/tasks/spark.py
@@ -27,7 +27,7 @@ class SparkTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def setup(self):
+    def build(self):
         pass
 
     def apply_task_to_dag(self):
diff --git a/rainbow/runners/airflow/tasks/sql.py b/rainbow/runners/airflow/tasks/sql.py
index 6dfc0f1..23458a9 100644
--- a/rainbow/runners/airflow/tasks/sql.py
+++ b/rainbow/runners/airflow/tasks/sql.py
@@ -27,7 +27,7 @@ class SparkTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def setup(self):
+    def build(self):
         pass
 
     def apply_task_to_dag(self):
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..f22c0a7
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,3 @@
+docker:4.2.0
+apache-airflow:1.10.9
+docker-pycreds:0.4.0
diff --git a/tests/runners/airflow/dag/test_rainbow_dags.py b/tests/runners/airflow/dag/test_rainbow_dags.py
index 41bea09..c66e3bc 100644
--- a/tests/runners/airflow/dag/test_rainbow_dags.py
+++ b/tests/runners/airflow/dag/test_rainbow_dags.py
@@ -1,6 +1,7 @@
 from unittest import TestCase
 
 from rainbow.runners.airflow.dag import rainbow_dags
+import unittest
 
 
 class Test(TestCase):
@@ -9,3 +10,7 @@ class Test(TestCase):
         self.assertEqual(len(dags), 1)
         # TODO: elaborate test
         pass
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/rainbow/docker/__init__.py b/tests/runners/airflow/tasks/hello_world/__init__.py
similarity index 98%
copy from rainbow/docker/__init__.py
copy to tests/runners/airflow/tasks/hello_world/__init__.py
index 8bb1ec2..217e5db 100644
--- a/rainbow/docker/__init__.py
+++ b/tests/runners/airflow/tasks/hello_world/__init__.py
@@ -15,4 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: docker
diff --git a/rainbow/docker/__init__.py b/tests/runners/airflow/tasks/hello_world/hello_world.py
similarity index 97%
copy from rainbow/docker/__init__.py
copy to tests/runners/airflow/tasks/hello_world/hello_world.py
index 8bb1ec2..9b87c05 100644
--- a/rainbow/docker/__init__.py
+++ b/tests/runners/airflow/tasks/hello_world/hello_world.py
@@ -15,4 +15,4 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: docker
+print('Hello world!')
diff --git a/tests/runners/airflow/tasks/test_python.py b/tests/runners/airflow/tasks/test_python.py
index 4f5808b..4bbbe9c 100644
--- a/tests/runners/airflow/tasks/test_python.py
+++ b/tests/runners/airflow/tasks/test_python.py
@@ -16,8 +16,11 @@
 # specific language governing permissions and limitations
 # under the License.
 
+import unittest
 from unittest import TestCase
 
+import docker
+
 from rainbow.runners.airflow.operators.kubernetes_pod_operator import \
     ConfigurableKubernetesPodOperator
 from rainbow.runners.airflow.tasks import python
@@ -25,20 +28,14 @@ from tests.util import dag_test_utils
 
 
 class TestPythonTask(TestCase):
+
     def test_apply_task_to_dag(self):
         # TODO: elaborate tests
         dag = dag_test_utils.create_dag()
 
         task_id = 'my_task'
 
-        config = {
-            'task': task_id,
-            'cmd': 'foo bar',
-            'image': 'my_image',
-            'input_type': 'my_input_type',
-            'input_path': 'my_input',
-            'output_path': '/my_output.json'
-        }
+        config = self.__create_conf(task_id)
 
         task0 = python.PythonTask(dag, 'my_pipeline', None, config, 'all_success')
         task0.apply_task_to_dag()
@@ -48,3 +45,35 @@ class TestPythonTask(TestCase):
 
         self.assertIsInstance(dag_task0, ConfigurableKubernetesPodOperator)
         self.assertEqual(dag_task0.task_id, task_id)
+
+    def test_build(self):
+        config = self.__create_conf('my_task')
+
+        task0 = python.PythonTask(None, None, None, config, None)
+        task0.build()
+
+        # TODO: elaborate test of image, validate input/output
+        image_name = config['image']
+
+        docker_client = docker.from_env()
+        docker_client.images.get(image_name)
+        container_log = docker_client.containers.run(image_name, "python hello_world.py")
+        docker_client.close()
+
+        self.assertEqual("b'Hello world!\\n'", str(container_log))
+
+    @staticmethod
+    def __create_conf(task_id):
+        return {
+            'task': task_id,
+            'cmd': 'foo bar',
+            'image': 'my_image',
+            'source': 'tests/runners/airflow/tasks/hello_world',
+            'input_type': 'my_input_type',
+            'input_path': 'my_input',
+            'output_path': '/my_output.json'
+        }
+
+
+if __name__ == '__main__':
+    unittest.main()
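
The `build` function added in `python_image.py` stages a Docker build context by creating a temporary directory, deleting it so that `shutil.copytree` will accept it as a destination, and then copying in the source tree, an empty `requirements.txt` if none exists, and the extra files. A minimal sketch of that staging step on its own follows. It is not the code from this commit: the name `stage_build_context` and the `context` subdirectory are assumptions made for illustration, and the Docker build call is left out so the sketch runs without a Docker daemon.

```python
import os
import shutil
import tempfile


def stage_build_context(source_path, extra_files=None):
    """Copy source_path and extra_files into a fresh directory; return its path.

    Hypothetical helper, not part of the commit above.
    """
    extra_files = extra_files or []

    # shutil.copytree refuses an existing destination by default, so copy into
    # a child of the temp dir that does not exist yet, instead of deleting the
    # temp dir and letting copytree recreate it.
    staging_root = tempfile.mkdtemp()
    context_dir = os.path.join(staging_root, 'context')
    shutil.copytree(source_path, context_dir)

    # The Dockerfile in this commit always runs `pip install -r requirements.txt`,
    # so create an empty file when the source tree does not ship one.
    requirements_path = os.path.join(context_dir, 'requirements.txt')
    if not os.path.exists(requirements_path):
        open(requirements_path, 'w').close()

    # Extra files (for example the container setup/teardown scripts) land
    # next to the source files at the top of the build context.
    for file_path in extra_files:
        shutil.copy2(file_path, context_dir)

    return context_dir
```

The returned directory could then be handed to `docker_client.images.build(path=..., tag=...)` as in the commit, with the caller removing the staging root afterwards.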


[incubator-liminal] 19/43: Fix import bug

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit da4022cacb1fecf22dc3f9abfde12ff602768e24
Author: aviemzur <av...@gmail.com>
AuthorDate: Tue Mar 17 11:16:37 2020 +0200

    Fix import bug
---
 rainbow/core/util/class_util.py | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/rainbow/core/util/class_util.py b/rainbow/core/util/class_util.py
index 59b8543..31e1806 100644
--- a/rainbow/core/util/class_util.py
+++ b/rainbow/core/util/class_util.py
@@ -29,13 +29,14 @@ def find_subclasses_in_packages(packages, parent_class):
     """
     classes = {}
 
-    for package in [a for a in sys.path]:
-        for root, directories, files in os.walk(package):
+    for py_path in [a for a in sys.path]:
+        for root, directories, files in os.walk(py_path):
             for file in files:
                 file_path = os.path.join(root, file)
                 if any(p in file_path for p in packages) \
                         and file.endswith('.py') \
                         and '__pycache__' not in file_path:
+
                     spec = importlib.util.spec_from_file_location(file[:-3], file_path)
                     mod = importlib.util.module_from_spec(spec)
                     spec.loader.exec_module(mod)
@@ -43,7 +44,9 @@ def find_subclasses_in_packages(packages, parent_class):
                         if inspect.isclass(obj) and not obj.__name__.endswith('Mixin'):
                             module_name = mod.__name__
                             class_name = obj.__name__
-                            module = root[len(package) + 1:].replace('/', '.') + '.' + module_name
+                            parent_module = root[len(py_path) + 1:].replace('/', '.')
+                            module = parent_module.replace('airflow.dags.', '') + \
+                                     '.' + module_name
                             clazz = __get_class(module, class_name)
                             if issubclass(clazz, parent_class):
                                 classes.update({module_name: clazz})
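
The fix above derives a dotted module name from a file's directory relative to a `sys.path` entry, then drops the `airflow.dags.` segment before importing. A small sketch of that path-to-module conversion in isolation is shown here. The function name `dotted_module_name` and its `strip_prefix` parameter are assumptions for illustration, and unlike the commit, which calls `str.replace` on the whole string, this version removes the prefix only when it appears at the start.

```python
import os


def dotted_module_name(py_path, file_path, strip_prefix=''):
    """Turn a .py file path located under py_path into a dotted module name.

    Hypothetical helper, not part of the commit above.
    """
    # Path of the file relative to the sys.path entry it was found under.
    relative = os.path.relpath(file_path, py_path)

    # Drop the '.py' extension and swap path separators for dots.
    without_ext, _ = os.path.splitext(relative)
    module = without_ext.replace(os.sep, '.')

    # Optionally drop a leading package prefix such as 'airflow.dags.'.
    if strip_prefix and module.startswith(strip_prefix):
        module = module[len(strip_prefix):]

    return module
```

Using `os.path.relpath` and `os.sep` avoids the manual `len(py_path) + 1` slice and the hard-coded `'/'`, which may matter if the code ever runs on a platform with a different separator.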


[incubator-liminal] 29/43: TODO

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit be9e513fec8a7b47153eff2eac022d1689dfa33a
Author: aviemzur <av...@gmail.com>
AuthorDate: Tue Apr 7 14:46:27 2020 +0300

    TODO
---
 tests/runners/airflow/dag/test_rainbow_dags.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/runners/airflow/dag/test_rainbow_dags.py b/tests/runners/airflow/dag/test_rainbow_dags.py
index d8c1afc..c744ce5 100644
--- a/tests/runners/airflow/dag/test_rainbow_dags.py
+++ b/tests/runners/airflow/dag/test_rainbow_dags.py
@@ -13,6 +13,8 @@ class Test(TestCase):
         self.assertEqual(len(dags), 1)
 
         test_pipeline = dags[0]
+
+        # TODO: elaborate tests to assert all dags have correct tasks
         self.assertEqual(test_pipeline.dag_id, 'my_pipeline')
 
     def test_default_start_task(self):


[incubator-liminal] 21/43: Make paths in tests relative to script location

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit c1d77f21b750402487063ba32ac4439d5382be52
Author: aviemzur <av...@gmail.com>
AuthorDate: Sun Mar 22 10:31:07 2020 +0200

    Make paths in tests relative to script location
---
 .../airflow/build/http/python/test_python_server_image_builder.py  | 7 ++++---
 tests/runners/airflow/build/python/test_python_image_builder.py    | 7 +++++--
 tests/runners/airflow/build/test_build_rainbows.py                 | 5 ++---
 tests/runners/airflow/dag/test_rainbow_dags.py                     | 4 +++-
 tests/runners/airflow/tasks/test_python.py                         | 2 +-
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/tests/runners/airflow/build/http/python/test_python_server_image_builder.py b/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
index 3423976..63fc8fa 100644
--- a/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
+++ b/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
@@ -15,7 +15,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
+import os
 import threading
 import time
 import unittest
@@ -41,8 +41,9 @@ class TestPythonServer(TestCase):
         self.docker_client.close()
 
     def test_build_python_server(self):
+        base_path = os.path.join(os.path.dirname(__file__), '../../../rainbow')
         builder = PythonServerImageBuilder(config=self.config,
-                                           base_path='tests/runners/airflow/rainbow',
+                                           base_path=base_path,
                                            relative_source_path='myserver',
                                            tag=self.image_name)
 
@@ -87,7 +88,7 @@ class TestPythonServer(TestCase):
             'task': task_id,
             'cmd': 'foo bar',
             'image': 'rainbow_server_image',
-            'source': 'tests/runners/airflow/rainbow/myserver',
+            'source': 'baz',
             'input_type': 'my_input_type',
             'input_path': 'my_input',
             'output_path': '/my_output.json',
diff --git a/tests/runners/airflow/build/python/test_python_image_builder.py b/tests/runners/airflow/build/python/test_python_image_builder.py
index c8328da..7376987 100644
--- a/tests/runners/airflow/build/python/test_python_image_builder.py
+++ b/tests/runners/airflow/build/python/test_python_image_builder.py
@@ -15,6 +15,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+import os
 from unittest import TestCase
 
 import docker
@@ -29,8 +30,10 @@ class TestPythonImageBuilder(TestCase):
 
         image_name = config['image']
 
+        base_path = os.path.join(os.path.dirname(__file__), '../../rainbow')
+
         builder = PythonImageBuilder(config=config,
-                                     base_path='tests/runners/airflow/rainbow',
+                                     base_path=base_path,
                                      relative_source_path='helloworld',
                                      tag=image_name)
 
@@ -59,7 +62,7 @@ class TestPythonImageBuilder(TestCase):
             'task': task_id,
             'cmd': 'foo bar',
             'image': 'rainbow_image',
-            'source': 'tests/runners/airflow/rainbow/helloworld',
+            'source': 'baz',
             'input_type': 'my_input_type',
             'input_path': 'my_input',
             'output_path': '/my_output.json'
diff --git a/tests/runners/airflow/build/test_build_rainbows.py b/tests/runners/airflow/build/test_build_rainbows.py
index 9a4d31c..c5d8ea7 100644
--- a/tests/runners/airflow/build/test_build_rainbows.py
+++ b/tests/runners/airflow/build/test_build_rainbows.py
@@ -15,7 +15,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
+import os
 import unittest
 from unittest import TestCase
 
@@ -25,7 +25,6 @@ from rainbow.build import build_rainbows
 
 
 class TestBuildRainbows(TestCase):
-
     __image_names = [
         'my_static_input_task_image',
         'my_task_output_input_task_image',
@@ -46,7 +45,7 @@ class TestBuildRainbows(TestCase):
                 self.docker_client.images.remove(image=image_name)
 
     def test_build_rainbow(self):
-        build_rainbows.build_rainbows('tests/runners/airflow/rainbow')
+        build_rainbows.build_rainbows(os.path.join(os.path.dirname(__file__), '../rainbow'))
 
         for image in self.__image_names:
             self.docker_client.images.get(image)
diff --git a/tests/runners/airflow/dag/test_rainbow_dags.py b/tests/runners/airflow/dag/test_rainbow_dags.py
index 2a65f31..c8f2e38 100644
--- a/tests/runners/airflow/dag/test_rainbow_dags.py
+++ b/tests/runners/airflow/dag/test_rainbow_dags.py
@@ -1,3 +1,4 @@
+import os
 from unittest import TestCase
 
 from rainbow.runners.airflow.dag import rainbow_dags
@@ -6,7 +7,8 @@ import unittest
 
 class Test(TestCase):
     def test_register_dags(self):
-        dags = rainbow_dags.register_dags('tests/runners/airflow/rainbow')
+        base_path = os.path.join(os.path.dirname(__file__), '../rainbow')
+        dags = rainbow_dags.register_dags(base_path)
         self.assertEqual(len(dags), 1)
         # TODO: elaborate test
         pass
diff --git a/tests/runners/airflow/tasks/test_python.py b/tests/runners/airflow/tasks/test_python.py
index 18e6c1a..ac295eb 100644
--- a/tests/runners/airflow/tasks/test_python.py
+++ b/tests/runners/airflow/tasks/test_python.py
@@ -50,7 +50,7 @@ class TestPythonTask(TestCase):
             'task': task_id,
             'cmd': 'foo bar',
             'image': 'rainbow_image',
-            'source': 'tests/runners/airflow/rainbow/helloworld',
+            'source': 'baz',
             'input_type': 'my_input_type',
             'input_path': 'my_input',
             'output_path': '/my_output.json'
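
Note on the change above: the tests now build their fixture path from the test module's own location instead of a path relative to the working directory, so they should behave the same regardless of where the test runner is started. A minimal sketch of that idea follows; the helper name resolve_relative_to and the /repo/... path are hypothetical and exist only for illustration.

```python
import os


def resolve_relative_to(anchor_file, relative_path):
    """Resolve relative_path against the directory that contains anchor_file."""
    base_dir = os.path.dirname(os.path.abspath(anchor_file))
    # normpath collapses the '..' segment so the result is easy to read.
    return os.path.normpath(os.path.join(base_dir, relative_path))


# Hypothetical test-module location; in a real test this would be __file__.
example = resolve_relative_to(
    os.path.join(os.sep, 'repo', 'tests', 'runners', 'airflow', 'dag', 'test_rainbow_dags.py'),
    os.path.join('..', 'rainbow'),
)
print(example)
```

The commit itself passes the unnormalized os.path.join(...) result straight through, which works equally well; normpath is added here only for readability.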


[incubator-liminal] 24/43: Update README with yml example

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 2d34195321dcc5ddd2e33a37b8042b73f749c09a
Author: aviemzur <av...@gmail.com>
AuthorDate: Mon Mar 23 10:31:27 2020 +0200

    Update README with yml example
---
 README.md                                   | 69 +++++++++++++++++++++++++++++
 rainbow/build/build_rainbows.py             | 15 ++++---
 rainbow/runners/airflow/dag/rainbow_dags.py |  3 +-
 tests/runners/airflow/rainbow/rainbow.yml   | 26 +++++------
 4 files changed, 94 insertions(+), 19 deletions(-)

diff --git a/README.md b/README.md
index 257bb9a..62036e0 100644
--- a/README.md
+++ b/README.md
@@ -9,3 +9,72 @@ Rainbow's goal is to operationalize the machine learning process, allowing data
 quickly transition from a successful experiment to an automated pipeline of model training,
 validation, deployment and inference in production, freeing them from engineering and
 non-functional tasks, and allowing them to focus on machine learning code and artifacts.
+
+# Basics
+
+Using simple YAML configuration, create your own scheduled data pipelines (a sequence of tasks to
+perform), application servers, and more.
+
+## Example YAML config file
+
+name: MyPipeline
+owner: Bosco Albert Baracus
+pipelines:
+  - pipeline: my_pipeline
+    start_date: 1970-01-01
+    timeout_minutes: 45
+    schedule: 0 * 1 * *
+    metrics:
+     namespace: TestNamespace
+     backends: [ 'cloudwatch' ]
+    tasks:
+      - task: my_static_input_task
+        type: python
+        description: static input task
+        image: my_static_input_task_image
+        source: helloworld
+        env_vars:
+          env1: "a"
+          env2: "b"
+        input_type: static
+        input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
+        output_path: /output.json
+        cmd: python -u hello_world.py
+      - task: my_parallelized_static_input_task
+        type: python
+        description: parallelized static input task
+        image: my_static_input_task_image
+        env_vars:
+          env1: "a"
+          env2: "b"
+        input_type: static
+        input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
+        split_input: True
+        executors: 2
+        cmd: python -u helloworld.py
+      - task: my_task_output_input_task
+        type: python
+        description: parallelized static input task
+        image: my_task_output_input_task_image
+        source: helloworld
+        env_vars:
+          env1: "a"
+          env2: "b"
+        input_type: task
+        input_path: my_static_input_task
+        cmd: python -u hello_world.py
+services:
+  - service:
+    name: my_python_server
+    type: python_server
+    description: my python server
+    image: my_server_image
+    source: myserver
+    endpoints:
+      - endpoint: /myendpoint1
+        module: myserver.my_server
+        function: myendpoint1func
+
+# Installation
+
+TODO: installation.
diff --git a/rainbow/build/build_rainbows.py b/rainbow/build/build_rainbows.py
index fa3a922..4ed5bab 100644
--- a/rainbow/build/build_rainbows.py
+++ b/rainbow/build/build_rainbows.py
@@ -40,12 +40,17 @@ def build_rainbows(path):
 
             for pipeline in rainbow_config['pipelines']:
                 for task in pipeline['tasks']:
-                    task_type = task['type']
-                    builder_class = __get_task_build_class(task_type)
-                    if builder_class:
-                        __build_image(base_path, task, builder_class)
+                    task_name = task['task']
+
+                    if 'source' in task:
+                        task_type = task['type']
+                        builder_class = __get_task_build_class(task_type)
+                        if builder_class:
+                            __build_image(base_path, task, builder_class)
+                        else:
+                            raise ValueError(f'No such task type: {task_type}')
                     else:
-                        raise ValueError(f'No such task type: {task_type}')
+                        print(f'No source configured for task {task_name}, skipping build...')
 
                 for service in rainbow_config['services']:
                     service_type = service['type']
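
Note on the hunk above: after this change a task is only built when it declares a 'source'; an unknown task type raises only for tasks that have one. The sketch below mirrors that branching without touching Docker. The function plan_builds and the known_types argument are assumptions made for illustration and are not part of the rainbow code base.

```python
def plan_builds(pipelines, known_types):
    """Split task names into those that would be built and those skipped."""
    to_build, skipped = [], []
    for pipeline in pipelines:
        for task in pipeline['tasks']:
            if 'source' not in task:
                # No source configured: nothing to build for this task.
                skipped.append(task['task'])
                continue
            if task['type'] not in known_types:
                raise ValueError(f"No such task type: {task['type']}")
            to_build.append(task['task'])
    return to_build, skipped


sample = [{
    'tasks': [
        {'task': 'my_static_input_task', 'type': 'python', 'source': 'helloworld'},
        {'task': 'my_parallelized_static_input_task', 'type': 'python'},
    ]
}]
print(plan_builds(sample, {'python'}))
```

One consequence worth noting: a task with no 'source' and a misspelled 'type' is skipped quietly at build time, since the type is only checked inside the 'source' branch.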
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index 71d18d2..17fd8d9 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -16,7 +16,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from datetime import datetime
+from datetime import datetime, timedelta
 
 import yaml
 from airflow import DAG
@@ -56,6 +56,7 @@ def register_dags(configs_path):
                 dag = DAG(
                     dag_id=pipeline_name,
                     default_args=default_args,
+                    dagrun_timeout=timedelta(minutes=pipeline['timeout_minutes']),
                     catchup=False
                 )
 
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
index 27507fd..66e3dec 100644
--- a/tests/runners/airflow/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -21,7 +21,7 @@ owner: Bosco Albert Baracus
 pipelines:
   - pipeline: my_pipeline
     start_date: 1970-01-01
-    timeout-minutes: 45
+    timeout_minutes: 45
     schedule: 0 * 1 * *
     metrics:
      namespace: TestNamespace
@@ -39,18 +39,18 @@ pipelines:
         input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
         output_path: /output.json
         cmd: python -u hello_world.py
-#      - task: my_parallelized_static_input_task
-#        type: python
-#        description: parallelized static input task
-#        image: my_static_input_task_image
-#        env_vars:
-#          env1: "a"
-#          env2: "b"
-#        input_type: static
-#        input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
-#        split_input: True
-#        executors: 2
-#        cmd: python -u helloworld.py
+      - task: my_parallelized_static_input_task
+        type: python
+        description: parallelized static input task
+        image: my_static_input_task_image
+        env_vars:
+          env1: "a"
+          env2: "b"
+        input_type: static
+        input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
+        split_input: True
+        executors: 2
+        cmd: python -u helloworld.py
       - task: my_task_output_input_task
         type: python
         description: parallelized static input task
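
Note on the timeout change in this commit: the YAML key is renamed from timeout-minutes to timeout_minutes, and rainbow_dags.py reads it by direct indexing to build the dagrun_timeout value. A small sketch of the conversion, outside Airflow, is given below; the helper name dagrun_timeout_for is hypothetical.

```python
from datetime import timedelta


def dagrun_timeout_for(pipeline):
    """Convert the pipeline's 'timeout_minutes' value into a timedelta."""
    # Direct indexing, as in the diff: a missing key raises KeyError.
    return timedelta(minutes=pipeline['timeout_minutes'])


new_style = {'pipeline': 'my_pipeline', 'timeout_minutes': 45}
old_style = {'pipeline': 'my_pipeline', 'timeout-minutes': 45}

print(dagrun_timeout_for(new_style))
try:
    dagrun_timeout_for(old_style)
except KeyError as missing:
    print(f'missing key: {missing}')
```

Under this reading, any existing config that still uses the hyphenated key would fail at DAG registration rather than fall back to a default.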


[incubator-liminal] 33/43: Add architecture diagrams

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 54fb98781b3522058f322a78cc0919f713b621e5
Author: lior.schachter <li...@naturalint.com>
AuthorDate: Sat Apr 11 12:57:12 2020 +0300

    Add architecture diagrams
---
 images/rainbow_001.png | Bin 0 -> 54903 bytes
 images/rainbow_002.png | Bin 0 -> 34933 bytes
 2 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/images/rainbow_001.png b/images/rainbow_001.png
new file mode 100644
index 0000000..b3f50bb
Binary files /dev/null and b/images/rainbow_001.png differ
diff --git a/images/rainbow_002.png b/images/rainbow_002.png
new file mode 100644
index 0000000..7beed54
Binary files /dev/null and b/images/rainbow_002.png differ


[incubator-liminal] 02/43: First code commit

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 2887b1993ae5d17cc0e172a6f9f828ec4f362e04
Author: aviemzur <av...@gmail.com>
AuthorDate: Sun Mar 8 15:46:22 2020 +0200

    First code commit
---
 .gitignore                                         |   8 +
 LICENSE                                            | 250 +++++++++++++++++++
 rainbow/__init__.py                                |  17 ++
 rainbow/cli/__init__.py                            |  17 ++
 rainbow/core/__init__.py                           |  17 ++
 rainbow/docker/__init__.py                         |  17 ++
 rainbow/http/__init__.py                           |  17 ++
 rainbow/monitoring/__init__.py                     |  17 ++
 rainbow/runners/__init__.py                        |  17 ++
 rainbow/runners/airflow/__init__.py                |  17 ++
 rainbow/runners/airflow/compiler/__init__.py       |  17 ++
 .../runners/airflow/compiler/rainbow_compiler.py   |  26 ++
 rainbow/runners/airflow/dag/__init__.py            |  17 ++
 rainbow/runners/airflow/operators/__init__.py      |  17 ++
 .../runners/airflow/operators/cloudformation.py    | 270 +++++++++++++++++++++
 .../airflow/operators/job_status_operator.py       | 180 ++++++++++++++
 .../airflow/operators/kubernetes_pod_operator.py   | 140 +++++++++++
 rainbow/sql/__init__.py                            |  17 ++
 tests/__init__.py                                  |  17 ++
 tests/runners/__init__.py                          |  17 ++
 tests/runners/airflow/__init__.py                  |  17 ++
 tests/runners/airflow/compiler/__init__.py         |  17 ++
 tests/runners/airflow/compiler/rainbow.yml         | 115 +++++++++
 .../airflow/compiler/test_rainbow_compiler.py      |  33 +++
 tests/runners/airflow/operators/__init__.py        |  17 ++
 25 files changed, 1311 insertions(+)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e14e323
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,8 @@
+.idea
+bin
+include
+lib
+venv
+.Python
+*.pyc
+pip-selfcheck.json
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..8f1552e
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,250 @@
+                              Apache License
+                        Version 2.0, January 2004
+                     http://www.apache.org/licenses/
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+1. Definitions.
+
+   "License" shall mean the terms and conditions for use, reproduction,
+   and distribution as defined by Sections 1 through 9 of this document.
+
+   "Licensor" shall mean the copyright owner or entity authorized by
+   the copyright owner that is granting the License.
+
+   "Legal Entity" shall mean the union of the acting entity and all
+   other entities that control, are controlled by, or are under common
+   control with that entity. For the purposes of this definition,
+   "control" means (i) the power, direct or indirect, to cause the
+   direction or management of such entity, whether by contract or
+   otherwise, or (ii) ownership of fifty percent (50%) or more of the
+   outstanding shares, or (iii) beneficial ownership of such entity.
+
+   "You" (or "Your") shall mean an individual or Legal Entity
+   exercising permissions granted by this License.
+
+   "Source" form shall mean the preferred form for making modifications,
+   including but not limited to software source code, documentation
+   source, and configuration files.
+
+   "Object" form shall mean any form resulting from mechanical
+   transformation or translation of a Source form, including but
+   not limited to compiled object code, generated documentation,
+   and conversions to other media types.
+
+   "Work" shall mean the work of authorship, whether in Source or
+   Object form, made available under the License, as indicated by a
+   copyright notice that is included in or attached to the work
+   (an example is provided in the Appendix below).
+
+   "Derivative Works" shall mean any work, whether in Source or Object
+   form, that is based on (or derived from) the Work and for which the
+   editorial revisions, annotations, elaborations, or other modifications
+   represent, as a whole, an original work of authorship. For the purposes
+   of this License, Derivative Works shall not include works that remain
+   separable from, or merely link (or bind by name) to the interfaces of,
+   the Work and Derivative Works thereof.
+
+   "Contribution" shall mean any work of authorship, including
+   the original version of the Work and any modifications or additions
+   to that Work or Derivative Works thereof, that is intentionally
+   submitted to Licensor for inclusion in the Work by the copyright owner
+   or by an individual or Legal Entity authorized to submit on behalf of
+   the copyright owner. For the purposes of this definition, "submitted"
+   means any form of electronic, verbal, or written communication sent
+   to the Licensor or its representatives, including but not limited to
+   communication on electronic mailing lists, source code control systems,
+   and issue tracking systems that are managed by, or on behalf of, the
+   Licensor for the purpose of discussing and improving the Work, but
+   excluding communication that is conspicuously marked or otherwise
+   designated in writing by the copyright owner as "Not a Contribution."
+
+   "Contributor" shall mean Licensor and any individual or Legal Entity
+   on behalf of whom a Contribution has been received by Licensor and
+   subsequently incorporated within the Work.
+
+2. Grant of Copyright License. Subject to the terms and conditions of
+   this License, each Contributor hereby grants to You a perpetual,
+   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+   copyright license to reproduce, prepare Derivative Works of,
+   publicly display, publicly perform, sublicense, and distribute the
+   Work and such Derivative Works in Source or Object form.
+
+3. Grant of Patent License. Subject to the terms and conditions of
+   this License, each Contributor hereby grants to You a perpetual,
+   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+   (except as stated in this section) patent license to make, have made,
+   use, offer to sell, sell, import, and otherwise transfer the Work,
+   where such license applies only to those patent claims licensable
+   by such Contributor that are necessarily infringed by their
+   Contribution(s) alone or by combination of their Contribution(s)
+   with the Work to which such Contribution(s) was submitted. If You
+   institute patent litigation against any entity (including a
+   cross-claim or counterclaim in a lawsuit) alleging that the Work
+   or a Contribution incorporated within the Work constitutes direct
+   or contributory patent infringement, then any patent licenses
+   granted to You under this License for that Work shall terminate
+   as of the date such litigation is filed.
+
+4. Redistribution. You may reproduce and distribute copies of the
+   Work or Derivative Works thereof in any medium, with or without
+   modifications, and in Source or Object form, provided that You
+   meet the following conditions:
+
+   (a) You must give any other recipients of the Work or
+       Derivative Works a copy of this License; and
+
+   (b) You must cause any modified files to carry prominent notices
+       stating that You changed the files; and
+
+   (c) You must retain, in the Source form of any Derivative Works
+       that You distribute, all copyright, patent, trademark, and
+       attribution notices from the Source form of the Work,
+       excluding those notices that do not pertain to any part of
+       the Derivative Works; and
+
+   (d) If the Work includes a "NOTICE" text file as part of its
+       distribution, then any Derivative Works that You distribute must
+       include a readable copy of the attribution notices contained
+       within such NOTICE file, excluding those notices that do not
+       pertain to any part of the Derivative Works, in at least one
+       of the following places: within a NOTICE text file distributed
+       as part of the Derivative Works; within the Source form or
+       documentation, if provided along with the Derivative Works; or,
+       within a display generated by the Derivative Works, if and
+       wherever such third-party notices normally appear. The contents
+       of the NOTICE file are for informational purposes only and
+       do not modify the License. You may add Your own attribution
+       notices within Derivative Works that You distribute, alongside
+       or as an addendum to the NOTICE text from the Work, provided
+       that such additional attribution notices cannot be construed
+       as modifying the License.
+
+   You may add Your own copyright statement to Your modifications and
+   may provide additional or different license terms and conditions
+   for use, reproduction, or distribution of Your modifications, or
+   for any such Derivative Works as a whole, provided Your use,
+   reproduction, and distribution of the Work otherwise complies with
+   the conditions stated in this License.
+
+5. Submission of Contributions. Unless You explicitly state otherwise,
+   any Contribution intentionally submitted for inclusion in the Work
+   by You to the Licensor shall be under the terms and conditions of
+   this License, without any additional terms or conditions.
+   Notwithstanding the above, nothing herein shall supersede or modify
+   the terms of any separate license agreement you may have executed
+   with Licensor regarding such Contributions.
+
+6. Trademarks. This License does not grant permission to use the trade
+   names, trademarks, service marks, or product names of the Licensor,
+   except as required for reasonable and customary use in describing the
+   origin of the Work and reproducing the content of the NOTICE file.
+
+7. Disclaimer of Warranty. Unless required by applicable law or
+   agreed to in writing, Licensor provides the Work (and each
+   Contributor provides its Contributions) on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+   implied, including, without limitation, any warranties or conditions
+   of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+   PARTICULAR PURPOSE. You are solely responsible for determining the
+   appropriateness of using or redistributing the Work and assume any
+   risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability. In no event and under no legal theory,
+   whether in tort (including negligence), contract, or otherwise,
+   unless required by applicable law (such as deliberate and grossly
+   negligent acts) or agreed to in writing, shall any Contributor be
+   liable to You for damages, including any direct, indirect, special,
+   incidental, or consequential damages of any character arising as a
+   result of this License or out of the use or inability to use the
+   Work (including but not limited to damages for loss of goodwill,
+   work stoppage, computer failure or malfunction, or any and all
+   other commercial damages or losses), even if such Contributor
+   has been advised of the possibility of such damages.
+
+9. Accepting Warranty or Additional Liability. While redistributing
+   the Work or Derivative Works thereof, You may choose to offer,
+   and charge a fee for, acceptance of support, warranty, indemnity,
+   or other liability obligations and/or rights consistent with this
+   License. However, in accepting such obligations, You may act only
+   on Your own behalf and on Your sole responsibility, not on behalf
+   of any other Contributor, and only if You agree to indemnify,
+   defend, and hold each Contributor harmless for any liability
+   incurred by, or claims asserted against, such Contributor by reason
+   of your accepting any such warranty or additional liability.
+
+END OF TERMS AND CONDITIONS
+
+APPENDIX: How to apply the Apache License to your work.
+
+   To apply the Apache License to your work, attach the following
+   boilerplate notice, with the fields enclosed by brackets "[]"
+   replaced with your own identifying information. (Don't include
+   the brackets!)  The text should be enclosed in the appropriate
+   comment syntax for the file format. We also recommend that a
+   file or class name and description of purpose be included on the
+   same "printed page" as the copyright notice for easier
+   identification within third-party archives.
+
+Copyright [yyyy] [name of copyright owner]
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+============================================================================
+   APACHE AIRFLOW SUBCOMPONENTS:
+
+   The Apache Airflow project contains subcomponents with separate copyright
+   notices and license terms. Your use of the source code for these
+   subcomponents is subject to the terms and conditions of the following
+   licenses.
+
+
+========================================================================
+Third party Apache 2.0 licenses
+========================================================================
+
+The following components are provided under the Apache 2.0 License.
+See project link for details. The text of each license is also included
+at licenses/LICENSE-[project].txt.
+
+    (ALv2 License) hue v4.3.0 (https://github.com/cloudera/hue/)
+    (ALv2 License) jqclock v2.3.0 (https://github.com/JohnRDOrazio/jQuery-Clock-Plugin)
+    (ALv2 License) bootstrap3-typeahead v4.0.2 (https://github.com/bassjobsen/Bootstrap-3-Typeahead)
+    (ALv2 License) airflow.contrib.auth.backends.github_enterprise_auth
+
+========================================================================
+MIT licenses
+========================================================================
+
+The following components are provided under the MIT License. See project link for details.
+The text of each license is also included at licenses/LICENSE-[project].txt.
+
+    (MIT License) jquery v3.4.1 (https://jquery.org/license/)
+    (MIT License) dagre-d3 v0.6.4 (https://github.com/cpettitt/dagre-d3)
+    (MIT License) bootstrap v3.2 (https://github.com/twbs/bootstrap/)
+    (MIT License) d3-tip v0.9.1 (https://github.com/Caged/d3-tip)
+    (MIT License) dataTables v1.10.20 (https://datatables.net)
+    (MIT License) Bootstrap Toggle v2.2.2 (http://www.bootstraptoggle.com)
+    (MIT License) normalize.css v3.0.2 (http://necolas.github.io/normalize.css/)
+    (MIT License) ElasticMock v1.3.2 (https://github.com/vrcmarcos/elasticmock)
+    (MIT License) MomentJS v2.24.0 (http://momentjs.com/)
+    (MIT License) python-slugify v2.0.1 (https://github.com/un33k/python-slugify)
+    (MIT License) python-nvd3 v0.15.0 (https://github.com/areski/python-nvd3)
+
+========================================================================
+BSD 3-Clause licenses
+========================================================================
+The following components are provided under the BSD 3-Clause license. See project links for details.
+The text of each license is also included at licenses/LICENSE-[project].txt.
+
+    (BSD 3 License) d3 v5.15.0 (https://d3js.org)
diff --git a/rainbow/__init__.py b/rainbow/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/cli/__init__.py b/rainbow/cli/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/cli/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/core/__init__.py b/rainbow/core/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/core/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/docker/__init__.py b/rainbow/docker/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/docker/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/http/__init__.py b/rainbow/http/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/http/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/monitoring/__init__.py b/rainbow/monitoring/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/monitoring/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/runners/__init__.py b/rainbow/runners/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/runners/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/runners/airflow/__init__.py b/rainbow/runners/airflow/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/runners/airflow/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/runners/airflow/compiler/__init__.py b/rainbow/runners/airflow/compiler/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/runners/airflow/compiler/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/runners/airflow/compiler/rainbow_compiler.py b/rainbow/runners/airflow/compiler/rainbow_compiler.py
new file mode 100644
index 0000000..818fdc5
--- /dev/null
+++ b/rainbow/runners/airflow/compiler/rainbow_compiler.py
@@ -0,0 +1,26 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiler for rainbows.
+"""
+import yaml
+
+
+def parse_yaml(path):
+    with open(path, 'r') as stream:
+        return yaml.safe_load(stream)
diff --git a/rainbow/runners/airflow/dag/__init__.py b/rainbow/runners/airflow/dag/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/runners/airflow/dag/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/runners/airflow/operators/__init__.py b/rainbow/runners/airflow/operators/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/runners/airflow/operators/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/rainbow/runners/airflow/operators/cloudformation.py b/rainbow/runners/airflow/operators/cloudformation.py
new file mode 100644
index 0000000..0a70e5a
--- /dev/null
+++ b/rainbow/runners/airflow/operators/cloudformation.py
@@ -0,0 +1,270 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This module contains CloudFormation create/delete stack operators.
+Can be removed when Airflow 2.0.0 is released.
+"""
+from typing import List
+
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.models import BaseOperator
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+from botocore.exceptions import ClientError
+
+
+# noinspection PyAbstractClass
+class CloudFormationHook(AwsHook):
+    """
+    Interact with AWS CloudFormation.
+    """
+
+    def __init__(self, region_name=None, *args, **kwargs):
+        self.region_name = region_name
+        self.conn = None
+        super().__init__(*args, **kwargs)
+
+    def get_conn(self):
+        self.conn = self.get_client_type('cloudformation', self.region_name)
+        return self.conn
+
+
+class BaseCloudFormationOperator(BaseOperator):
+    """
+    Base operator for CloudFormation operations.
+
+    :param params: parameters to be passed to CloudFormation.
+    :type params: dict
+    :param aws_conn_id: AWS connection to use
+    :type aws_conn_id: str
+    """
+    template_fields: List[str] = []
+    template_ext = ()
+    ui_color = '#1d472b'
+    ui_fgcolor = '#FFF'
+
+    @apply_defaults
+    def __init__(
+            self,
+            params,
+            aws_conn_id='aws_default',
+            *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.params = params
+        self.aws_conn_id = aws_conn_id
+
+    def execute(self, context):
+        self.log.info('Parameters: %s', self.params)
+
+        self.cloudformation_op(CloudFormationHook(aws_conn_id=self.aws_conn_id).get_conn())
+
+    def cloudformation_op(self, cloudformation):
+        """
+        This is the main method to run CloudFormation operation.
+        """
+        raise NotImplementedError()
+
+
+class CloudFormationCreateStackOperator(BaseCloudFormationOperator):
+    """
+    An operator that creates a CloudFormation stack.
+
+    :param params: parameters to be passed to CloudFormation. For possible arguments see:
+            https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudformation.html#CloudFormation.Client.create_stack
+    :type params: dict
+    :param aws_conn_id: AWS connection to use
+    :type aws_conn_id: str
+    """
+    template_fields: List[str] = []
+    template_ext = ()
+    ui_color = '#6b9659'
+
+    @apply_defaults
+    def __init__(
+            self,
+            params,
+            aws_conn_id='aws_default',
+            *args, **kwargs):
+        super().__init__(params=params, aws_conn_id=aws_conn_id, *args, **kwargs)
+
+    def cloudformation_op(self, cloudformation):
+        cloudformation.create_stack(**self.params)
+
+
+class CloudFormationDeleteStackOperator(BaseCloudFormationOperator):
+    """
+    An operator that deletes a CloudFormation stack.
+
+    :param params: parameters to be passed to CloudFormation. For possible arguments see:
+            https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudformation.html#CloudFormation.Client.delete_stack
+    :type params: dict
+    :param aws_conn_id: AWS connection to use
+    :type aws_conn_id: str
+    """
+    template_fields: List[str] = []
+    template_ext = ()
+    ui_color = '#1d472b'
+    ui_fgcolor = '#FFF'
+
+    @apply_defaults
+    def __init__(
+            self,
+            params,
+            aws_conn_id='aws_default',
+            *args, **kwargs):
+        super().__init__(params=params, aws_conn_id=aws_conn_id, *args, **kwargs)
+
+    def cloudformation_op(self, cloudformation):
+        cloudformation.delete_stack(**self.params)
+
+
+class BaseCloudFormationSensor(BaseSensorOperator):
+    """
+    Waits for a stack operation to complete on AWS CloudFormation.
+
+    :param stack_name: The name of the stack to wait for (templated)
+    :type stack_name: str
+    :param aws_conn_id: ID of the Airflow connection where credentials and extra configuration are
+        stored
+    :type aws_conn_id: str
+    :param poke_interval: Time in seconds that the job should wait between each try
+    :type poke_interval: int
+    """
+
+    @apply_defaults
+    def __init__(self,
+                 stack_name,
+                 complete_status,
+                 in_progress_status,
+                 aws_conn_id='aws_default',
+                 poke_interval=30,
+                 *args,
+                 **kwargs):
+        super().__init__(poke_interval=poke_interval, *args, **kwargs)
+        self.aws_conn_id = aws_conn_id
+        self.stack_name = stack_name
+        self.complete_status = complete_status
+        self.in_progress_status = in_progress_status
+        self.hook = None
+
+    def poke(self, context):
+        """
+        Checks whether the stack has reached the expected status in AWS CloudFormation.
+        """
+        cloudformation = self.get_hook().get_conn()
+
+        self.log.info('Poking for stack %s', self.stack_name)
+
+        try:
+            stacks = cloudformation.describe_stacks(StackName=self.stack_name)['Stacks']
+            stack_status = stacks[0]['StackStatus']
+            if stack_status == self.complete_status:
+                return True
+            elif stack_status == self.in_progress_status:
+                return False
+            else:
+                raise ValueError(f'Stack {self.stack_name} in bad state: {stack_status}')
+        except ClientError as e:
+            if 'does not exist' in str(e):
+                if not self.allow_non_existing_stack_status():
+                    raise ValueError(f'Stack {self.stack_name} does not exist')
+                else:
+                    return True
+            else:
+                raise
+
+    def get_hook(self):
+        """
+        Gets the CloudFormationHook, creating it on first use.
+        """
+        if not self.hook:
+            self.hook = CloudFormationHook(aws_conn_id=self.aws_conn_id)
+
+        return self.hook
+
+    def allow_non_existing_stack_status(self):
+        """
+        Returns whether the sensor should treat a non-existent stack as a completed state.
+        """
+        return False
+
+
+class CloudFormationCreateStackSensor(BaseCloudFormationSensor):
+    """
+    Waits for a stack to be created successfully on AWS CloudFormation.
+
+    :param stack_name: The name of the stack to wait for (templated)
+    :type stack_name: str
+    :param aws_conn_id: ID of the Airflow connection where credentials and extra configuration are
+        stored
+    :type aws_conn_id: str
+    :param poke_interval: Time in seconds that the job should wait between each try
+    :type poke_interval: int
+    """
+
+    template_fields = ['stack_name']
+    ui_color = '#C5CAE9'
+
+    @apply_defaults
+    def __init__(self,
+                 stack_name,
+                 aws_conn_id='aws_default',
+                 poke_interval=30,
+                 *args,
+                 **kwargs):
+        super().__init__(stack_name=stack_name,
+                         complete_status='CREATE_COMPLETE',
+                         in_progress_status='CREATE_IN_PROGRESS',
+                         aws_conn_id=aws_conn_id,
+                         poke_interval=poke_interval,
+                         *args,
+                         **kwargs)
+
+
+class CloudFormationDeleteStackSensor(BaseCloudFormationSensor):
+    """
+    Waits for a stack to be deleted successfully on AWS CloudFormation.
+
+    :param stack_name: The name of the stack to wait for (templated)
+    :type stack_name: str
+    :param aws_conn_id: ID of the Airflow connection where credentials and extra configuration are
+        stored
+    :type aws_conn_id: str
+    :param poke_interval: Time in seconds that the job should wait between each try
+    :type poke_interval: int
+    """
+
+    template_fields = ['stack_name']
+    ui_color = '#C5CAE9'
+
+    @apply_defaults
+    def __init__(self,
+                 stack_name,
+                 aws_conn_id='aws_default',
+                 poke_interval=30,
+                 *args,
+                 **kwargs):
+        super().__init__(stack_name=stack_name,
+                         complete_status='DELETE_COMPLETE',
+                         in_progress_status='DELETE_IN_PROGRESS',
+                         aws_conn_id=aws_conn_id,
+                         poke_interval=poke_interval, *args, **kwargs)
+
+    def allow_non_existing_stack_status(self):
+        return True
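
The decision that BaseCloudFormationSensor.poke makes on each poll can be sketched without Airflow or boto3. This is a minimal illustration only: poke_decision is a hypothetical helper that is not part of the commit, and a stack_status of None stands in for the "does not exist" ClientError that the real sensor catches.

```python
from typing import Optional


def poke_decision(stack_status: Optional[str],
                  complete_status: str,
                  in_progress_status: str,
                  allow_missing: bool = False) -> bool:
    """Hypothetical stand-in for the branching inside poke().

    Returns True when the sensor should stop waiting, False when it should
    poke again, and raises ValueError for any other stack state.
    """
    if stack_status is None:
        # Stand-in for the 'does not exist' ClientError branch.
        if allow_missing:
            return True
        raise ValueError('Stack does not exist')
    if stack_status == complete_status:
        return True
    if stack_status == in_progress_status:
        return False
    raise ValueError(f'Stack in bad state: {stack_status}')


# Create sensor: a missing stack is an error, so allow_missing stays False.
print(poke_decision('CREATE_IN_PROGRESS', 'CREATE_COMPLETE', 'CREATE_IN_PROGRESS'))
# Delete sensor: a stack that is already gone counts as done.
print(poke_decision(None, 'DELETE_COMPLETE', 'DELETE_IN_PROGRESS', allow_missing=True))
```

Any status other than the two configured ones (for example a rollback state) raises and fails the task instead of polling forever, which appears to be the intent of the else branch in poke.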
diff --git a/rainbow/runners/airflow/operators/job_status_operator.py b/rainbow/runners/airflow/operators/job_status_operator.py
new file mode 100644
index 0000000..dc318e5
--- /dev/null
+++ b/rainbow/runners/airflow/operators/job_status_operator.py
@@ -0,0 +1,180 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from datetime import datetime
+
+import pytz
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.exceptions import AirflowException
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class JobStatusOperator(BaseOperator):
+    """
+    Base operator for job status operators.
+    """
+    template_ext = ()
+
+    @apply_defaults
+    def __init__(
+            self,
+            backends,
+            *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.backends = backends
+        self.cloudwatch = CloudWatchHook()
+
+    def execute(self, context):
+        for backend in self.backends:
+            if backend in self.report_functions:
+                for metric in self.metrics(context):
+                    self.report_functions[backend](self, metric)
+            else:
+                raise AirflowException('No such metrics backend: {}'.format(backend))
+
+    def metrics(self, context):
+        raise NotImplementedError
+
+    def send_metric_to_cloudwatch(self, metric):
+        self.cloudwatch.put_metric_data(metric)
+
+    report_functions = {
+        'cloudwatch': send_metric_to_cloudwatch
+    }
+
+
+class JobStartOperator(JobStatusOperator):
+    ui_color = '#c5e5e8'
+
+    def __init__(
+            self,
+            namespace,
+            application_name,
+            backends,
+            *args, **kwargs):
+        super().__init__(backends=backends, *args, **kwargs)
+        self.namespace = namespace
+        self.application_name = application_name
+
+    def metrics(self, context):
+        return [Metric(self.namespace, 'JobStarted', 1,
+                       [Tag('ApplicationName', self.application_name)])]
+
+
+class JobEndOperator(JobStatusOperator):
+    ui_color = '#6d8fad'
+
+    def __init__(
+            self,
+            namespace,
+            application_name,
+            backends,
+            *args, **kwargs):
+        super().__init__(backends=backends, *args, **kwargs)
+        self.namespace = namespace
+        self.application_name = application_name
+
+    def metrics(self, context):
+        duration = round((pytz.utc.localize(datetime.utcnow()) - context[
+            'ti'].get_dagrun().start_date).total_seconds())
+
+        self.log.info('Elapsed time: %s', duration)
+
+        task_instances = context['dag_run'].get_task_instances()
+        task_states = [task_instance.state for task_instance in task_instances[:-1]]
+
+        job_result = 0
+        if all(state == 'success' for state in task_states):
+            job_result = 1
+
+        return [
+            Metric(self.namespace, 'JobResult', job_result,
+                   [Tag('ApplicationName', self.application_name)]),
+            Metric(self.namespace, 'JobDuration', duration,
+                   [Tag('ApplicationName', self.application_name)])
+        ]
+
+
+# noinspection PyAbstractClass
+class CloudWatchHook(AwsHook):
+    """
+    Interact with AWS CloudWatch.
+    """
+
+    def __init__(self, region_name=None, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.region_name = region_name
+        self.conn = self.get_client_type('cloudwatch', self.region_name)
+
+    def get_conn(self):
+        return self.conn
+
+    def put_metric_data(self, metric):
+        value = metric.value
+
+        cloudwatch = self.get_conn()
+
+        dimensions = [{'Name': tag.name, 'Value': tag.value} for tag in metric.tags]
+
+        cloudwatch.put_metric_data(
+            Namespace=metric.namespace,
+            MetricData=[
+                {
+                    'MetricName': metric.name,
+                    'Dimensions': dimensions,
+                    'Timestamp': datetime.utcnow(),
+                    'Value': value,
+                    'Unit': 'None'
+                }
+            ]
+        )
+
+
+class Metric:
+    """
+    Metric.
+    :param namespace: namespace.
+    :type namespace: str
+    :param name: name.
+    :type name: str
+    :param value: value.
+    :type value: float
+    :param tags: list of tags.
+    :type tags: List[Tag]
+    """
+
+    def __init__(
+            self,
+            namespace,
+            name,
+            value,
+            tags):
+        self.namespace = namespace
+        self.name = name
+        self.value = value
+        self.tags = tags
+
+
+class Tag:
+    def __init__(
+            self,
+            name,
+            value):
+        self.name = name
+        self.value = value
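
JobStatusOperator keeps its backends in a class-level dict of plain functions and passes self explicitly when calling them. The sketch below reproduces that dispatch pattern with the standard library only. InMemoryReporter and its 'memory' backend are hypothetical names used for illustration; the commit itself registers only 'cloudwatch'.

```python
class InMemoryReporter:
    """Hypothetical class illustrating the report_functions dispatch pattern."""

    def __init__(self, backends):
        self.backends = backends
        self.sent = []

    def send_metric_to_memory(self, metric):
        self.sent.append(metric)

    # Functions stored in a dict in the class body are looked up as plain
    # functions, not bound methods, so callers must pass `self` themselves.
    report_functions = {
        'memory': send_metric_to_memory
    }

    def report(self, metrics):
        for backend in self.backends:
            if backend not in self.report_functions:
                raise ValueError('No such metrics backend: {}'.format(backend))
            for metric in metrics:
                self.report_functions[backend](self, metric)


reporter = InMemoryReporter(backends=['memory'])
reporter.report([('JobStarted', 1)])
print(reporter.sent)
```

The explicit self argument is needed because the lookup goes through the dict rather than through attribute access on the instance, so Python's usual method binding does not happen.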
diff --git a/rainbow/runners/airflow/operators/kubernetes_pod_operator.py b/rainbow/runners/airflow/operators/kubernetes_pod_operator.py
new file mode 100644
index 0000000..a7b0bdd
--- /dev/null
+++ b/rainbow/runners/airflow/operators/kubernetes_pod_operator.py
@@ -0,0 +1,140 @@
+from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
+import json
+import traceback
+from airflow.models import DAG, TaskInstance
+from airflow.utils import timezone
+from random import randint
+
+
+def split_list(seq, num):
+    avg = len(seq) / float(num)
+    out = []
+    last = 0.0
+
+    while last < len(seq):
+        out.append(seq[int(last):int(last + avg)])
+        last += avg
+
+    return out
+
+
+class ConfigureParallelExecutionOperator(KubernetesPodOperator):
+
+    def __init__(self,
+                 config_type=None,
+                 config_path=None,
+                 executors=1,
+                 *args,
+                 **kwargs):
+        namespace = kwargs['namespace']
+        image = kwargs['image']
+        name = kwargs['name']
+
+        del kwargs['namespace']
+        del kwargs['image']
+        del kwargs['name']
+
+        super().__init__(
+            namespace=namespace,
+            image=image,
+            name=name,
+            *args,
+            **kwargs)
+        self.config_type = config_type
+        self.config_path = config_path
+        self.executors = executors
+
+    def execute(self, context):
+        config_dict = {}
+
+        self.log.info(f'config type: {self.config_type}')
+
+        if self.config_type:
+            if self.config_type == 'file':
+                config_dict = {}  # future feature: return config from file
+            elif self.config_type == 'sql':
+                config_dict = {}  # future feature: return from sql config
+            elif self.config_type == 'task':
+                ti = context['task_instance']
+                self.log.info(self.config_path)
+                config_dict = ti.xcom_pull(task_ids=self.config_path)
+            elif self.config_type == 'static':
+                config_dict = json.loads(self.config_path)
+            else:
+                raise ValueError(f'Unknown config type: {self.config_type}')
+
+        run_id = context['dag_run'].run_id
+
+        return_conf = {'config_type': self.config_type,
+                       'splits': {'0': {'run_id': run_id, 'configs': []}}}
+
+        if config_dict:
+            self.log.info(f'configs dict: {config_dict}')
+
+            configs = config_dict['configs']
+
+            self.log.info(f'configs: {configs}')
+
+            config_splits = split_list(configs, self.executors)
+
+            for i in range(self.executors):
+                return_conf['splits'][str(i)] = {'run_id': run_id, 'configs': config_splits[i]}
+
+        return return_conf
+
+    def run_pod(self, context):
+        return super().execute(context)
+
+
+class ConfigurableKubernetesPodOperator(KubernetesPodOperator):
+
+    def __init__(self,
+                 config_task_id,
+                 task_split,
+                 *args,
+                 **kwargs):
+        namespace = kwargs['namespace']
+        image = kwargs['image']
+        name = kwargs['name']
+
+        del kwargs['namespace']
+        del kwargs['image']
+        del kwargs['name']
+
+        super().__init__(
+            namespace=namespace,
+            image=image,
+            name=name,
+            *args,
+            **kwargs)
+
+        self.config_task_id = config_task_id
+        self.task_split = task_split
+
+    def execute(self, context):
+        if self.config_task_id:
+            ti = context['task_instance']
+
+            config = ti.xcom_pull(task_ids=self.config_task_id)
+
+            if config:
+                split = {}
+
+                if 'configs' in config:
+                    split = config
+                else:
+                    split = config['splits'][str(self.task_split)]
+
+                self.log.info(split)
+
+                if split and split['configs']:
+                    self.env_vars.update({'DATA_PIPELINE_CONFIG': json.dumps(split)})
+                    return super().execute(context)
+                else:
+                    self.log.info(
+                        f'Empty split config for split {self.task_split}. split config: {split}. config: {config}')
+            else:
+                raise ValueError('Config not found in task: ' + self.config_task_id)
+        else:
+            self.env_vars.update({'DATA_PIPELINE_CONFIG': '{}'})
+            return super().execute(context)
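
To see how ConfigureParallelExecutionOperator distributes work, the sketch below runs split_list (copied from the diff above) on eight configs and five executors, the same shape as the my_parallelized_static_config_task entry in the test rainbow.yml. build_splits is a hypothetical helper, not part of the commit; it iterates with enumerate over the chunks, whereas the operator indexes config_splits by range(self.executors).

```python
import json


def split_list(seq, num):
    # Copied from the operator module: cut seq into roughly num chunks
    # by stepping through it with a fractional stride.
    avg = len(seq) / float(num)
    out = []
    last = 0.0

    while last < len(seq):
        out.append(seq[int(last):int(last + avg)])
        last += avg

    return out


def build_splits(configs, executors, run_id):
    """Hypothetical helper shaped like the 'splits' dict built in execute()."""
    chunks = split_list(configs, executors)
    return {str(i): {'run_id': run_id, 'configs': chunk}
            for i, chunk in enumerate(chunks)}


configs = [{'campaign_id': n} for n in range(10, 90, 10)]
splits = build_splits(configs, executors=5, run_id='example_run')
for key, split in splits.items():
    print(key, json.dumps(split['configs']))
```

Because the stride is fractional, the chunks are not all the same size, but together they cover every config exactly once and in order. Each ConfigurableKubernetesPodOperator then reads its own entry by task_split and passes it to the pod as DATA_PIPELINE_CONFIG.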
diff --git a/rainbow/sql/__init__.py b/rainbow/sql/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/rainbow/sql/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tests/__init__.py b/tests/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/tests/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tests/runners/__init__.py b/tests/runners/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/tests/runners/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tests/runners/airflow/__init__.py b/tests/runners/airflow/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/tests/runners/airflow/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tests/runners/airflow/compiler/__init__.py b/tests/runners/airflow/compiler/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/tests/runners/airflow/compiler/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tests/runners/airflow/compiler/rainbow.yml b/tests/runners/airflow/compiler/rainbow.yml
new file mode 100644
index 0000000..45333a8
--- /dev/null
+++ b/tests/runners/airflow/compiler/rainbow.yml
@@ -0,0 +1,115 @@
+
+---
+name: MyPipeline
+owner: Bosco Albert Baracus
+pipeline:
+  timeout-minutes: 45
+  schedule: 0 * 1 * *
+  metrics-namespace: TestNamespace
+  tasks:
+    - name: mytask1
+      type: sql
+      description: mytask1 is cool
+      query: "select * from mytable"
+      overrides:
+        - prod:
+          partition-columns: dt
+          output-table: test.test_impression_prod
+          output-path: s3://mybucket/myproject-test/impression
+          emr-cluster-name: spark-playground-prod
+        - stg:
+          query: "select * from mytable"
+          partition-columns: dt
+          output-table: test.test_impression_stg
+          output-path: s3://mybucket/haya-test/impression
+          emr-cluster-name: spark-playground-staging
+      tasks:
+        - name: my_static_config_task
+          type: python
+          description: my 1st ds task
+          artifact-id: mytask1artifactid
+          source: mytask1folder
+          env-vars:
+            env1: "a"
+            env2: "b"
+          config-type: static
+          config-path: "{\"configs\": [ { \"campaign_id\": 10 }, { \"campaign_id\": 20 } ]}"
+          cmd: python -u my_app.py
+        - task:
+          name: my_no_config_task
+          type: python
+          description: my 2nd ds task
+          artifact-id: mytask1artifactid
+          env-vars:
+            env1: "a"
+            env2: "b"
+          request-cpu: 100m
+          request-memory: 65M
+          cmd: python -u my_app.py foo bar
+        - task:
+          name: my_create_custom_config_task
+          type: python
+          description: my 2nd ds task
+          artifact-id: myconftask
+          source: myconftask
+          output-config-path: /my_conf.json
+          env-vars:
+            env1: "a"
+            env2: "b"
+          cmd: python -u my_app.py foo bar
+        - task:
+          name: my_custom_config_task
+          type: python
+          description: my 2nd ds task
+          artifact-id: mytask1artifactid
+          config-type: task
+          config-path: my_create_custom_config_task
+          env-vars:
+            env1: "a"
+            env2: "b"
+          cmd: python -u my_app.py foo bar
+        - task:
+          name: my_parallelized_static_config_task
+          type: python
+          description: my 3rd ds task
+          artifact-id: mytask1artifactid
+          executors: 5
+          env-vars:
+            env1: "x"
+            env2: "y"
+            myconf: $CONFIG_FILE
+          config-type: static
+          config-path: "{\"configs\": [ { \"campaign_id\": 10 }, { \"campaign_id\": 20 }, { \"campaign_id\": 30 }, { \"campaign_id\": 40 }, { \"campaign_id\": 50 }, { \"campaign_id\": 60 }, { \"campaign_id\": 70 }, { \"campaign_id\": 80 } ]}"
+          cmd: python -u my_app.py $CONFIG_FILE
+        - task:
+          name: my_parallelized_custom_config_task
+          type: python
+          description: my 4th ds task
+          artifact-id: mytask1artifactid
+          executors: 5
+          config-type: task
+          config-path: my_create_custom_config_task
+          cmd: python -u my_app.py
+        - task:
+          name: my_parallelized_no_config_task
+          type: python
+          description: my 4th ds task
+          artifact-id: mytask1artifactid
+          executors: 5
+          cmd: python -u my_app.py
+services:
+  - service:
+    name: myserver1
+    type: python-server
+    description: my python server
+    artifact-id: myserver1artifactid
+    source: myserver1logicfolder
+    endpoints:
+      - endpoint:
+        path: /myendpoint1
+        module: mymodule1
+        function: myfun1
+      - endpoint:
+        path: /myendpoint2
+        module: mymodule2
+        function: myfun2
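
Note on `my_parallelized_static_config_task` in the fixture above: it pairs `executors: 5` with a static `config-path` holding eight `campaign_id` entries. Nothing in this diff shows how those entries are divided among the executors, so the following is only a sketch of one plausible scheme (round-robin). `split_configs` is a hypothetical helper and is not part of rainbow.

```python
import json

# The same eight entries as the fixture's static config-path value.
STATIC_CONFIG = json.dumps(
    {"configs": [{"campaign_id": i} for i in range(10, 90, 10)]}
)


def split_configs(config_json, executors):
    """Hypothetical helper: deal configs round-robin into `executors` buckets."""
    configs = json.loads(config_json)["configs"]
    buckets = [[] for _ in range(executors)]
    for index, config in enumerate(configs):
        buckets[index % executors].append(config)
    return buckets


if __name__ == "__main__":
    for executor_id, bucket in enumerate(split_configs(STATIC_CONFIG, 5)):
        print(executor_id, bucket)
```

With eight configs and five executors, a round-robin split leaves some executors with two configs and others with one. A chunked split would be just as defensible if neighbouring configs should stay together.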
diff --git a/tests/runners/airflow/compiler/test_rainbow_compiler.py b/tests/runners/airflow/compiler/test_rainbow_compiler.py
new file mode 100644
index 0000000..6e73d8f
--- /dev/null
+++ b/tests/runners/airflow/compiler/test_rainbow_compiler.py
@@ -0,0 +1,33 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import unittest
+
+from rainbow.runners.airflow.compiler import rainbow_compiler
+
+
+class TestRainbowCompiler(unittest.TestCase):
+
+    def test_parse(self):
+        expected = {'name': 'MyPipeline', 'owner': 'Bosco Albert Baracus', 'pipeline': {'timeout-minutes': 45, 'schedule': '0 * 1 * *', 'metrics-namespace': 'TestNamespace', 'tasks': [{'name': 'mytask1', 'type': 'sql', 'description': 'mytask1 is cool', 'query': 'select * from mytable', 'overrides': [{'prod': None, 'partition-columns': 'dt', 'output-table': 'test.test_impression_prod', 'output-path': 's3://mybucket/myproject-test/impression', 'emr-cluster-name': 'spark-playground-prod'},  [...]
+        actual = rainbow_compiler.parse_yaml('tests/runners/airflow/compiler/rainbow.yml')
+        self.assertEqual(expected, actual)
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/tests/runners/airflow/operators/__init__.py b/tests/runners/airflow/operators/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/tests/runners/airflow/operators/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.


[incubator-liminal] 35/43: Upgrade the quality of the diagram

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit d80d9c0a133f106ccc5ea82bb8147079d92e7afb
Author: lior.schachter <li...@naturalint.com>
AuthorDate: Sat Apr 11 15:38:28 2020 +0300

    Upgrade the quality of the diagram
---
 images/rainbow_002.png | Bin 34933 -> 61815 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/images/rainbow_002.png b/images/rainbow_002.png
index 7beed54..8cb0a92 100644
Binary files a/images/rainbow_002.png and b/images/rainbow_002.png differ


[incubator-liminal] 38/43: Local mode improvements

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 07aad66928fb260d2dbb2ce821c2a5b1f864b55c
Author: assapin <47...@users.noreply.github.com>
AuthorDate: Mon Jun 22 15:03:53 2020 +0300

    Local mode improvements
---
 MANIFEST.in                                 |  21 +++++++++
 README.md                                   |  67 +++++++++++++++++++++++++---
 images/airflow_trigger.png                  | Bin 0 -> 148427 bytes
 images/k8s_running.png                      | Bin 0 -> 223001 bytes
 rainbow/runners/airflow/dag/rainbow_dags.py |   2 +-
 scripts/docker-compose.yml                  |   1 +
 scripts/package.sh                          |  19 +++-----
 setup.py                                    |  17 ++++---
 tests/runners/airflow/rainbow/rainbow.yml   |   2 +-
 9 files changed, 101 insertions(+), 28 deletions(-)

diff --git a/MANIFEST.in b/MANIFEST.in
new file mode 100644
index 0000000..04cdb6d
--- /dev/null
+++ b/MANIFEST.in
@@ -0,0 +1,21 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+include scripts/*
+include requirements-airflow.txt
+recursive-include rainbow/build/ *
\ No newline at end of file
diff --git a/README.md b/README.md
index ee2f961..078343a 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,22 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
 # Rainbow
 
 Rainbow is an end-to-end platform for data engineers & scientists, allowing them to build,
@@ -80,7 +99,7 @@ services:
 # Installation
 1. Install this package
 ```bash
-   pip install git+https://github.com/Natural-Intelligence/rainbow.git@rainbow_local_mode
+   pip install liminal
 ```
 2. Optional: set RAINBOW_HOME to path of your choice (if not set, will default to ~/rainbow_home)
 ```bash
@@ -102,18 +121,54 @@ a requirements.txt in the root of your project.
 
 When your pipeline code is ready, you can test it by running it locally on your machine.
 
-1. Deploy the pipeline:
+1. Ensure you have the Docker engine running locally, and enable a local Kubernetes cluster:
+![Kubernetes configured](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/images/k8s_running.png)
+
+If you want to execute your pipeline on a remote Kubernetes cluster, make sure the cluster is configured
+using:
+```bash
+kubectl config set-context <your remote kubernetes cluster>
+``` 
+2. Build the docker images used by your pipeline.
+
+In the example pipeline above, you can see that tasks and services have an "image" field - such as 
+"my_static_input_task_image". This means that the task is executed inside a docker container, and the docker container 
+is created from a docker image where various code and libraries are installed.
+
+You can take a look at what the build process looks like, e.g. 
+[here](https://github.com/Natural-Intelligence/rainbow/tree/master/rainbow/build/image/python)
+
+In order for the images to be available for your pipeline, you'll need to build them locally:
+
+```bash
+cd </path/to/your/rainbow/code>
+rainbow build
+```
+
+You'll see a number of outputs indicating the various docker images that were built.
+
+3. Deploy the pipeline:
 ```bash
 cd </path/to/your/rainbow/code> 
 rainbow deploy
 ```
-2. Make sure you have docker running
-3. Start the Server
+
+4. Start the server
 ```bash
 rainbow start
 ```
-4. Navigate to [http://localhost:8080/admin]
-5. You should see your ![pipeline](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/images/airflow.png")
+
+5. Navigate to [http://localhost:8080/admin](http://localhost:8080/admin)
+
+6. You should see your ![pipeline](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/master/images/airflow.png)
+The pipeline is scheduled to run according to the `schedule: 0 * 1 * *` field in the .yml file you provided.
+
+7. To manually activate your pipeline:
+Click your pipeline and then click "trigger DAG"
+Click "Graph view"
+You should see the steps in your pipeline getting executed in "real time" by clicking "Refresh" periodically.
+
+![Pipeline activation](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/images/airflow_trigger.png)
 
 ### Running Tests (for contributors)
 When doing local development and running Rainbow unit-tests, make sure to set RAINBOW_STAND_ALONE_MODE=True
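
Note on the `schedule: 0 * 1 * *` value cited in the README hunk above: read as a standard five-field cron expression (minute, hour, day of month, month, day of week), it fires at minute 0 of every hour on the first day of each month. The sketch below is a minimal matcher that handles only `*` and plain integers; it is not how Airflow evaluates schedules, and `field_matches` and `cron_matches` are hypothetical names used for illustration.

```python
from datetime import datetime


def field_matches(field, value):
    """Match one cron field; only '*' and plain integers are handled here."""
    return field == "*" or int(field) == value


def cron_matches(expression, moment):
    """Return True if `moment` satisfies a five-field cron expression."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0; isoweekday() counts Sunday as 7.
    cron_weekday = moment.isoweekday() % 7
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


if __name__ == "__main__":
    print(cron_matches("0 * 1 * *", datetime(2020, 6, 1, 15, 0)))
    print(cron_matches("0 * 1 * *", datetime(2020, 6, 2, 15, 0)))
```

Ranges, lists, and step values are deliberately left out, as is the special cron rule for expressions that restrict both day of month and day of week.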
diff --git a/images/airflow_trigger.png b/images/airflow_trigger.png
new file mode 100644
index 0000000..22168e8
Binary files /dev/null and b/images/airflow_trigger.png differ
diff --git a/images/k8s_running.png b/images/k8s_running.png
new file mode 100644
index 0000000..8bf8f3b
Binary files /dev/null and b/images/k8s_running.png differ
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index 730fd03..9deef8e 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -35,7 +35,7 @@ __DEPENDS_ON_PAST = 'depends_on_past'
 
 def register_dags(configs_path):
     """
-    TODO: doc for register_dags
+    Registers pipelines in rainbow yml files found in given path (recursively) as airflow DAGs.
     """
     print(f'Registering DAG from path: {configs_path}')
     config_files = files_util.find_config_files(configs_path)
diff --git a/scripts/docker-compose.yml b/scripts/docker-compose.yml
index b6a2dc3..d0304e5 100644
--- a/scripts/docker-compose.yml
+++ b/scripts/docker-compose.yml
@@ -30,6 +30,7 @@
                     max-file: "3"
             volumes:
                 - ${RAINBOW_HOME}:/usr/local/airflow/dags
+                - ${HOME}/.kube:/usr/local/airflow/.kube
             ports:
                 - "8080:8080"
             command: webserver
diff --git a/scripts/package.sh b/scripts/package.sh
index f4083e4..7824fd5 100755
--- a/scripts/package.sh
+++ b/scripts/package.sh
@@ -42,20 +42,14 @@ rsync -a --exclude 'venv' $(PWD)/ $docker_build_dir/zip_content/
 # perform installation of external packages (framework-requirements and user-requirements)
 # this is done inside a docker to 1) avoid requiring the user to install stuff, and 2) to create a platform-compatible
 # package (install the native libraries in a flavour suitable for the docker in which airflow runs, and not user machine)
-docker stop rainbow_build
-docker rm rainbow_build
-docker run --name rainbow_build -v /private/"$docker_build_dir":/home/rainbow/tmp --entrypoint="" -u 0 \
-       puckel/docker-airflow:1.10.9 /bin/bash -c "apt-get update && apt-get install -y wget && apt-get install -y git &&
-       cd /home/rainbow/tmp/zip_content &&
-       wget https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/rainbow/runners/airflow/dag/rainbow_dags.py &&
-       wget https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/requirements-airflow.txt &&
-       wget https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/scripts/docker-compose.yml &&
-       pip install --no-deps --target=\"/home/rainbow/tmp/zip_content\" git+https://github.com/Natural-Intelligence/rainbow.git@rainbow_local_mode &&
+
+docker run --rm --name rainbow_build -v /private/"$docker_build_dir":/home/rainbow/tmp --entrypoint="" -u 0 \
+       puckel/docker-airflow:1.10.9 /bin/bash -c "cd /home/rainbow/tmp/zip_content &&
+       pip install --no-deps --target=\"/home/rainbow/tmp/zip_content\" liminal==0.0.2dev5 &&
+       rsync -avzh --ignore-errors /home/rainbow/tmp/zip_content/liminal-resources/* /home/rainbow/tmp/zip_content/
        pip install --target=\"/home/rainbow/tmp/zip_content\" -r /home/rainbow/tmp/zip_content/requirements-airflow.txt &&
        pip install --target=\"/home/rainbow/tmp/zip_content\" -r /home/rainbow/tmp/zip_content/requirements.txt"
 
-docker stop rainbow_build
-docker rm rainbow_build
 
 # zip the content per https://airflow.apache.org/docs/stable/concepts.html#packaged-dags
 cd $docker_build_dir/zip_content
@@ -64,6 +58,3 @@ rm __init__.py
 
 zip -r ../dags/rainbows.zip .
 cp ../dags/rainbows.zip $target_path
-
-
-
diff --git a/setup.py b/setup.py
index c102ae3..2a0fdbb 100644
--- a/setup.py
+++ b/setup.py
@@ -17,9 +17,9 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+import os
 
 import setuptools
-from setuptools import setup
 
 with open("README.md", "r") as fh:
     long_description = fh.read()
@@ -29,9 +29,9 @@ with open('requirements.txt') as f:
     print(requirements)
 
 setuptools.setup(
-    name="rainbow",
-    version="0.0.1",
-    author="Rainbow team",
+    name="liminal",
+    version=os.environ["LIMINAL_BUILD_VERSION"],
+    author="liminal team",
     description="A package for authoring and deploying machine learning workflows",
     long_description=long_description,
     long_description_content_type="text/markdown",
@@ -39,10 +39,15 @@ setuptools.setup(
     packages=setuptools.find_packages(),
     classifiers=[
         "Programming Language :: Python :: 3",
-        "License :: Apache 2.0",
+        "License :: OSI Approved :: Apache Software License",
         "Operating System :: OS Independent",
     ],
+    license='Apache License, Version 2.0',
     python_requires='>=3.6',
     install_requires=requirements,
-    scripts=['scripts/rainbow', 'scripts/package.sh']
+    scripts=['scripts/rainbow', 'scripts/package.sh'],
+    include_package_data=True,
+    data_files=[('liminal-resources', ['scripts/docker-compose.yml',
+                                       'requirements-airflow.txt',
+                                       'rainbow/runners/airflow/dag/rainbow_dags.py'])]
 )
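
Note on `version=os.environ["LIMINAL_BUILD_VERSION"]` in the setup.py hunk above: indexing `os.environ` raises `KeyError` when the variable is unset, so running setup.py from a plain checkout would fail before setuptools is invoked. One possible alternative, sketched here under the assumption that a placeholder development version is acceptable, is to read the variable with a fallback. `build_version` and the default string are hypothetical and do not appear in the commit.

```python
import os


def build_version(default="0.0.0.dev0"):
    """Hypothetical helper: read LIMINAL_BUILD_VERSION, falling back to a default."""
    return os.environ.get("LIMINAL_BUILD_VERSION", default)


if __name__ == "__main__":
    print(build_version())
```

Failing loudly, as the commit does, is also a reasonable choice for release builds, since it prevents publishing a package with a placeholder version.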
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
index 77af37b..1d5da13 100644
--- a/tests/runners/airflow/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -30,7 +30,7 @@ pipelines:
       key2: val2
     metrics:
       namespace: TestNamespace
-      backends: [ 'cloudwatch' ]
+      backends: [ ]
     tasks:
       - task: my_static_input_task
         type: python


[incubator-liminal] 41/43: Rename project to Liminal

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 0c8f8dc994d41a5cab1b0c37142dfb1f6431faff
Author: aviemzur <av...@gmail.com>
AuthorDate: Tue Jun 23 15:20:36 2020 +0300

    Rename project to Liminal
---
 LICENSE                                            |   4 +-
 MANIFEST.in                                        |   2 +-
 README.md                                          |  36 +++++++-------
 images/{rainbow_001.png => liminal_001.png}        | Bin
 images/{rainbow_002.png => liminal_002.png}        | Bin
 rainbow-arch.md => liminal-arch.md                 |  18 +++----
 .../rainbow/myserver => liminal}/__init__.py       |   0
 .../helloworld => liminal/build}/__init__.py       |   0
 .../rainbow => liminal/build/image}/__init__.py    |   0
 {rainbow => liminal}/build/image/python/Dockerfile |   0
 .../sql => liminal/build/image/python}/__init__.py |   0
 .../build/image/python/container-setup.sh          |   4 +-
 .../build/image/python/container-teardown.sh       |   2 +-
 {rainbow => liminal}/build/image/python/python.py  |   2 +-
 {rainbow => liminal}/build/image_builder.py        |   4 +-
 .../build/liminal_apps_builder.py                  |  18 +++----
 {rainbow => liminal}/build/python.py               |   2 +-
 .../defaults => liminal/build/service}/__init__.py |   0
 .../build/service/python_server/Dockerfile         |   2 +-
 .../build/service/python_server}/__init__.py       |   0
 .../service/python_server/liminal_python_server.py |   0
 .../build/service/python_server/python_server.py   |   6 +--
 .../python_server/python_server_requirements.txt   |   0
 .../build/service/service_image_builder.py         |   0
 .../airflow/operators => liminal/core}/__init__.py |   0
 {rainbow => liminal}/core/environment.py           |  24 +++++-----
 .../model => liminal/core/util}/__init__.py        |   0
 {rainbow => liminal}/core/util/class_util.py       |   0
 {rainbow => liminal}/core/util/files_util.py       |   2 +-
 .../airflow/dag => liminal/docker}/__init__.py     |   0
 .../airflow => liminal/monitoring}/__init__.py     |   0
 {rainbow => liminal}/runners/__init__.py           |   0
 .../runners/airflow}/__init__.py                   |   0
 .../runners/airflow/config/__init__.py             |   0
 .../airflow/config/standalone_variable_backend.py  |   8 ++--
 .../runners/airflow/dag}/__init__.py               |   0
 .../runners/airflow/dag/liminal_dags.py            |  24 +++++-----
 .../runners/airflow/model}/__init__.py             |   0
 {rainbow => liminal}/runners/airflow/model/task.py |   0
 .../runners/airflow/operators}/__init__.py         |   0
 .../runners/airflow/operators/cloudformation.py    |   0
 .../airflow/operators/job_status_operator.py       |   0
 .../kubernetes_pod_operator_with_input_output.py   |   6 +--
 .../runners/airflow/tasks}/__init__.py             |   0
 .../airflow/tasks/create_cloudformation_stack.py   |   2 +-
 .../runners/airflow/tasks/defaults}/__init__.py    |   0
 .../runners/airflow/tasks/defaults/default_task.py |   2 +-
 .../runners/airflow/tasks/defaults/job_end.py      |   4 +-
 .../runners/airflow/tasks/defaults/job_start.py    |   4 +-
 .../airflow/tasks/delete_cloudformation_stack.py   |   2 +-
 .../runners/airflow/tasks/python.py                |   6 +--
 .../runners/airflow/tasks/spark.py                 |   2 +-
 {rainbow => liminal}/runners/airflow/tasks/sql.py  |   2 +-
 .../build/image/python => liminal/sql}/__init__.py |   0
 scripts/docker-compose.yml                         |   2 +-
 scripts/{rainbow => liminal}                       |  52 +++++++++++----------
 scripts/package.sh                                 |  24 +++++-----
 setup.py                                           |   6 +--
 .../python/test_python_server_image_builder.py     |   6 +--
 .../build/python/test_python_image_builder.py      |  14 +++---
 ...ld_rainbows.py => test_liminal_apps_builder.py} |   9 ++--
 .../{test_rainbow_dags.py => test_liminal_dags.py} |   8 ++--
 .../runners/airflow/liminal}/__init__.py           |   0
 .../airflow/liminal/helloworld}/__init__.py        |   0
 .../{rainbow => liminal}/helloworld/hello_world.py |   4 +-
 .../{rainbow/rainbow.yml => liminal/liminal.yml}   |   0
 .../runners/airflow/liminal/myserver}/__init__.py  |   0
 .../{rainbow => liminal}/myserver/my_server.py     |   0
 .../runners/airflow/{rainbow => liminal}/pip.conf  |   0
 .../airflow/{rainbow => liminal}/requirements.txt  |   0
 .../runners/airflow/tasks/defaults/test_job_end.py |   2 +-
 .../airflow/tasks/defaults/test_job_start.py       |   2 +-
 tests/runners/airflow/tasks/test_python.py         |   6 +--
 tests/util/test_class_utils.py                     |   2 +-
 74 files changed, 164 insertions(+), 161 deletions(-)

diff --git a/LICENSE b/LICENSE
index d9f1716..bdfa9f4 100644
--- a/LICENSE
+++ b/LICENSE
@@ -201,9 +201,9 @@ See the License for the specific language governing permissions and
 limitations under the License.
 
 ============================================================================
-   APACHE RAINBOW SUBCOMPONENTS:
+   APACHE LIMINAL SUBCOMPONENTS:
 
-   The Apache Rainbow project contains subcomponents with separate copyright
+   The Apache Liminal project contains subcomponents with separate copyright
    notices and license terms. Your use of the source code for the these
    subcomponents is subject to the terms and conditions of the following
    licenses.
diff --git a/MANIFEST.in b/MANIFEST.in
index 04cdb6d..6e3a162 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -18,4 +18,4 @@
 
 include scripts/*
 include requirements-airflow.txt
-recursive-include rainbow/build/ *
\ No newline at end of file
+recursive-include liminal/build/ *
\ No newline at end of file
diff --git a/README.md b/README.md
index 078343a..3623e76 100644
--- a/README.md
+++ b/README.md
@@ -17,14 +17,14 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Rainbow
+# Liminal
 
-Rainbow is an end-to-end platform for data engineers & scientists, allowing them to build,
+Liminal is an end-to-end platform for data engineers & scientists, allowing them to build,
 train and deploy machine learning models in a robust and agile way.
 
 The platform provides the abstractions and declarative capabilities for
 data extraction & feature engineering followed by model training and serving.
-Rainbow's goal is to operationalize the machine learning process, allowing data scientists to
+Liminal's goal is to operationalize the machine learning process, allowing data scientists to
 quickly transition from a successful experiment to an automated pipeline of model training,
 validation, deployment and inference in production, freeing them from engineering and
 non-functional tasks, and allowing them to focus on machine learning code and artifacts.
@@ -101,20 +101,20 @@ services:
 ```bash
    pip install liminal
 ```
-2. Optional: set RAINBOW_HOME to path of your choice (if not set, will default to ~/rainbow_home)
+2. Optional: set LIMINAL_HOME to path of your choice (if not set, will default to ~/liminal_home)
 ```bash
-echo 'export RAINBOW_HOME=</path/to/some/folder>' >> ~/.bash_profile && source ~/.bash_profile
+echo 'export LIMINAL_HOME=</path/to/some/folder>' >> ~/.bash_profile && source ~/.bash_profile
 ```
 
 # Authoring pipelines
 
-This involves at minimum creating a single file called rainbow.yml as in the example above.
+This involves at minimum creating a single file called liminal.yml as in the example above.
 
 If your pipeline requires custom python code to implement tasks, they should be organized 
-[like this](https://github.com/Natural-Intelligence/rainbow/tree/master/tests/runners/airflow/rainbow)
+[like this](https://github.com/apache/incubator-liminal/tree/master/tests/runners/airflow/liminal)
 
 If your pipeline  introduces imports of external packages which are not already a part 
-of the rainbow framework (i.e. you had to pip install them yourself), you need to also provide 
+of the liminal framework (i.e. you had to pip install them yourself), you need to also provide 
 a requirements.txt in the root of your project.
 
 # Testing the pipeline locally
@@ -122,7 +122,7 @@ a requirements.txt in the root of your project.
 When your pipeline code is ready, you can test it by running it locally on your machine.
 
 1. Ensure you have the Docker engine running locally, and enable a local Kubernetes cluster:
-![Kubernetes configured](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/images/k8s_running.png)
+![Kubernetes configured](https://raw.githubusercontent.com/apache/incubator-liminal/master/images/k8s_running.png)
 
 If you want to execute your pipeline on a remote Kubernetes cluster, make sure the cluster is configured
 using:
@@ -136,31 +136,31 @@ In the example pipeline above, you can see that tasks and services have an "imag
 is created from a docker image where various code and libraries are installed.
 
 You can take a look at what the build process looks like, e.g. 
-[here](https://github.com/Natural-Intelligence/rainbow/tree/master/rainbow/build/image/python)
+[here](https://github.com/apache/incubator-liminal/tree/master/liminal/build/image/python)
 
 In order for the images to be available for your pipeline, you'll need to build them locally:
 
 ```bash
-cd </path/to/your/rainbow/code>
-rainbow build
+cd </path/to/your/liminal/code>
+liminal build
 ```
 
 You'll see a number of outputs indicating the various docker images that were built.
 
 3. Deploy the pipeline:
 ```bash
-cd </path/to/your/rainbow/code> 
-rainbow deploy
+cd </path/to/your/liminal/code> 
+liminal deploy
 ```
 
 4. Start the server
 ```bash
-rainbow start
+liminal start
 ```
 
 5. Navigate to [http://localhost:8080/admin](http://localhost:8080/admin)
 
-6. You should see your ![pipeline](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/master/images/airflow.png)
+6. You should see your ![pipeline](https://raw.githubusercontent.com/apache/incubator-liminal/master/images/airflow.png)
 The pipeline is scheduled to run according to the `schedule: 0 * 1 * *` field in the .yml file you provided.
 
 7. To manually activate your pipeline:
@@ -168,7 +168,7 @@ Click your pipeline and then click "trigger DAG"
 Click "Graph view"
 You should see the steps in your pipeline getting executed in "real time" by clicking "Refresh" periodically.
 
-![Pipeline activation](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/images/airflow_trigger.png)
+![Pipeline activation](https://raw.githubusercontent.com/apache/incubator-liminal/master/images/airflow_trigger.png)
 
 ### Running Tests (for contributors)
-When doing local development and running Rainbow unit-tests, make sure to set RAINBOW_STAND_ALONE_MODE=True
+When doing local development and running Liminal unit-tests, make sure to set LIMINAL_STAND_ALONE_MODE=True
diff --git a/images/rainbow_001.png b/images/liminal_001.png
similarity index 100%
rename from images/rainbow_001.png
rename to images/liminal_001.png
diff --git a/images/rainbow_002.png b/images/liminal_002.png
similarity index 100%
rename from images/rainbow_002.png
rename to images/liminal_002.png
diff --git a/rainbow-arch.md b/liminal-arch.md
similarity index 85%
rename from rainbow-arch.md
rename to liminal-arch.md
index 3e48712..835fb7c 100644
--- a/rainbow-arch.md
+++ b/liminal-arch.md
@@ -1,24 +1,24 @@
-# Rainbow
-Rainbow is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way. The platform provides the abstractions and declarative capabilities for data extraction & feature engineering followed by model training and serving. Apache Rainbow's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model  [...]
+# Liminal
+Liminal is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way. The platform provides the abstractions and declarative capabilities for data extraction & feature engineering followed by model training and serving. Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model  [...]
 
 ## Motivation
 The challenges involved in operationalizing machine learning models are one of the main reasons why many machine learning projects never make it to production.
 The process involves automating and orchestrating multiple steps which run on heterogeneous infrastructure - different compute environments, data processing platforms, ML frameworks, notebooks, containers and monitoring tools.
 There are no mature standards for this workflow, and most organizations do not have the experience to build it in-house. In the best case, dev-ds-devops teams form in order to accomplish this task together; in many cases, it's the data scientists who try to deal with this themselves without the knowledge or the inclination to become infrastructure experts.
 As a result, many projects never make it through the cycle. Those who do suffer from a very long lead time from a successful experiment to an operational, refreshable, deployed and monitored model in production. 
-The goal of Apache Rainbow is to simplify the creation and management of machine learning pipelines by data engineers & scientists. The platform provides declarative building blocks which define the workflow, orchestrate the underlying infrastructure,  take care of non functional concerns, enabling focus in business logic / algorithm code.
+The goal of Apache Liminal is to simplify the creation and management of machine learning pipelines by data engineers & scientists. The platform provides declarative building blocks which define the workflow, orchestrate the underlying infrastructure, and take care of non-functional concerns, enabling focus on business logic / algorithm code.
 Some Commercial E2E solutions have started to emerge in the last few years, however, they are limited to specific parts of the workflow, such as Databricks MLFlow. Other solutions are tied to specific environments (e.g. SageMaker on AWS).
 
 ## High Level Architecture
-The platform is aimed to provide data engineers & scientists with a solution for end to end flows from model training to real time inference in production. It’s architecture enables and promotes adoption of specific components in existing (non-Rainbow) frameworks, as well as seamless integration with other open source projects. Rainbow was created to enable scalability in ML efforts and after a thorough review of available solutions and frameworks, which did not meet our main KPIs: 
+The platform aims to provide data engineers & scientists with a solution for end-to-end flows from model training to real-time inference in production. Its architecture enables and promotes adoption of specific components in existing (non-Liminal) frameworks, as well as seamless integration with other open source projects. Liminal was created to enable scalability in ML efforts, after a thorough review of available solutions and frameworks, which did not meet our main KPIs: 
 Provide an opinionated but customizable end-to-end workflow
 Abstract away the complexity of underlying infrastructure
 Support major open source tools and cloud-native infrastructure to carry out many of the steps
 Allow teams to leverage their existing investments or bring in their tools of choice into the workflow
 We have found that other tech companies in the Israeli Hi-Tech ecosystem also have an interest in such a platform, hence decided to share our work with the community.
-The following diagram depicts these main components and where Apache Rainbow comes in:
+The following diagram depicts these main components and where Apache Liminal comes in:
 
-![raibow-arch1](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/master/images/rainbow_001.png)
+![raibow-arch1](https://raw.githubusercontent.com/apache/incubator-liminal/master/images/liminal_001.png)
 
 A classical data scientist workflow includes some base phases: 
 _Train, Deploy and Consume._
@@ -37,7 +37,7 @@ _Train, Deploy and Consume._
 1. Inference - Batch or Real-time - use the model to evaluate data by your offline or online by your applications
 1. Consume - The actual use of the models created by applications and ETLs, usually through APIs to the batch or real-time inference that usually rely on Model and Feature stores.
  
-Rainbow provides its users a declarative composition capabilities to materialize these steps in a robust way, while exploiting existing frameworks and tools. e.g. Data science frameworks such as scikit-learn, Tensor flow, Keras and such, for running core data science algorithms; as numerous core mechanisms as data stores, processing engines, parallelism, schedulers, code deployment as well as batch and real-time inference.
-Rainbow allows the creation and wiring of these kinds of functional and non functional tasks while making the underlying infrastructure used by these tasks very easy to use and even abstracted away entirely. While handling the non-functional aspects as monitoring (in a standard fashion) deployment, scheduling, resource management and execution.
+Liminal provides its users with declarative composition capabilities to materialize these steps in a robust way, while exploiting existing frameworks and tools: data science frameworks such as scikit-learn, TensorFlow and Keras for running core data science algorithms, as well as numerous core mechanisms such as data stores, processing engines, parallelism, schedulers, code deployment, and batch and real-time inference.
+Liminal allows the creation and wiring of these kinds of functional and non-functional tasks while making the underlying infrastructure used by these tasks very easy to use, or even abstracting it away entirely, and while handling non-functional aspects such as monitoring (in a standard fashion), deployment, scheduling, resource management and execution.
 
-![raibow-arch2](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/master/images/rainbow_002.png)
\ No newline at end of file
+![raibow-arch2](https://raw.githubusercontent.com/apache/incubator-liminal/master/images/liminal_002.png)
\ No newline at end of file
diff --git a/tests/runners/airflow/rainbow/myserver/__init__.py b/liminal/__init__.py
similarity index 100%
rename from tests/runners/airflow/rainbow/myserver/__init__.py
rename to liminal/__init__.py
diff --git a/tests/runners/airflow/rainbow/helloworld/__init__.py b/liminal/build/__init__.py
similarity index 100%
rename from tests/runners/airflow/rainbow/helloworld/__init__.py
rename to liminal/build/__init__.py
diff --git a/tests/runners/airflow/rainbow/__init__.py b/liminal/build/image/__init__.py
similarity index 100%
rename from tests/runners/airflow/rainbow/__init__.py
rename to liminal/build/image/__init__.py
diff --git a/rainbow/build/image/python/Dockerfile b/liminal/build/image/python/Dockerfile
similarity index 100%
rename from rainbow/build/image/python/Dockerfile
rename to liminal/build/image/python/Dockerfile
diff --git a/rainbow/sql/__init__.py b/liminal/build/image/python/__init__.py
similarity index 100%
rename from rainbow/sql/__init__.py
rename to liminal/build/image/python/__init__.py
diff --git a/rainbow/build/image/python/container-setup.sh b/liminal/build/image/python/container-setup.sh
similarity index 59%
rename from rainbow/build/image/python/container-setup.sh
rename to liminal/build/image/python/container-setup.sh
index c9e5cef..d8c98c5 100755
--- a/rainbow/build/image/python/container-setup.sh
+++ b/liminal/build/image/python/container-setup.sh
@@ -1,8 +1,8 @@
 #!/bin/sh
 
-echo 'Writing rainbow input..'
+echo 'Writing liminal input..'
 
-echo """$RAINBOW_INPUT""" > /rainbow_input.json
+echo """$LIMINAL_INPUT""" > /liminal_input.json
 
 AIRFLOW_RETURN_FILE=/airflow/xcom/return.json
 
diff --git a/rainbow/build/image/python/container-teardown.sh b/liminal/build/image/python/container-teardown.sh
similarity index 82%
rename from rainbow/build/image/python/container-teardown.sh
rename to liminal/build/image/python/container-teardown.sh
index 46c4426..2b2c23c 100755
--- a/rainbow/build/image/python/container-teardown.sh
+++ b/liminal/build/image/python/container-teardown.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-echo 'Writing rainbow output..'
+echo 'Writing liminal output..'
 
 USER_CONFIG_OUTPUT_FILE=$1
 if [ "$USER_CONFIG_OUTPUT_FILE" != "" ]; then
diff --git a/rainbow/build/image/python/python.py b/liminal/build/image/python/python.py
similarity index 95%
rename from rainbow/build/image/python/python.py
rename to liminal/build/image/python/python.py
index 0ecec77..0d370d5 100644
--- a/rainbow/build/image/python/python.py
+++ b/liminal/build/image/python/python.py
@@ -18,7 +18,7 @@
 
 import os
 
-from rainbow.build.python import BasePythonImageBuilder
+from liminal.build.python import BasePythonImageBuilder
 
 
 class PythonImageBuilder(BasePythonImageBuilder):
diff --git a/rainbow/build/image_builder.py b/liminal/build/image_builder.py
similarity index 97%
rename from rainbow/build/image_builder.py
rename to liminal/build/image_builder.py
index a56a22e..3c300eb 100644
--- a/rainbow/build/image_builder.py
+++ b/liminal/build/image_builder.py
@@ -32,8 +32,8 @@ class ImageBuilder:
     def __init__(self, config, base_path, relative_source_path, tag):
         """
         :param config: task/service config
-        :param base_path: directory containing rainbow yml
-        :param relative_source_path: source path relative to rainbow yml
+        :param base_path: directory containing liminal yml
+        :param relative_source_path: source path relative to liminal yml
         :param tag: image tag
         """
         self.base_path = base_path
diff --git a/rainbow/build/build_rainbows.py b/liminal/build/liminal_apps_builder.py
similarity index 89%
rename from rainbow/build/build_rainbows.py
rename to liminal/build/liminal_apps_builder.py
index 66d27cb..efe2680 100644
--- a/rainbow/build/build_rainbows.py
+++ b/liminal/build/liminal_apps_builder.py
@@ -20,13 +20,13 @@ import os
 
 import yaml
 
-from rainbow.build.image_builder import ImageBuilder, ServiceImageBuilderMixin
-from rainbow.core.util import files_util, class_util
+from liminal.build.image_builder import ImageBuilder, ServiceImageBuilderMixin
+from liminal.core.util import files_util, class_util
 
 
-def build_rainbows(path):
+def build_liminal_apps(path):
     """
-    Build images for rainbows in path.
+    Build images for liminal apps in path.
     """
     config_files = files_util.find_config_files(path)
 
@@ -36,9 +36,9 @@ def build_rainbows(path):
         base_path = os.path.dirname(config_file)
 
         with open(config_file) as stream:
-            rainbow_config = yaml.safe_load(stream)
+            liminal_config = yaml.safe_load(stream)
 
-            for pipeline in rainbow_config['pipelines']:
+            for pipeline in liminal_config['pipelines']:
                 for task in pipeline['tasks']:
                     task_name = task['task']
 
@@ -52,7 +52,7 @@ def build_rainbows(path):
                     else:
                         print(f'No source configured for task {task_name}, skipping build..')
 
-                for service in rainbow_config['services']:
+                for service in liminal_config['services']:
                     service_type = service['type']
                     builder_class = __get_service_build_class(service_type)
                     if builder_class:
@@ -84,7 +84,7 @@ def __get_service_build_class(service_type):
 print(f'Loading image builder implementations..')
 
 # TODO: add configuration for user image builders package
-image_builders_package = 'rainbow.build.image'
+image_builders_package = 'liminal.build.image'
 # user_image_builders_package = 'TODO: user_image_builders_package'
 
 task_build_classes = class_util.find_subclasses_in_packages(
@@ -104,7 +104,7 @@ print(f'Finished loading image builder implementations: {task_build_classes}')
 print(f'Loading service image builder implementations..')
 
 # TODO: add configuration for user service image builders package
-service_builders_package = 'rainbow.build.service'
+service_builders_package = 'liminal.build.service'
 # user_service_builders_package = 'TODO: user_service_builders_package'
 
 service_build_classes = class_util.find_subclasses_in_packages(
diff --git a/rainbow/build/python.py b/liminal/build/python.py
similarity index 97%
rename from rainbow/build/python.py
rename to liminal/build/python.py
index 0961d2b..07c6155 100644
--- a/rainbow/build/python.py
+++ b/liminal/build/python.py
@@ -18,7 +18,7 @@
 
 import os
 
-from rainbow.build.image_builder import ImageBuilder
+from liminal.build.image_builder import ImageBuilder
 
 
 class BasePythonImageBuilder(ImageBuilder):
diff --git a/rainbow/runners/airflow/tasks/defaults/__init__.py b/liminal/build/service/__init__.py
similarity index 100%
rename from rainbow/runners/airflow/tasks/defaults/__init__.py
rename to liminal/build/service/__init__.py
diff --git a/rainbow/build/service/python_server/Dockerfile b/liminal/build/service/python_server/Dockerfile
similarity index 97%
rename from rainbow/build/service/python_server/Dockerfile
rename to liminal/build/service/python_server/Dockerfile
index e329738..96c2e4a 100644
--- a/rainbow/build/service/python_server/Dockerfile
+++ b/liminal/build/service/python_server/Dockerfile
@@ -41,4 +41,4 @@ RUN {{mount}} pip install -r requirements.txt
 RUN echo "Copying source code.."
 COPY . /app/
 
-CMD python -u rainbow_python_server.py
+CMD python -u liminal_python_server.py
diff --git a/rainbow/runners/airflow/tasks/__init__.py b/liminal/build/service/python_server/__init__.py
similarity index 100%
rename from rainbow/runners/airflow/tasks/__init__.py
rename to liminal/build/service/python_server/__init__.py
diff --git a/rainbow/build/service/python_server/rainbow_python_server.py b/liminal/build/service/python_server/liminal_python_server.py
similarity index 100%
rename from rainbow/build/service/python_server/rainbow_python_server.py
rename to liminal/build/service/python_server/liminal_python_server.py
diff --git a/rainbow/build/service/python_server/python_server.py b/liminal/build/service/python_server/python_server.py
similarity index 89%
rename from rainbow/build/service/python_server/python_server.py
rename to liminal/build/service/python_server/python_server.py
index 0b2537d..f0d5b99 100644
--- a/rainbow/build/service/python_server/python_server.py
+++ b/liminal/build/service/python_server/python_server.py
@@ -20,8 +20,8 @@ import os
 
 import yaml
 
-from rainbow.build.image_builder import ServiceImageBuilderMixin
-from rainbow.build.python import BasePythonImageBuilder
+from liminal.build.image_builder import ServiceImageBuilderMixin
+from liminal.build.python import BasePythonImageBuilder
 
 
 class PythonServerImageBuilder(BasePythonImageBuilder, ServiceImageBuilderMixin):
@@ -36,7 +36,7 @@ class PythonServerImageBuilder(BasePythonImageBuilder, ServiceImageBuilderMixin)
     @staticmethod
     def _additional_files_from_paths():
         return [
-            os.path.join(os.path.dirname(__file__), 'rainbow_python_server.py'),
+            os.path.join(os.path.dirname(__file__), 'liminal_python_server.py'),
             os.path.join(os.path.dirname(__file__), 'python_server_requirements.txt')
         ]
 
diff --git a/rainbow/build/service/python_server/python_server_requirements.txt b/liminal/build/service/python_server/python_server_requirements.txt
similarity index 100%
rename from rainbow/build/service/python_server/python_server_requirements.txt
rename to liminal/build/service/python_server/python_server_requirements.txt
diff --git a/rainbow/build/service/service_image_builder.py b/liminal/build/service/service_image_builder.py
similarity index 100%
rename from rainbow/build/service/service_image_builder.py
rename to liminal/build/service/service_image_builder.py
diff --git a/rainbow/runners/airflow/operators/__init__.py b/liminal/core/__init__.py
similarity index 100%
rename from rainbow/runners/airflow/operators/__init__.py
rename to liminal/core/__init__.py
diff --git a/rainbow/core/environment.py b/liminal/core/environment.py
similarity index 61%
rename from rainbow/core/environment.py
rename to liminal/core/environment.py
index 27f4d41..f335ce9 100644
--- a/rainbow/core/environment.py
+++ b/liminal/core/environment.py
@@ -18,21 +18,21 @@
 
 import os
 
-DEFAULT_DAGS_ZIP_NAME = 'rainbows.zip'
-DEFAULT_RAINBOW_HOME = os.path.expanduser('~/rainbow_home')
-DEFAULT_RAINBOWS_SUBDIR = "rainbows"
-RAINBOW_HOME_PARAM_NAME = "RAINBOW_HOME"
+DEFAULT_DAGS_ZIP_NAME = 'liminal.zip'
+DEFAULT_LIMINAL_HOME = os.path.expanduser('~/liminal_home')
+DEFAULT_PIPELINES_SUBDIR = "pipelines"
+LIMINAL_HOME_PARAM_NAME = "LIMINAL_HOME"
 
 
-def get_rainbow_home():
-    if not os.environ.get(RAINBOW_HOME_PARAM_NAME):
-        print("no environment parameter called RAINBOW_HOME detected")
-        print(f"registering {DEFAULT_RAINBOW_HOME} as the RAINBOW_HOME directory")
-        os.environ[RAINBOW_HOME_PARAM_NAME] = DEFAULT_RAINBOW_HOME
-    return os.environ.get(RAINBOW_HOME_PARAM_NAME, DEFAULT_RAINBOW_HOME)
+def get_liminal_home():
+    if not os.environ.get(LIMINAL_HOME_PARAM_NAME):
+        print("no environment parameter called LIMINAL_HOME detected")
+        print(f"registering {DEFAULT_LIMINAL_HOME} as the LIMINAL_HOME directory")
+        os.environ[LIMINAL_HOME_PARAM_NAME] = DEFAULT_LIMINAL_HOME
+    return os.environ.get(LIMINAL_HOME_PARAM_NAME, DEFAULT_LIMINAL_HOME)
 
 
 def get_dags_dir():
     # if we are inside airflow, we will take it from the configured dags folder
-    base_dir = os.environ.get("AIRFLOW__CORE__DAGS_FOLDER", get_rainbow_home())
-    return os.path.join(base_dir, DEFAULT_RAINBOWS_SUBDIR)
+    base_dir = os.environ.get("AIRFLOW__CORE__DAGS_FOLDER", get_liminal_home())
+    return os.path.join(base_dir, DEFAULT_PIPELINES_SUBDIR)
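The directory resolution renamed in this file can be illustrated in isolation. The following is a minimal sketch, not the project's module: it mirrors the logic of get_liminal_home() and get_dags_dir() from the hunk above, but takes the environment as an explicit dictionary (an assumption made here so the example has no side effects on os.environ). The function names resolve_liminal_home and resolve_dags_dir are hypothetical.

```python
import os

DEFAULT_LIMINAL_HOME = os.path.expanduser('~/liminal_home')
DEFAULT_PIPELINES_SUBDIR = 'pipelines'
LIMINAL_HOME_PARAM_NAME = 'LIMINAL_HOME'


def resolve_liminal_home(env):
    """Return LIMINAL_HOME from env, registering the default if unset or empty."""
    if not env.get(LIMINAL_HOME_PARAM_NAME):
        env[LIMINAL_HOME_PARAM_NAME] = DEFAULT_LIMINAL_HOME
    return env[LIMINAL_HOME_PARAM_NAME]


def resolve_dags_dir(env):
    """Prefer Airflow's configured DAGs folder, else fall back to LIMINAL_HOME."""
    # As in the diff, the fallback is evaluated eagerly, so the default
    # LIMINAL_HOME is registered even when the Airflow folder is set.
    base_dir = env.get('AIRFLOW__CORE__DAGS_FOLDER', resolve_liminal_home(env))
    return os.path.join(base_dir, DEFAULT_PIPELINES_SUBDIR)
```

Note that after this commit the subdirectory is "pipelines" rather than "rainbows", so files deployed under the old subdirectory would not be found at the new location.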
diff --git a/rainbow/runners/airflow/model/__init__.py b/liminal/core/util/__init__.py
similarity index 100%
rename from rainbow/runners/airflow/model/__init__.py
rename to liminal/core/util/__init__.py
diff --git a/rainbow/core/util/class_util.py b/liminal/core/util/class_util.py
similarity index 100%
rename from rainbow/core/util/class_util.py
rename to liminal/core/util/class_util.py
diff --git a/rainbow/core/util/files_util.py b/liminal/core/util/files_util.py
similarity index 93%
rename from rainbow/core/util/files_util.py
rename to liminal/core/util/files_util.py
index 4a03020..e611005 100644
--- a/rainbow/core/util/files_util.py
+++ b/liminal/core/util/files_util.py
@@ -24,7 +24,7 @@ def find_config_files(path):
     print(path)
     for r, d, f in os.walk(path):
         for file in f:
-            if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
+            if os.path.basename(file) in ['liminal.yml', 'liminal.yaml']:
                 print(os.path.join(r, file))
                 files.append(os.path.join(r, file))
     return files
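The effect of the file name change above can be checked with a small self-contained sketch. It reimplements the discovery loop from the hunk (without the debug prints) and runs it against a temporary directory tree. The demo() helper and the directory names in it are illustrative assumptions, not part of the repository.

```python
import os
import tempfile

CONFIG_FILE_NAMES = ('liminal.yml', 'liminal.yaml')


def find_config_files(path):
    """Recursively collect paths of files named liminal.yml or liminal.yaml."""
    files = []
    for root, _dirs, file_names in os.walk(path):
        for file_name in file_names:
            if file_name in CONFIG_FILE_NAMES:
                files.append(os.path.join(root, file_name))
    return files


def demo():
    """Build a throwaway tree and return the relative paths that were found."""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, 'app_a'))
        os.makedirs(os.path.join(tmp, 'app_b', 'nested'))
        for rel in ('app_a/liminal.yml',
                    'app_b/nested/liminal.yaml',
                    'app_b/rainbow.yml'):
            with open(os.path.join(tmp, *rel.split('/')), 'w') as f:
                f.write('name: example\n')
        return sorted(os.path.relpath(p, tmp) for p in find_config_files(tmp))
```

In this sketch the file still named rainbow.yml is skipped, which is the practical consequence of the rename: existing configuration files have to be renamed to liminal.yml or liminal.yaml to be picked up.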
diff --git a/rainbow/runners/airflow/dag/__init__.py b/liminal/docker/__init__.py
similarity index 100%
rename from rainbow/runners/airflow/dag/__init__.py
rename to liminal/docker/__init__.py
diff --git a/rainbow/runners/airflow/__init__.py b/liminal/monitoring/__init__.py
similarity index 100%
rename from rainbow/runners/airflow/__init__.py
rename to liminal/monitoring/__init__.py
diff --git a/rainbow/runners/__init__.py b/liminal/runners/__init__.py
similarity index 100%
rename from rainbow/runners/__init__.py
rename to liminal/runners/__init__.py
diff --git a/rainbow/monitoring/__init__.py b/liminal/runners/airflow/__init__.py
similarity index 100%
rename from rainbow/monitoring/__init__.py
rename to liminal/runners/airflow/__init__.py
diff --git a/rainbow/runners/airflow/config/__init__.py b/liminal/runners/airflow/config/__init__.py
similarity index 100%
rename from rainbow/runners/airflow/config/__init__.py
rename to liminal/runners/airflow/config/__init__.py
diff --git a/rainbow/runners/airflow/config/standalone_variable_backend.py b/liminal/runners/airflow/config/standalone_variable_backend.py
similarity index 85%
rename from rainbow/runners/airflow/config/standalone_variable_backend.py
rename to liminal/runners/airflow/config/standalone_variable_backend.py
index d7df06c..b49c7fe 100644
--- a/rainbow/runners/airflow/config/standalone_variable_backend.py
+++ b/liminal/runners/airflow/config/standalone_variable_backend.py
@@ -20,16 +20,16 @@ from os import environ
 
 from airflow.models import Variable
 
-RAINBOW_STAND_ALONE_MODE_KEY = "RAINBOW_STAND_ALONE_MODE"
+LIMINAL_STAND_ALONE_MODE_KEY = "LIMINAL_STAND_ALONE_MODE"
 
 
 def get_variable(key, default_val):
-    if rainbow_local_mode():
+    if liminal_local_mode():
         return os.environ.get(key, default_val)
     else:
         return Variable.get(key, default_var=default_val)
 
 
-def rainbow_local_mode():
-    stand_alone = environ.get(RAINBOW_STAND_ALONE_MODE_KEY, "False")
+def liminal_local_mode():
+    stand_alone = environ.get(LIMINAL_STAND_ALONE_MODE_KEY, "False")
     return stand_alone.strip().lower() == "true"
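How the stand-alone switch selects the variable source can be sketched without Airflow installed. This is a simplified illustration under stated assumptions: the environment is passed in as a dictionary, and variable_store is a plain dict standing in for airflow.models.Variable. It is not the module's actual signature.

```python
LIMINAL_STAND_ALONE_MODE_KEY = 'LIMINAL_STAND_ALONE_MODE'


def liminal_local_mode(env):
    """True when LIMINAL_STAND_ALONE_MODE is 'true' (any case, spaces ignored)."""
    stand_alone = env.get(LIMINAL_STAND_ALONE_MODE_KEY, 'False')
    return stand_alone.strip().lower() == 'true'


def get_variable(key, default_val, env, variable_store):
    """Read from env in stand-alone mode, otherwise from the variable store."""
    if liminal_local_mode(env):
        return env.get(key, default_val)
    # variable_store is a hypothetical stand-in for Airflow Variables.
    return variable_store.get(key, default_val)
```

Only the literal string "true" enables stand-alone mode; values such as "1" or "yes" leave it disabled.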
diff --git a/rainbow/docker/__init__.py b/liminal/runners/airflow/dag/__init__.py
similarity index 100%
rename from rainbow/docker/__init__.py
rename to liminal/runners/airflow/dag/__init__.py
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/liminal/runners/airflow/dag/liminal_dags.py
similarity index 85%
rename from rainbow/runners/airflow/dag/rainbow_dags.py
rename to liminal/runners/airflow/dag/liminal_dags.py
index 9deef8e..9798ed4 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/liminal/runners/airflow/dag/liminal_dags.py
@@ -23,19 +23,19 @@ import yaml
 from airflow import DAG
 from airflow.models import Variable
 
-from rainbow.core import environment
-from rainbow.core.util import class_util
-from rainbow.core.util import files_util
-from rainbow.runners.airflow.model.task import Task
-from rainbow.runners.airflow.tasks.defaults.job_end import JobEndTask
-from rainbow.runners.airflow.tasks.defaults.job_start import JobStartTask
+from liminal.core import environment
+from liminal.core.util import class_util
+from liminal.core.util import files_util
+from liminal.runners.airflow.model.task import Task
+from liminal.runners.airflow.tasks.defaults.job_end import JobEndTask
+from liminal.runners.airflow.tasks.defaults.job_start import JobStartTask
 
 __DEPENDS_ON_PAST = 'depends_on_past'
 
 
 def register_dags(configs_path):
     """
-    Registers pipelines in rainbow yml files found in given path (recursively) as airflow DAGs.
+    Registers pipelines in liminal yml files found in given path (recursively) as airflow DAGs.
     """
     print(f'Registering DAG from path: {configs_path}')
     config_files = files_util.find_config_files(configs_path)
@@ -96,24 +96,24 @@ def register_dags(configs_path):
 print(f'Loading task implementations..')
 
 # TODO: add configuration for user tasks package
-impl_packages = 'rainbow.runners.airflow.tasks'
+impl_packages = 'liminal.runners.airflow.tasks'
 user_task_package = 'TODO: user_tasks_package'
 
 task_classes = class_util.find_subclasses_in_packages([impl_packages], Task)
 
 
-def tasks_by_rainbow_name(task_classes):
+def tasks_by_liminal_name(task_classes):
     return {full_name.replace(impl_packages, '').replace(clzz.__name__, '')[1:-1]: clzz
             for (full_name, clzz) in task_classes.items()}
 
 
-tasks_by_rainbow_name = tasks_by_rainbow_name(task_classes)
+tasks_by_liminal_name = tasks_by_liminal_name(task_classes)
 
-print(f'Finished loading task implementations: {tasks_by_rainbow_name}')
+print(f'Finished loading task implementations: {tasks_by_liminal_name}')
 
 
 def get_task_class(task_type):
-    return tasks_by_rainbow_name[task_type]
+    return tasks_by_liminal_name[task_type]
 
 
 register_dags(environment.get_dags_dir())
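The key derivation in tasks_by_liminal_name is compact, so a worked example may help. The sketch below assumes that class_util.find_subclasses_in_packages returns a dict mapping a fully qualified class name to the class, which is what the .items() call in the hunk suggests but is not shown in this diff. The two classes are empty placeholders.

```python
IMPL_PACKAGE = 'liminal.runners.airflow.tasks'


class PythonTask:
    """Placeholder for the real task class."""


class JobEndTask:
    """Placeholder for the real task class."""


def liminal_name(full_name, clzz):
    """Strip the package prefix and class name, then the surrounding dots."""
    return full_name.replace(IMPL_PACKAGE, '').replace(clzz.__name__, '')[1:-1]


def tasks_by_name(task_classes):
    """Map the task type used in the yml file to its implementing class."""
    return {liminal_name(full_name, clzz): clzz
            for full_name, clzz in task_classes.items()}


registry = tasks_by_name({
    'liminal.runners.airflow.tasks.python.PythonTask': PythonTask,
    'liminal.runners.airflow.tasks.defaults.job_end.JobEndTask': JobEndTask,
})
```

Under these assumptions the registry keys are "python" and "defaults.job_end", i.e. the module path relative to the tasks package. The sketch uses a separate name for the resulting dict, whereas the module in the diff rebinds tasks_by_liminal_name from the function to its result.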
diff --git a/rainbow/core/util/__init__.py b/liminal/runners/airflow/model/__init__.py
similarity index 100%
rename from rainbow/core/util/__init__.py
rename to liminal/runners/airflow/model/__init__.py
diff --git a/rainbow/runners/airflow/model/task.py b/liminal/runners/airflow/model/task.py
similarity index 100%
rename from rainbow/runners/airflow/model/task.py
rename to liminal/runners/airflow/model/task.py
diff --git a/rainbow/core/__init__.py b/liminal/runners/airflow/operators/__init__.py
similarity index 100%
rename from rainbow/core/__init__.py
rename to liminal/runners/airflow/operators/__init__.py
diff --git a/rainbow/runners/airflow/operators/cloudformation.py b/liminal/runners/airflow/operators/cloudformation.py
similarity index 100%
rename from rainbow/runners/airflow/operators/cloudformation.py
rename to liminal/runners/airflow/operators/cloudformation.py
diff --git a/rainbow/runners/airflow/operators/job_status_operator.py b/liminal/runners/airflow/operators/job_status_operator.py
similarity index 100%
rename from rainbow/runners/airflow/operators/job_status_operator.py
rename to liminal/runners/airflow/operators/job_status_operator.py
diff --git a/rainbow/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py b/liminal/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
similarity index 95%
rename from rainbow/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
rename to liminal/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
index c44e80b..5833550 100644
--- a/rainbow/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
+++ b/liminal/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
@@ -93,7 +93,7 @@ class KubernetesPodOperatorWithInputAndOutput(KubernetesPodOperator):
     TODO: pydoc
     """
 
-    _RAINBOW_INPUT_ENV_VAR = 'RAINBOW_INPUT'
+    _LIMINAL_INPUT_ENV_VAR = 'LIMINAL_INPUT'
 
     def __init__(self,
                  task_split,
@@ -138,9 +138,9 @@ class KubernetesPodOperatorWithInputAndOutput(KubernetesPodOperator):
         if task_input:
             self.log.info(f'task input = {task_input}')
 
-            self.env_vars.update({self._RAINBOW_INPUT_ENV_VAR: json.dumps(task_input)})
+            self.env_vars.update({self._LIMINAL_INPUT_ENV_VAR: json.dumps(task_input)})
         else:
-            self.env_vars.update({self._RAINBOW_INPUT_ENV_VAR: '{}'})
+            self.env_vars.update({self._LIMINAL_INPUT_ENV_VAR: '{}'})
 
             self.log.info(f'Empty input for task {self.task_split}.')
 
diff --git a/rainbow/build/service/python_server/__init__.py b/liminal/runners/airflow/tasks/__init__.py
similarity index 100%
rename from rainbow/build/service/python_server/__init__.py
rename to liminal/runners/airflow/tasks/__init__.py
diff --git a/rainbow/runners/airflow/tasks/create_cloudformation_stack.py b/liminal/runners/airflow/tasks/create_cloudformation_stack.py
similarity index 95%
rename from rainbow/runners/airflow/tasks/create_cloudformation_stack.py
rename to liminal/runners/airflow/tasks/create_cloudformation_stack.py
index 8f069f3..01bbba1 100644
--- a/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
+++ b/liminal/runners/airflow/tasks/create_cloudformation_stack.py
@@ -16,7 +16,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from rainbow.runners.airflow.model import task
+from liminal.runners.airflow.model import task
 
 
 class CreateCloudFormationStackTask(task.Task):
diff --git a/rainbow/build/service/__init__.py b/liminal/runners/airflow/tasks/defaults/__init__.py
similarity index 100%
rename from rainbow/build/service/__init__.py
rename to liminal/runners/airflow/tasks/defaults/__init__.py
diff --git a/rainbow/runners/airflow/tasks/defaults/default_task.py b/liminal/runners/airflow/tasks/defaults/default_task.py
similarity index 96%
rename from rainbow/runners/airflow/tasks/defaults/default_task.py
rename to liminal/runners/airflow/tasks/defaults/default_task.py
index 0e901fc..0a57284 100644
--- a/rainbow/runners/airflow/tasks/defaults/default_task.py
+++ b/liminal/runners/airflow/tasks/defaults/default_task.py
@@ -20,7 +20,7 @@ Default base task.
 """
 from abc import abstractmethod
 
-from rainbow.runners.airflow.model.task import Task
+from liminal.runners.airflow.model.task import Task
 
 
 class DefaultTask(Task):
diff --git a/rainbow/runners/airflow/tasks/defaults/job_end.py b/liminal/runners/airflow/tasks/defaults/job_end.py
similarity index 92%
rename from rainbow/runners/airflow/tasks/defaults/job_end.py
rename to liminal/runners/airflow/tasks/defaults/job_end.py
index e177ccc..f90b482 100644
--- a/rainbow/runners/airflow/tasks/defaults/job_end.py
+++ b/liminal/runners/airflow/tasks/defaults/job_end.py
@@ -16,8 +16,8 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from rainbow.runners.airflow.operators.job_status_operator import JobEndOperator
-from rainbow.runners.airflow.tasks.defaults.default_task import DefaultTask
+from liminal.runners.airflow.operators.job_status_operator import JobEndOperator
+from liminal.runners.airflow.tasks.defaults.default_task import DefaultTask
 
 
 class JobEndTask(DefaultTask):
diff --git a/rainbow/runners/airflow/tasks/defaults/job_start.py b/liminal/runners/airflow/tasks/defaults/job_start.py
similarity index 92%
rename from rainbow/runners/airflow/tasks/defaults/job_start.py
rename to liminal/runners/airflow/tasks/defaults/job_start.py
index e196919..3c1d831 100644
--- a/rainbow/runners/airflow/tasks/defaults/job_start.py
+++ b/liminal/runners/airflow/tasks/defaults/job_start.py
@@ -15,8 +15,8 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-from rainbow.runners.airflow.operators.job_status_operator import JobStartOperator
-from rainbow.runners.airflow.tasks.defaults.default_task import DefaultTask
+from liminal.runners.airflow.operators.job_status_operator import JobStartOperator
+from liminal.runners.airflow.tasks.defaults.default_task import DefaultTask
 
 
 class JobStartTask(DefaultTask):
diff --git a/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py b/liminal/runners/airflow/tasks/delete_cloudformation_stack.py
similarity index 95%
rename from rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
rename to liminal/runners/airflow/tasks/delete_cloudformation_stack.py
index ab99101..b2347e7 100644
--- a/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
+++ b/liminal/runners/airflow/tasks/delete_cloudformation_stack.py
@@ -16,7 +16,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from rainbow.runners.airflow.model import task
+from liminal.runners.airflow.model import task
 
 
 class DeleteCloudFormationStackTask(task.Task):
diff --git a/rainbow/runners/airflow/tasks/python.py b/liminal/runners/airflow/tasks/python.py
similarity index 97%
rename from rainbow/runners/airflow/tasks/python.py
rename to liminal/runners/airflow/tasks/python.py
index d5d4e00..9d8635f 100644
--- a/rainbow/runners/airflow/tasks/python.py
+++ b/liminal/runners/airflow/tasks/python.py
@@ -20,9 +20,9 @@ import json
 from airflow.models import Variable
 from airflow.operators.dummy_operator import DummyOperator
 
-from rainbow.runners.airflow.config.standalone_variable_backend import get_variable
-from rainbow.runners.airflow.model import task
-from rainbow.runners.airflow.operators.kubernetes_pod_operator_with_input_output import \
+from liminal.runners.airflow.config.standalone_variable_backend import get_variable
+from liminal.runners.airflow.model import task
+from liminal.runners.airflow.operators.kubernetes_pod_operator_with_input_output import \
     KubernetesPodOperatorWithInputAndOutput, \
     PrepareInputOperator
 
diff --git a/rainbow/runners/airflow/tasks/spark.py b/liminal/runners/airflow/tasks/spark.py
similarity index 95%
rename from rainbow/runners/airflow/tasks/spark.py
rename to liminal/runners/airflow/tasks/spark.py
index 68cfac0..a93722e 100644
--- a/rainbow/runners/airflow/tasks/spark.py
+++ b/liminal/runners/airflow/tasks/spark.py
@@ -16,7 +16,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from rainbow.runners.airflow.model import task
+from liminal.runners.airflow.model import task
 
 class SparkTask(task.Task):
     """
diff --git a/rainbow/runners/airflow/tasks/sql.py b/liminal/runners/airflow/tasks/sql.py
similarity index 95%
rename from rainbow/runners/airflow/tasks/sql.py
rename to liminal/runners/airflow/tasks/sql.py
index 7ae3b9f..f15e356 100644
--- a/rainbow/runners/airflow/tasks/sql.py
+++ b/liminal/runners/airflow/tasks/sql.py
@@ -16,7 +16,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from rainbow.runners.airflow.model import task
+from liminal.runners.airflow.model import task
 
 
 class SparkTask(task.Task):
diff --git a/rainbow/build/image/python/__init__.py b/liminal/sql/__init__.py
similarity index 100%
rename from rainbow/build/image/python/__init__.py
rename to liminal/sql/__init__.py
diff --git a/scripts/docker-compose.yml b/scripts/docker-compose.yml
index d0304e5..8cab3ea 100644
--- a/scripts/docker-compose.yml
+++ b/scripts/docker-compose.yml
@@ -29,7 +29,7 @@
                     max-size: 10m
                     max-file: "3"
             volumes:
-                - ${RAINBOW_HOME}:/usr/local/airflow/dags
+                - ${LIMINAL_HOME}:/usr/local/airflow/dags
                 - ${HOME}/.kube:/usr/local/airflow/.kube
             ports:
                 - "8080:8080"
diff --git a/scripts/rainbow b/scripts/liminal
similarity index 60%
rename from scripts/rainbow
rename to scripts/liminal
index 1d5f65e..5ea21ad 100755
--- a/scripts/rainbow
+++ b/scripts/liminal
@@ -19,14 +19,14 @@
 # under the License.
 import os
 import shutil
-import site
+import subprocess
 import sys
 
 import click
-from rainbow.build import build_rainbows
-import subprocess
-from rainbow.core import environment
-from rainbow.core.util import files_util
+
+from liminal.build import liminal_apps_builder
+from liminal.core import environment
+from liminal.core.util import files_util
 
 
 @click.group()
@@ -46,41 +46,43 @@ def docker_is_running():
 @cli.command("build", short_help="builds dockers from your business logic")
 @click.option('--path', default=os.getcwd(), help='Build within this path.')
 def build(path):
-    click.echo(f'Building rainbows in {path}')
+    click.echo(f'Building liminal apps in {path}')
     if docker_is_running():
-        build_rainbows.build_rainbows(path)
+        liminal_apps_builder.build_liminal_apps(path)
 
 
-def deploy_rainbow_core_internal():
-    click.echo("WARN: refreshing rainbow core package")
-    rainbow_home = environment.get_rainbow_home()
-    subprocess.call([f'package.sh {rainbow_home}'], shell=True)
+def deploy_liminal_core_internal():
+    click.echo("WARN: refreshing liminal core package")
+    liminal_home = environment.get_liminal_home()
+    subprocess.call([f'package.sh {liminal_home}'], shell=True)
 
 
-@cli.command("deploy", short_help="deploys your rainbow.yaml files to $RAINBOW_HOME folder")
-@click.option('--path', default=os.getcwd(), help="folder containing rainbow.yaml files")
-def deploy_rainbows(path):
-    click.echo("deploying rainbow yaml files")
-    rainbow_home = environment.get_rainbow_home()
-    os.makedirs(rainbow_home, exist_ok=True)
+@cli.command("deploy", short_help="deploys your liminal.yaml files to $LIMINAL_HOME folder")
+@click.option('--path', default=os.getcwd(), help="folder containing liminal.yaml files")
+def deploy_liminal_apps(path):
+    click.echo("deploying liminal yaml files")
+    liminal_home = environment.get_liminal_home()
+    os.makedirs(liminal_home, exist_ok=True)
     os.makedirs(environment.get_dags_dir(), exist_ok=True)
-    deploy_rainbow_core_internal()
+    deploy_liminal_core_internal()
     config_files = files_util.find_config_files(path)
     for config_file in config_files:
-        click.echo(f"deploying rainbow file: {config_file}")
+        click.echo(f"deploying liminal file: {config_file}")
         yml_name = os.path.basename(config_file)
         target_yml_name = os.path.join(environment.get_dags_dir(), yml_name)
         shutil.copyfile(config_file, target_yml_name)
 
 
-@cli.command("start", short_help="starts a local airflow in docker compose. should be run after deploy. " +
-                                 "Make sure docker is running on your machine")
+@cli.command("start",
+             short_help="starts a local airflow in docker compose. should be run after deploy. " +
+                        "Make sure docker is running on your machine")
 def start():
     if docker_is_running():
-        # initialize rainbow home by default
-        environment.get_rainbow_home()
-        result = subprocess.call([f'docker-compose -f "{environment.get_rainbow_home()}/docker-compose.yml" up'],
-                                 env=os.environ, shell=True)
+        # initialize liminal home by default
+        environment.get_liminal_home()
+        result = subprocess.call(
+            [f'docker-compose -f "{environment.get_liminal_home()}/docker-compose.yml" up'],
+            env=os.environ, shell=True)
 
 
 if __name__ == '__main__':
diff --git a/scripts/package.sh b/scripts/package.sh
index 7824fd5..de88f41 100755
--- a/scripts/package.sh
+++ b/scripts/package.sh
@@ -21,12 +21,12 @@ echo $1
 target_path="$1"
 
 echo "running from " $(PWD)
-echo "target path for rainbow zip file is " $target_path
+echo "target path for liminal zip file is " $target_path
 
-echo "cleaning up the temp dirs $TMPDIR/rainbow_build"
-rm -rf $TMPDIR/rainbow_build-*/
+echo "cleaning up the temp dirs $TMPDIR/liminal_build"
+rm -rf $TMPDIR/liminal_build-*/
 
-tmp_dir=$(mktemp -d -t rainbow_build-)
+tmp_dir=$(mktemp -d -t liminal_build-)
 echo "creating temp directory $tmp_dir"
 
 docker_build_dir=$tmp_dir/docker_build
@@ -43,12 +43,12 @@ rsync -a --exclude 'venv' $(PWD)/ $docker_build_dir/zip_content/
 # this is done inside a docker to 1) avoid requiring the user to install stuff, and 2) to create a platform-compatible
 # package (install the native libraries in a flavour suitable for the docker in which airflow runs, and not user machine)
 
-docker run --rm --name rainbow_build -v /private/"$docker_build_dir":/home/rainbow/tmp --entrypoint="" -u 0 \
-       puckel/docker-airflow:1.10.9 /bin/bash -c "cd /home/rainbow/tmp/zip_content &&
-       pip install --no-deps --target=\"/home/rainbow/tmp/zip_content\" liminal==0.0.2dev5 &&
-       rsync -avzh --ignore-errors /home/rainbow/tmp/zip_content/liminal-resources/* /home/rainbow/tmp/zip_content/
-       pip install --target=\"/home/rainbow/tmp/zip_content\" -r /home/rainbow/tmp/zip_content/requirements-airflow.txt &&
-       pip install --target=\"/home/rainbow/tmp/zip_content\" -r /home/rainbow/tmp/zip_content/requirements.txt"
+docker run --rm --name liminal_build -v /private/"$docker_build_dir":/home/liminal/tmp --entrypoint="" -u 0 \
+       puckel/docker-airflow:1.10.9 /bin/bash -c "cd /home/liminal/tmp/zip_content &&
+       pip install --no-deps --target=\"/home/liminal/tmp/zip_content\" liminal==0.0.2dev5 &&
+       rsync -avzh --ignore-errors /home/liminal/tmp/zip_content/liminal-resources/* /home/liminal/tmp/zip_content/
+       pip install --target=\"/home/liminal/tmp/zip_content\" -r /home/liminal/tmp/zip_content/requirements-airflow.txt &&
+       pip install --target=\"/home/liminal/tmp/zip_content\" -r /home/liminal/tmp/zip_content/requirements.txt"
 
 
 # zip the content per https://airflow.apache.org/docs/stable/concepts.html#packaged-dags
@@ -56,5 +56,5 @@ cd $docker_build_dir/zip_content
 mv docker-compose.yml $target_path
 rm __init__.py
 
-zip -r ../dags/rainbows.zip .
-cp ../dags/rainbows.zip $target_path
+zip -r ../dags/liminal.zip .
+cp ../dags/liminal.zip $target_path
diff --git a/setup.py b/setup.py
index 2a0fdbb..0cf0e64 100644
--- a/setup.py
+++ b/setup.py
@@ -35,7 +35,7 @@ setuptools.setup(
     description="A package for authoring and deploying machine learning workflows",
     long_description=long_description,
     long_description_content_type="text/markdown",
-    url="https://github.com/Natural-Intelligence/rainbow",
+    url="https://github.com/apache/incubator-liminal",
     packages=setuptools.find_packages(),
     classifiers=[
         "Programming Language :: Python :: 3",
@@ -45,9 +45,9 @@ setuptools.setup(
     license='Apache License, Version 2.0',
     python_requires='>=3.6',
     install_requires=requirements,
-    scripts=['scripts/rainbow', 'scripts/package.sh'],
+    scripts=['scripts/liminal', 'scripts/package.sh'],
     include_package_data=True,
     data_files=[('liminal-resources', ['scripts/docker-compose.yml',
                                        'requirements-airflow.txt',
-                                       'rainbow/runners/airflow/dag/rainbow_dags.py'])]
+                                       'liminal/runners/airflow/dag/liminal_dags.py'])]
 )
diff --git a/tests/runners/airflow/build/http/python/test_python_server_image_builder.py b/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
index ecdaced..145d931 100644
--- a/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
+++ b/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
@@ -25,7 +25,7 @@ from unittest import TestCase
 
 import docker
 
-from rainbow.build.service.python_server.python_server import PythonServerImageBuilder
+from liminal.build.service.python_server.python_server import PythonServerImageBuilder
 
 
 class TestPythonServer(TestCase):
@@ -54,7 +54,7 @@ class TestPythonServer(TestCase):
             'Incorrect pip command')
 
     def __test_build_python_server(self, use_pip_conf=False):
-        base_path = os.path.join(os.path.dirname(__file__), '../../../rainbow')
+        base_path = os.path.join(os.path.dirname(__file__), '../../../liminal')
 
         config = self.__create_conf('my_task')
 
@@ -112,7 +112,7 @@ class TestPythonServer(TestCase):
         return {
             'task': task_id,
             'cmd': 'foo bar',
-            'image': 'rainbow_server_image',
+            'image': 'liminal_server_image',
             'source': 'baz',
             'input_type': 'my_input_type',
             'input_path': 'my_input',
diff --git a/tests/runners/airflow/build/python/test_python_image_builder.py b/tests/runners/airflow/build/python/test_python_image_builder.py
index 81b5cc3..ab25dab 100644
--- a/tests/runners/airflow/build/python/test_python_image_builder.py
+++ b/tests/runners/airflow/build/python/test_python_image_builder.py
@@ -22,11 +22,11 @@ from unittest import TestCase
 
 import docker
 
-from rainbow.build.image.python.python import PythonImageBuilder
+from liminal.build.image.python.python import PythonImageBuilder
 
 
 class TestPythonImageBuilder(TestCase):
-    __IMAGE_NAME = 'rainbow_image'
+    __IMAGE_NAME = 'liminal_image'
     __OUTPUT_PATH = '/mnt/vol1/my_output.json'
 
     def setUp(self) -> None:
@@ -59,7 +59,7 @@ class TestPythonImageBuilder(TestCase):
     def __test_build(self, use_pip_conf=False):
         config = self.__create_conf('my_task')
 
-        base_path = os.path.join(os.path.dirname(__file__), '../../rainbow')
+        base_path = os.path.join(os.path.dirname(__file__), '../../liminal')
 
         if use_pip_conf:
             config['pip_conf'] = os.path.join(base_path, 'pip.conf')
@@ -77,7 +77,7 @@ class TestPythonImageBuilder(TestCase):
         docker_client = docker.from_env()
         docker_client.images.get(self.__IMAGE_NAME)
 
-        cmd = 'export RAINBOW_INPUT="{\\"x\\": 1}" && ' + \
+        cmd = 'export LIMINAL_INPUT="{\\"x\\": 1}" && ' + \
               'sh container-setup.sh && ' + \
               'python hello_world.py && ' + \
               f'sh container-teardown.sh {self.__OUTPUT_PATH}'
@@ -100,10 +100,10 @@ class TestPythonImageBuilder(TestCase):
         print(container_log)
 
         self.assertEqual(
-            "b\"Writing rainbow input..\\n" +
+            "b\"Writing liminal input..\\n" +
             "Hello world!\\n\\n" +
-            "rainbow_input.json contents = {'x': 1}\\n" +
-            "Writing rainbow output..\\n\"",
+            "liminal_input.json contents = {'x': 1}\\n" +
+            "Writing liminal output..\\n\"",
             str(container_log))
 
         with open(os.path.join(self.temp_airflow_dir, 'return.json')) as file:
diff --git a/tests/runners/airflow/build/test_build_rainbows.py b/tests/runners/airflow/build/test_liminal_apps_builder.py
similarity index 86%
rename from tests/runners/airflow/build/test_build_rainbows.py
rename to tests/runners/airflow/build/test_liminal_apps_builder.py
index 7e01245..6888eb6 100644
--- a/tests/runners/airflow/build/test_build_rainbows.py
+++ b/tests/runners/airflow/build/test_liminal_apps_builder.py
@@ -21,10 +21,10 @@ from unittest import TestCase
 
 import docker
 
-from rainbow.build import build_rainbows
+from liminal.build import liminal_apps_builder
 
 
-class TestBuildRainbows(TestCase):
+class TestLiminalAppsBuilder(TestCase):
     __image_names = [
         'my_static_input_task_image',
         'my_task_output_input_task_image',
@@ -44,8 +44,9 @@ class TestBuildRainbows(TestCase):
             if len(self.docker_client.images.list(image_name)) > 0:
                 self.docker_client.images.remove(image=image_name, force=True)
 
-    def test_build_rainbow(self):
-        build_rainbows.build_rainbows(os.path.join(os.path.dirname(__file__), '../rainbow'))
+    def test_build_liminal(self):
+        liminal_apps_builder.build_liminal_apps(
+            os.path.join(os.path.dirname(__file__), '../liminal'))
 
         for image in self.__image_names:
             self.docker_client.images.get(image)
diff --git a/tests/runners/airflow/dag/test_rainbow_dags.py b/tests/runners/airflow/dag/test_liminal_dags.py
similarity index 87%
rename from tests/runners/airflow/dag/test_rainbow_dags.py
rename to tests/runners/airflow/dag/test_liminal_dags.py
index 5ffdf07..1069278 100644
--- a/tests/runners/airflow/dag/test_rainbow_dags.py
+++ b/tests/runners/airflow/dag/test_liminal_dags.py
@@ -2,8 +2,8 @@ import os
 import unittest
 from unittest import TestCase
 
-from rainbow.runners.airflow.dag import rainbow_dags
-from rainbow.runners.airflow.operators.job_status_operator import JobEndOperator, JobStartOperator
+from liminal.runners.airflow.dag import liminal_dags
+from liminal.runners.airflow.operators.job_status_operator import JobEndOperator, JobStartOperator
 
 
 class Test(TestCase):
@@ -42,8 +42,8 @@ class Test(TestCase):
 
     @staticmethod
     def get_register_dags():
-        base_path = os.path.join(os.path.dirname(__file__), '../rainbow')
-        return rainbow_dags.register_dags(base_path)
+        base_path = os.path.join(os.path.dirname(__file__), '../liminal')
+        return liminal_dags.register_dags(base_path)
 
 
 if __name__ == '__main__':
diff --git a/rainbow/build/image/__init__.py b/tests/runners/airflow/liminal/__init__.py
similarity index 100%
rename from rainbow/build/image/__init__.py
rename to tests/runners/airflow/liminal/__init__.py
diff --git a/rainbow/build/__init__.py b/tests/runners/airflow/liminal/helloworld/__init__.py
similarity index 100%
rename from rainbow/build/__init__.py
rename to tests/runners/airflow/liminal/helloworld/__init__.py
diff --git a/tests/runners/airflow/rainbow/helloworld/hello_world.py b/tests/runners/airflow/liminal/helloworld/hello_world.py
similarity index 90%
rename from tests/runners/airflow/rainbow/helloworld/hello_world.py
rename to tests/runners/airflow/liminal/helloworld/hello_world.py
index 95f4e73..0b92ae0 100644
--- a/tests/runners/airflow/rainbow/helloworld/hello_world.py
+++ b/tests/runners/airflow/liminal/helloworld/hello_world.py
@@ -20,8 +20,8 @@ import os
 
 print('Hello world!\n')
 
-with open('/rainbow_input.json') as file:
-    print(f'rainbow_input.json contents = {json.loads(file.readline())}')
+with open('/liminal_input.json') as file:
+    print(f'liminal_input.json contents = {json.loads(file.readline())}')
 
 os.makedirs('/mnt/vol1/', exist_ok=True)
 
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/liminal/liminal.yml
similarity index 100%
rename from tests/runners/airflow/rainbow/rainbow.yml
rename to tests/runners/airflow/liminal/liminal.yml
diff --git a/rainbow/__init__.py b/tests/runners/airflow/liminal/myserver/__init__.py
similarity index 100%
rename from rainbow/__init__.py
rename to tests/runners/airflow/liminal/myserver/__init__.py
diff --git a/tests/runners/airflow/rainbow/myserver/my_server.py b/tests/runners/airflow/liminal/myserver/my_server.py
similarity index 100%
rename from tests/runners/airflow/rainbow/myserver/my_server.py
rename to tests/runners/airflow/liminal/myserver/my_server.py
diff --git a/tests/runners/airflow/rainbow/pip.conf b/tests/runners/airflow/liminal/pip.conf
similarity index 100%
rename from tests/runners/airflow/rainbow/pip.conf
rename to tests/runners/airflow/liminal/pip.conf
diff --git a/tests/runners/airflow/rainbow/requirements.txt b/tests/runners/airflow/liminal/requirements.txt
similarity index 100%
rename from tests/runners/airflow/rainbow/requirements.txt
rename to tests/runners/airflow/liminal/requirements.txt
diff --git a/tests/runners/airflow/tasks/defaults/test_job_end.py b/tests/runners/airflow/tasks/defaults/test_job_end.py
index 9a2c398..a5606f4 100644
--- a/tests/runners/airflow/tasks/defaults/test_job_end.py
+++ b/tests/runners/airflow/tasks/defaults/test_job_end.py
@@ -19,7 +19,7 @@
 import unittest
 from unittest import TestCase
 
-from rainbow.runners.airflow.tasks.defaults import job_end
+from liminal.runners.airflow.tasks.defaults import job_end
 from tests.util import dag_test_utils
 
 
diff --git a/tests/runners/airflow/tasks/defaults/test_job_start.py b/tests/runners/airflow/tasks/defaults/test_job_start.py
index d07cf4b..c87f499 100644
--- a/tests/runners/airflow/tasks/defaults/test_job_start.py
+++ b/tests/runners/airflow/tasks/defaults/test_job_start.py
@@ -19,7 +19,7 @@
 import unittest
 from unittest import TestCase
 
-from rainbow.runners.airflow.tasks.defaults import job_end, job_start
+from liminal.runners.airflow.tasks.defaults import job_end, job_start
 from tests.util import dag_test_utils
 
 
diff --git a/tests/runners/airflow/tasks/test_python.py b/tests/runners/airflow/tasks/test_python.py
index ac295eb..cb76ece 100644
--- a/tests/runners/airflow/tasks/test_python.py
+++ b/tests/runners/airflow/tasks/test_python.py
@@ -19,9 +19,9 @@
 import unittest
 from unittest import TestCase
 
-from rainbow.runners.airflow.operators.kubernetes_pod_operator_with_input_output import \
+from liminal.runners.airflow.operators.kubernetes_pod_operator_with_input_output import \
     KubernetesPodOperatorWithInputAndOutput
-from rainbow.runners.airflow.tasks import python
+from liminal.runners.airflow.tasks import python
 from tests.util import dag_test_utils
 
 
@@ -49,7 +49,7 @@ class TestPythonTask(TestCase):
         return {
             'task': task_id,
             'cmd': 'foo bar',
-            'image': 'rainbow_image',
+            'image': 'liminal_image',
             'source': 'baz',
             'input_type': 'my_input_type',
             'input_path': 'my_input',
diff --git a/tests/util/test_class_utils.py b/tests/util/test_class_utils.py
index 0deeff6..e0f3d5c 100644
--- a/tests/util/test_class_utils.py
+++ b/tests/util/test_class_utils.py
@@ -1,6 +1,6 @@
 from unittest import TestCase
 
-from rainbow.core.util import class_util
+from liminal.core.util import class_util
 from tests.util.test_pkg_1.test_clazz_base import A, Z
 from tests.util.test_pkg_1.test_pkg_1_1.test_clazz_child_1 import B
 from tests.util.test_pkg_1.test_pkg_1_1.test_clazz_child_2 import C


[incubator-liminal] 16/43: Change pythontask config to input/output enhancement

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 326a042ce87e979c4d97bb6f1e9c8e89ec6c09c0
Author: aviemzur <av...@gmail.com>
AuthorDate: Sun Mar 15 15:23:57 2020 +0200

    Change pythontask config to input/output enhancement
---
 rainbow/build/python/container-setup.sh            |   2 +-
 .../airflow/operators/kubernetes_pod_operator.py   | 140 -------------------
 .../kubernetes_pod_operator_with_input_output.py   | 148 +++++++++++++++++++++
 rainbow/runners/airflow/tasks/python.py            |  57 ++++----
 .../airflow/build/python/test_python_image.py      |  14 +-
 tests/runners/airflow/build/test_build_rainbow.py  |   2 +-
 .../airflow/rainbow/hello_world/hello_world.py     |   9 ++
 tests/runners/airflow/rainbow/rainbow.yml          |  35 +++--
 tests/runners/airflow/tasks/test_python.py         |   6 +-
 9 files changed, 225 insertions(+), 188 deletions(-)
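
Note on the input splitting this commit introduces. What follows is a minimal sketch, not part of the commit. split_list is copied from the new kubernetes_pod_operator_with_input_output.py in the diff below. pick_split is a hypothetical helper, and the sample data is invented. It stands in for the xcom_pull lookups in KubernetesPodOperatorWithInputAndOutput.execute, which read the upstream value and the 'is_split' flag before setting RAINBOW_INPUT.

```python
import json


def split_list(seq, num):
    # Same logic as split_list in the new operator module:
    # cut seq into roughly num contiguous chunks.
    avg = len(seq) / float(num)
    out = []
    last = 0.0

    while last < len(seq):
        out.append(seq[int(last):int(last + avg)])
        last += avg

    return out


def pick_split(task_input, is_split, task_split):
    # Hypothetical stand-in for the operator's execute():
    # if the upstream task flagged its output as split, take this
    # task's share by index; empty input falls back to '{}'.
    if is_split:
        task_input = task_input[task_split]
    return json.dumps(task_input) if task_input else '{}'


if __name__ == '__main__':
    splits = split_list([1, 2, 3, 4, 5], 2)
    print(splits)  # [[1, 2], [3, 4, 5]]
    # Value that executor 1 would receive in RAINBOW_INPUT
    print(pick_split(splits, True, 1))
    # Unsplit input is passed through whole
    print(pick_split({'x': 1}, False, 0))
```

split_list slices its argument, so it presumably expects a list-like input. A plain dict passed with split_input enabled would not be sliceable.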

diff --git a/rainbow/build/python/container-setup.sh b/rainbow/build/python/container-setup.sh
index 4e20fc2..883f1e1 100755
--- a/rainbow/build/python/container-setup.sh
+++ b/rainbow/build/python/container-setup.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-echo """$RAINBOW_INPUT""" > rainbow_input.json
+echo """$RAINBOW_INPUT""" > /rainbow_input.json
 
 AIRFLOW_RETURN_FILE=/airflow/xcom/return.json
 
diff --git a/rainbow/runners/airflow/operators/kubernetes_pod_operator.py b/rainbow/runners/airflow/operators/kubernetes_pod_operator.py
deleted file mode 100644
index a7b0bdd..0000000
--- a/rainbow/runners/airflow/operators/kubernetes_pod_operator.py
+++ /dev/null
@@ -1,140 +0,0 @@
-from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
-import json
-import traceback
-from airflow.models import DAG, TaskInstance
-from airflow.utils import timezone
-from random import randint
-
-
-def split_list(seq, num):
-    avg = len(seq) / float(num)
-    out = []
-    last = 0.0
-
-    while last < len(seq):
-        out.append(seq[int(last):int(last + avg)])
-        last += avg
-
-    return out
-
-
-class ConfigureParallelExecutionOperator(KubernetesPodOperator):
-
-    def __init__(self,
-                 config_type=None,
-                 config_path=None,
-                 executors=1,
-                 *args,
-                 **kwargs):
-        namespace = kwargs['namespace']
-        image = kwargs['image']
-        name = kwargs['name']
-
-        del kwargs['namespace']
-        del kwargs['image']
-        del kwargs['name']
-
-        super().__init__(
-            namespace=namespace,
-            image=image,
-            name=name,
-            *args,
-            **kwargs)
-        self.config_type = config_type
-        self.config_path = config_path
-        self.executors = executors
-
-    def execute(self, context):
-        config_dict = {}
-
-        self.log.info(f'config type: {self.config_type}')
-
-        if self.config_type:
-            if self.config_type == 'file':
-                config_dict = {}  # future feature: return config from file
-            elif self.config_type == 'sql':
-                config_dict = {}  # future feature: return from sql config
-            elif self.config_type == 'task':
-                ti = context['task_instance']
-                self.log.info(self.config_path)
-                config_dict = ti.xcom_pull(task_ids=self.config_path)
-            elif self.config_type == 'static':
-                config_dict = json.loads(self.config_path)
-            else:
-                raise ValueError(f'Unknown config type: {self.config_type}')
-
-        run_id = context['dag_run'].run_id
-
-        return_conf = {'config_type': self.config_type,
-                       'splits': {'0': {'run_id': run_id, 'configs': []}}}
-
-        if config_dict:
-            self.log.info(f'configs dict: {config_dict}')
-
-            configs = config_dict['configs']
-
-            self.log.info(f'configs: {configs}')
-
-            config_splits = split_list(configs, self.executors)
-
-            for i in range(self.executors):
-                return_conf['splits'][str(i)] = {'run_id': run_id, 'configs': config_splits[i]}
-
-        return return_conf
-
-    def run_pod(self, context):
-        return super().execute(context)
-
-
-class ConfigurableKubernetesPodOperator(KubernetesPodOperator):
-
-    def __init__(self,
-                 config_task_id,
-                 task_split,
-                 *args,
-                 **kwargs):
-        namespace = kwargs['namespace']
-        image = kwargs['image']
-        name = kwargs['name']
-
-        del kwargs['namespace']
-        del kwargs['image']
-        del kwargs['name']
-
-        super().__init__(
-            namespace=namespace,
-            image=image,
-            name=name,
-            *args,
-            **kwargs)
-
-        self.config_task_id = config_task_id
-        self.task_split = task_split
-
-    def execute(self, context):
-        if self.config_task_id:
-            ti = context['task_instance']
-
-            config = ti.xcom_pull(task_ids=self.config_task_id)
-
-            if config:
-                split = {}
-
-                if 'configs' in config:
-                    split = configs
-                else:
-                    split = config['splits'][str(self.task_split)]
-
-                self.log.info(split)
-
-                if split and split['configs']:
-                    self.env_vars.update({'DATA_PIPELINE_CONFIG': json.dumps(split)})
-                    return super().execute(context)
-                else:
-                    self.log.info(
-                        f'Empty split config for split {self.task_split}. split config: {split}. config: {config}')
-            else:
-                raise ValueError('Config not found in task: ' + self.config_task_id)
-        else:
-            self.env_vars.update({'DATA_PIPELINE_CONFIG': '{}'})
-            return super().execute(context)
diff --git a/rainbow/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py b/rainbow/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
new file mode 100644
index 0000000..eb6fa83
--- /dev/null
+++ b/rainbow/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
@@ -0,0 +1,148 @@
+import json
+
+from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
+
+
+def split_list(seq, num):
+    avg = len(seq) / float(num)
+    out = []
+    last = 0.0
+
+    while last < len(seq):
+        out.append(seq[int(last):int(last + avg)])
+        last += avg
+
+    return out
+
+
+_IS_SPLIT_KEY = 'is_split'
+
+
+class PrepareInputOperator(KubernetesPodOperator):
+
+    def __init__(self,
+                 input_type=None,
+                 input_path=None,
+                 split_input=False,
+                 executors=1,
+                 *args,
+                 **kwargs):
+        namespace = kwargs['namespace']
+        image = kwargs['image']
+        name = kwargs['name']
+
+        del kwargs['namespace']
+        del kwargs['image']
+        del kwargs['name']
+
+        super().__init__(
+            namespace=namespace,
+            image=image,
+            name=name,
+            *args,
+            **kwargs)
+
+        self.input_type = input_type
+        self.input_path = input_path
+        self.executors = executors
+        self.split_input = split_input
+
+    def execute(self, context):
+        input_dict = {}
+
+        self.log.info(f'config type: {self.input_type}')
+
+        ti = context['task_instance']
+
+        if self.input_type:
+            if self.input_type == 'file':
+                input_dict = {}  # future feature: return config from file
+            elif self.input_type == 'sql':
+                input_dict = {}  # future feature: return from sql config
+            elif self.input_type == 'task':
+                self.log.info(self.input_path)
+                input_dict = ti.xcom_pull(task_ids=self.input_path)
+            elif self.input_type == 'static':
+                input_dict = json.loads(self.input_path)
+            else:
+                raise ValueError(f'Unknown config type: {self.input_type}')
+
+        # TODO: pass run_id as well as env var
+        run_id = context['dag_run'].run_id
+        print(f'run_id = {run_id}')
+
+        if input_dict:
+            self.log.info(f'Generated input: {input_dict}')
+
+            if self.split_input:
+                input_splits = split_list(input_dict, self.executors)
+
+                ti.xcom_push(key=_IS_SPLIT_KEY, value=True)
+
+                return input_splits
+            else:
+                return input_dict
+        else:
+            return {}
+
+    def run_pod(self, context):
+        return super().execute(context)
+
+
+class KubernetesPodOperatorWithInputAndOutput(KubernetesPodOperator):
+    """
+    TODO: pydoc
+    """
+
+    _RAINBOW_INPUT_ENV_VAR = 'RAINBOW_INPUT'
+
+    def __init__(self,
+                 task_split,
+                 input_task_id=None,
+                 *args,
+                 **kwargs):
+        namespace = kwargs['namespace']
+        image = kwargs['image']
+        name = kwargs['name']
+
+        del kwargs['namespace']
+        del kwargs['image']
+        del kwargs['name']
+
+        super().__init__(
+            namespace=namespace,
+            image=image,
+            name=name,
+            *args,
+            **kwargs)
+
+        self.input_task_id = input_task_id
+        self.task_split = task_split
+
+    def execute(self, context):
+        task_input = {}
+
+        if self.input_task_id:
+            ti = context['task_instance']
+
+            self.log.info(f'Fetching input for task {self.task_split}.')
+
+            task_input = ti.xcom_pull(task_ids=self.input_task_id)
+
+            is_split = ti.xcom_pull(task_ids=self.input_task_id, key=_IS_SPLIT_KEY)
+            self.log.info(f'is_split = {is_split}')
+            if is_split:
+                self.log.info(f'Fetching split {self.task_split} of input.')
+
+                task_input = task_input[self.task_split]
+
+        if task_input:
+            self.log.info(f'task input = {task_input}')
+
+            self.env_vars.update({self._RAINBOW_INPUT_ENV_VAR: json.dumps(task_input)})
+        else:
+            self.env_vars.update({self._RAINBOW_INPUT_ENV_VAR: '{}'})
+
+            self.log.info(f'Empty input for task {self.task_split}.')
+
+        return super().execute(context)
diff --git a/rainbow/runners/airflow/tasks/python.py b/rainbow/runners/airflow/tasks/python.py
index ac46d0b..8bd11cf 100644
--- a/rainbow/runners/airflow/tasks/python.py
+++ b/rainbow/runners/airflow/tasks/python.py
@@ -21,9 +21,9 @@ from airflow.models import Variable
 from airflow.operators.dummy_operator import DummyOperator
 
 from rainbow.runners.airflow.model import task
-from rainbow.runners.airflow.operators.kubernetes_pod_operator import \
-    ConfigurableKubernetesPodOperator, \
-    ConfigureParallelExecutionOperator
+from rainbow.runners.airflow.operators.kubernetes_pod_operator_with_input_output import \
+    KubernetesPodOperatorWithInputAndOutput, \
+    PrepareInputOperator
 
 
 class PythonTask(task.Task):
@@ -42,25 +42,24 @@ class PythonTask(task.Task):
         self.env_vars = self.__env_vars()
         self.kubernetes_kwargs = self.__kubernetes_kwargs()
         self.cmds, self.arguments = self.__kubernetes_cmds_and_arguments()
-        self.config_task_id = self.task_name + '_input'
+        self.input_task_id = self.task_name + '_input'
         self.executors = self.__executors()
 
     def apply_task_to_dag(self):
-
-        config_task = None
+        input_task = None
 
         if self.input_type in ['static', 'task']:
-            config_task = self.__config_task(config_task)
+            input_task = self.__input_task()
 
         if self.executors == 1:
-            return self.__apply_task_to_dag_single_executor(config_task)
+            return self.__apply_task_to_dag_single_executor(input_task)
         else:
-            return self.__apply_task_to_dag_multiple_executors(config_task)
+            return self.__apply_task_to_dag_multiple_executors(input_task)
 
-    def __apply_task_to_dag_multiple_executors(self, config_task):
-        if not config_task:
-            config_task = DummyOperator(
-                task_id=self.config_task_id,
+    def __apply_task_to_dag_multiple_executors(self, input_task):
+        if not input_task:
+            input_task = DummyOperator(
+                task_id=self.input_task_id,
                 trigger_rule=self.trigger_rule,
                 dag=self.dag
             )
@@ -71,7 +70,7 @@ class PythonTask(task.Task):
         )
 
         if self.parent:
-            self.parent.set_downstream(config_task)
+            self.parent.set_downstream(input_task)
 
             for i in range(self.executors):
                 split_task = self.__create_pod_operator(
@@ -80,51 +79,51 @@ class PythonTask(task.Task):
                     image=self.image
                 )
 
-                config_task.set_downstream(split_task)
+                input_task.set_downstream(split_task)
 
                 split_task.set_downstream(end_task)
 
         return end_task
 
     def __create_pod_operator(self, task_id, task_split, image):
-        return ConfigurableKubernetesPodOperator(
+        return KubernetesPodOperatorWithInputAndOutput(
             task_id=task_id,
-            config_task_id=self.config_task_id,
-            task_split=task_split,
+            input_task_id=self.input_task_id,
+            task_split=task_split if task_split else 0,
             image=image,
             cmds=self.cmds,
             arguments=self.arguments,
             **self.kubernetes_kwargs
         )
 
-    def __apply_task_to_dag_single_executor(self, config_task):
+    def __apply_task_to_dag_single_executor(self, input_task):
         pod_task = self.__create_pod_operator(
             task_id=f'{self.task_name}',
-            task_split=0,
+            task_split=None,
             image=f'''{self.image}'''
         )
 
         first_task = pod_task
 
-        if config_task:
-            first_task = config_task
+        if input_task:
+            first_task = input_task
             first_task.set_downstream(pod_task)
         if self.parent:
             self.parent.set_downstream(first_task)
 
         return pod_task
 
-    def __config_task(self, config_task):
-        self.env_vars.update({'DATA_PIPELINE_INPUT': self.input_path})
-        config_task = ConfigureParallelExecutionOperator(
-            task_id=self.config_task_id,
+    def __input_task(self):
+        return PrepareInputOperator(
+            task_id=self.input_task_id,
             image=self.image,
-            config_type=self.input_type,
-            config_path=self.input_path,
+            input_type=self.input_type,
+            input_path=self.input_path,
+            split_input=True if 'split_input' in self.config and
+                                self.config['split_input'] else False,
             executors=self.executors,
             **self.kubernetes_kwargs
         )
-        return config_task
 
     def __executors(self):
         executors = 1
diff --git a/tests/runners/airflow/build/python/test_python_image.py b/tests/runners/airflow/build/python/test_python_image.py
index 368b05d..d190fba 100644
--- a/tests/runners/airflow/build/python/test_python_image.py
+++ b/tests/runners/airflow/build/python/test_python_image.py
@@ -29,16 +29,24 @@ class TestPythonImage(TestCase):
 
         image_name = config['image']
 
-        PythonImage().build('tests/runners/airflow/rainbow', 'hello_world', 'image_name')
+        PythonImage().build('tests/runners/airflow/rainbow', 'hello_world', image_name)
 
         # TODO: elaborate test of image, validate input/output
 
         docker_client = docker.from_env()
         docker_client.images.get(image_name)
-        container_log = docker_client.containers.run(image_name, "python hello_world.py")
+
+        cmd = 'export RAINBOW_INPUT="{}" && ' + \
+              'sh container-setup.sh && ' + \
+              'python hello_world.py && ' + \
+              'sh container-teardown.sh'
+        cmds = ['/bin/bash', '-c', cmd]
+
+        container_log = docker_client.containers.run(image_name, cmds)
+
         docker_client.close()
 
-        self.assertEqual("b'Hello world!\\n'", str(container_log))
+        self.assertEqual("b'Hello world!\\n\\n{}\\n'", str(container_log))
 
     @staticmethod
     def __create_conf(task_id):
diff --git a/tests/runners/airflow/build/test_build_rainbow.py b/tests/runners/airflow/build/test_build_rainbow.py
index 533848f..0817d6c 100644
--- a/tests/runners/airflow/build/test_build_rainbow.py
+++ b/tests/runners/airflow/build/test_build_rainbow.py
@@ -9,7 +9,7 @@ class TestBuildRainbow(TestCase):
 
     def test_build_rainbow(self):
         docker_client = docker.client.from_env()
-        image_names = ['rainbow_image', 'rainbow_image2']
+        image_names = ['my_static_input_task_image', 'my_task_output_input_task_image']
 
         for image_name in image_names:
             if len(docker_client.images.list(image_name)) > 0:
diff --git a/tests/runners/airflow/rainbow/hello_world/hello_world.py b/tests/runners/airflow/rainbow/hello_world/hello_world.py
index 9b87c05..3eae465 100644
--- a/tests/runners/airflow/rainbow/hello_world/hello_world.py
+++ b/tests/runners/airflow/rainbow/hello_world/hello_world.py
@@ -15,4 +15,13 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+import json
+
 print('Hello world!')
+print()
+
+with open('/rainbow_input.json') as file:
+    print(json.loads(file.readline()))
+
+with open('/output.json', 'w') as file:
+    file.write(json.dumps({'a': 1, 'b': 2}))
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
index 3e3ec4b..2000621 100644
--- a/tests/runners/airflow/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -25,28 +25,41 @@ pipelines:
     schedule: 0 * 1 * *
     metrics-namespace: TestNamespace
     tasks:
-      - task: my_static_config_task
+      - task: my_static_input_task
         type: python
-        description: my 1st ds task
-        image: rainbow_image
+        description: static input task
+        image: my_static_input_task_image
         source: hello_world
         env_vars:
           env1: "a"
           env2: "b"
         input_type: static
-        input_path: "{\"configs\": [ { \"campaign_id\": 10 }, { \"campaign_id\": 20 } ]}"
-        cmd: 'python hello_world.py'
-      - task: my_static_config_task2
+        input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
+        output_path: /output.json
+        cmd: python hello_world.py
+#      - task: my_parallelized_static_input_task
+#        type: python
+#        description: parallelized static input task
+#        image: my_static_input_task_image
+#        env_vars:
+#          env1: "a"
+#          env2: "b"
+#        input_type: static
+#        input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
+#        split_input: True
+#        executors: 2
+#        cmd: python hello_world.py
+      - task: my_task_output_input_task
         type: python
-        description: my 1st ds task
-        image: rainbow_image2
+        description: task output input task
+        image: my_task_output_input_task_image
         source: hello_world
         env_vars:
           env1: "a"
           env2: "b"
-        input_type: static
-        input_path: "{\"configs\": [ { \"campaign_id\": 10 }, { \"campaign_id\": 20 } ]}"
-        cmd: 'python hello_world.py'
+        input_type: task
+        input_path: my_static_input_task
+        cmd: python hello_world.py
 services:
   - service:
     name: myserver1
diff --git a/tests/runners/airflow/tasks/test_python.py b/tests/runners/airflow/tasks/test_python.py
index ffdcac3..260f71d 100644
--- a/tests/runners/airflow/tasks/test_python.py
+++ b/tests/runners/airflow/tasks/test_python.py
@@ -19,8 +19,8 @@
 import unittest
 from unittest import TestCase
 
-from rainbow.runners.airflow.operators.kubernetes_pod_operator import \
-    ConfigurableKubernetesPodOperator
+from rainbow.runners.airflow.operators.kubernetes_pod_operator_with_input_output import \
+    KubernetesPodOperatorWithInputAndOutput
 from rainbow.runners.airflow.tasks import python
 from tests.util import dag_test_utils
 
@@ -41,7 +41,7 @@ class TestPythonTask(TestCase):
         self.assertEqual(len(dag.tasks), 1)
         dag_task0 = dag.tasks[0]
 
-        self.assertIsInstance(dag_task0, ConfigurableKubernetesPodOperator)
+        self.assertIsInstance(dag_task0, KubernetesPodOperatorWithInputAndOutput)
         self.assertEqual(dag_task0.task_id, task_id)
 
     @staticmethod
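
The hello_world.py change in this commit reads the task input from a JSON file and writes the task output to another JSON file. A minimal sketch of that contract follows. It assumes the setup script (not shown here) has already written the input file; the function name `run_task` and the temporary paths are hypothetical, used in place of `/rainbow_input.json` and `/output.json` so the sketch runs outside a container.

```python
import json
import os
import tempfile


def run_task(input_path, output_path):
    """Read one line of JSON input, write a fixed JSON output (as hello_world.py does)."""
    with open(input_path) as file:
        task_input = json.loads(file.readline())

    with open(output_path, 'w') as file:
        file.write(json.dumps({'a': 1, 'b': 2}))

    return task_input


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for /rainbow_input.json and /output.json.
    input_path = os.path.join(tmp, 'rainbow_input.json')
    output_path = os.path.join(tmp, 'output.json')

    # Assumption: something upstream serializes the configured input to this file.
    with open(input_path, 'w') as file:
        file.write(json.dumps([{'foo': 'bar'}, {'foo': 'baz'}]))

    task_input = run_task(input_path, output_path)

    with open(output_path) as file:
        task_output = json.load(file)

print(task_input)
print(task_output)
```

A downstream task configured with `input_type: task` would presumably receive this output as its own input, as the rainbow.yml change above suggests.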


[incubator-liminal] 31/43: Add pipeline configuration as default arguments

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 253a15a16cef606085632d83ddf48a6640450b52
Author: Oded Rosenberg <od...@naturalint.com>
AuthorDate: Thu Apr 9 15:06:03 2020 +0300

    Add pipeline configuration as default arguments
---
 rainbow/runners/airflow/dag/rainbow_dags.py    | 11 ++++++++---
 tests/runners/airflow/dag/test_rainbow_dags.py |  9 +++++++++
 tests/runners/airflow/rainbow/rainbow.yml      |  7 ++++++-
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index 6b071fd..d5e3be1 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -28,6 +28,8 @@ from rainbow.runners.airflow.model.task import Task
 from rainbow.runners.airflow.tasks.defaults.job_end import JobEndTask
 from rainbow.runners.airflow.tasks.defaults.job_start import JobStartTask
 
+__DEPENDS_ON_PAST = 'depends_on_past'
+
 
 def register_dags(configs_path):
     """
@@ -47,12 +49,15 @@ def register_dags(configs_path):
             for pipeline in config['pipelines']:
                 pipeline_name = pipeline['pipeline']
 
-                default_args = {
-                    'owner': config['owner'],
+                default_args = {k: v for k, v in pipeline.items()}
+
+                override_args = {
                     'start_date': datetime.combine(pipeline['start_date'], datetime.min.time()),
-                    'depends_on_past': False,
+                    __DEPENDS_ON_PAST: default_args[__DEPENDS_ON_PAST] if __DEPENDS_ON_PAST in default_args else False,
                 }
 
+                default_args.update(override_args)
+
                 dag = DAG(
                     dag_id=pipeline_name,
                     default_args=default_args,
diff --git a/tests/runners/airflow/dag/test_rainbow_dags.py b/tests/runners/airflow/dag/test_rainbow_dags.py
index c744ce5..5ffdf07 100644
--- a/tests/runners/airflow/dag/test_rainbow_dags.py
+++ b/tests/runners/airflow/dag/test_rainbow_dags.py
@@ -31,6 +31,15 @@ class Test(TestCase):
 
         self.assertIsInstance(task_dict['end'], JobEndOperator)
 
+    def test_default_args(self):
+        dag = self.get_register_dags()[0]
+        default_args = dag.default_args
+
+        keys = default_args.keys()
+        self.assertIn('default_arg_loaded', keys)
+        self.assertIn('default_array_loaded', keys)
+        self.assertIn('default_object_loaded', keys)
+
     @staticmethod
     def get_register_dags():
         base_path = os.path.join(os.path.dirname(__file__), '../rainbow')
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
index 05c0a09..0b08a1f 100644
--- a/tests/runners/airflow/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -17,12 +17,17 @@
 # under the License.
 ---
 name: MyPipeline
-owner: Bosco Albert Baracus
 pipelines:
   - pipeline: my_pipeline
+    owner: Bosco Albert Baracus
     start_date: 1970-01-01
     timeout_minutes: 45
     schedule: 0 * 1 * *
+    default_arg_loaded: check
+    default_array_loaded: [2, 3, 4]
+    default_object_loaded:
+      key1: val1
+      key2: val2
     metrics:
      namespace: TestNamespace
      backends: [ 'cloudwatch' ]
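
The merge in register_dags can be sketched on its own, without Airflow: every pipeline key becomes a default argument, then `start_date` and `depends_on_past` are overridden. The helper name `build_default_args` is hypothetical, and the sketch assumes the YAML loader has already parsed `start_date` into a `datetime.date`.

```python
from datetime import date, datetime

DEPENDS_ON_PAST = 'depends_on_past'


def build_default_args(pipeline):
    """Copy all pipeline keys, then apply the two overrides used in the diff."""
    default_args = dict(pipeline)

    override_args = {
        # Promote the date to a datetime at midnight.
        'start_date': datetime.combine(pipeline['start_date'], datetime.min.time()),
        # Keep a user-supplied value, otherwise default to False.
        DEPENDS_ON_PAST: default_args.get(DEPENDS_ON_PAST, False),
    }

    default_args.update(override_args)
    return default_args


pipeline = {
    'pipeline': 'my_pipeline',
    'owner': 'Bosco Albert Baracus',
    'start_date': date(1970, 1, 1),
    'default_arg_loaded': 'check',
    'default_array_loaded': [2, 3, 4],
}

args = build_default_args(pipeline)
print(args['start_date'])       # 1970-01-01 00:00:00
print(args['depends_on_past'])  # False
```

Because the whole pipeline mapping is copied, keys such as `pipeline` and `schedule` also end up in the default arguments, which is what test_default_args relies on for the custom keys.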


[incubator-liminal] 10/43: Refactor PythonTask

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 9292bb85b77084f31c48c1c982a5a7f593fefa66
Author: aviemzur <av...@gmail.com>
AuthorDate: Thu Mar 12 10:14:55 2020 +0200

    Refactor PythonTask
---
 rainbow/runners/airflow/tasks/python.py | 122 +++++++++++++++++---------------
 1 file changed, 65 insertions(+), 57 deletions(-)

diff --git a/rainbow/runners/airflow/tasks/python.py b/rainbow/runners/airflow/tasks/python.py
index 8317854..eb00c0e 100644
--- a/rainbow/runners/airflow/tasks/python.py
+++ b/rainbow/runners/airflow/tasks/python.py
@@ -58,76 +58,84 @@ class PythonTask(task.Task):
 
     def apply_task_to_dag(self):
 
-        def create_pod_operator(task_id, task_split, image):
-            return ConfigurableKubernetesPodOperator(
-                task_id=task_id,
-                config_task_id=self.config_task_id,
-                task_split=task_split,
-                image=image,
-                cmds=self.cmds,
-                arguments=self.arguments,
-                **self.kubernetes_kwargs
-            )
-
         config_task = None
 
         if self.input_type in ['static', 'task']:
-            self.env_vars.update({'DATA_PIPELINE_INPUT': self.input_path})
-
-            config_task = ConfigureParallelExecutionOperator(
-                task_id=self.config_task_id,
-                image=self.image,
-                config_type=self.input_type,
-                config_path=self.input_path,
-                executors=self.executors,
-                **self.kubernetes_kwargs
-            )
+            config_task = self.__config_task(config_task)
 
         if self.executors == 1:
-            pod_task = create_pod_operator(
-                task_id=f'{self.task_name}',
-                task_split=0,
-                image=f'''{self.image}'''
-            )
-
-            first_task = pod_task
-
-            if config_task:
-                first_task = config_task
-                first_task.set_downstream(pod_task)
-
-            if self.parent:
-                self.parent.set_downstream(first_task)
-
-            return pod_task
+            return self.__apply_task_to_dag_single_executor(config_task)
         else:
-            if not config_task:
-                config_task = DummyOperator(
-                    task_id=self.config_task_id,
-                    trigger_rule=self.trigger_rule,
-                    dag=self.dag
-                )
+            return self.__apply_task_to_dag_multiple_executors(config_task)
 
-            end_task = DummyOperator(
-                task_id=self.task_name,
+    def __apply_task_to_dag_multiple_executors(self, config_task):
+        if not config_task:
+            config_task = DummyOperator(
+                task_id=self.config_task_id,
+                trigger_rule=self.trigger_rule,
                 dag=self.dag
             )
 
-            if self.parent:
-                self.parent.set_downstream(config_task)
-
-                for i in range(self.executors):
-                    split_task = create_pod_operator(
-                        task_id=f'''{self.task_name}_{i}''',
-                        task_split=i,
-                        image=self.image
-                    )
+        end_task = DummyOperator(
+            task_id=self.task_name,
+            dag=self.dag
+        )
 
-                    config_task.set_downstream(split_task)
+        if self.parent:
+            self.parent.set_downstream(config_task)
 
-                    split_task.set_downstream(end_task)
+            for i in range(self.executors):
+                split_task = self.__create_pod_operator(
+                    task_id=f'''{self.task_name}_{i}''',
+                    task_split=i,
+                    image=self.image
+                )
 
-            return end_task
+                config_task.set_downstream(split_task)
+
+                split_task.set_downstream(end_task)
+
+        return end_task
+
+    def __create_pod_operator(self, task_id, task_split, image):
+        return ConfigurableKubernetesPodOperator(
+            task_id=task_id,
+            config_task_id=self.config_task_id,
+            task_split=task_split,
+            image=image,
+            cmds=self.cmds,
+            arguments=self.arguments,
+            **self.kubernetes_kwargs
+        )
+
+    def __apply_task_to_dag_single_executor(self, config_task):
+        pod_task = self.__create_pod_operator(
+            task_id=f'{self.task_name}',
+            task_split=0,
+            image=f'''{self.image}'''
+        )
+
+        first_task = pod_task
+
+        if config_task:
+            first_task = config_task
+            first_task.set_downstream(pod_task)
+        if self.parent:
+            self.parent.set_downstream(first_task)
+
+        return pod_task
+
+    def __config_task(self, config_task):
+        self.env_vars.update({'DATA_PIPELINE_INPUT': self.input_path})
+        config_task = ConfigureParallelExecutionOperator(
+            task_id=self.config_task_id,
+            image=self.image,
+            config_type=self.input_type,
+            config_path=self.input_path,
+            executors=self.executors,
+            **self.kubernetes_kwargs
+        )
+        return config_task
 
     def __executors(self):
         executors = 1
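
The multiple-executor path in this refactoring builds a fan-out/fan-in shape: one config task, N split tasks, one end task. A minimal sketch of that wiring follows, using a hypothetical `Node` class in place of Airflow operators and a made-up `_config` suffix for the config task id. Unlike the diff, where the loop sits under `if self.parent:`, the sketch wires the split tasks whether or not a parent exists.

```python
class Node:
    """Hypothetical stand-in for an Airflow operator; it only records downstream edges."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def set_downstream(self, other):
        self.downstream.append(other)


def fan_out(task_name, executors, parent=None):
    """Wire config -> split_0..split_{n-1} -> end and return (config, end)."""
    config_task = Node(f'{task_name}_config')  # id suffix is an assumption
    end_task = Node(task_name)

    if parent:
        parent.set_downstream(config_task)

    for i in range(executors):
        split_task = Node(f'{task_name}_{i}')
        config_task.set_downstream(split_task)
        split_task.set_downstream(end_task)

    return config_task, end_task


config, end = fan_out('my_task', 3)
print([node.task_id for node in config.downstream])  # ['my_task_0', 'my_task_1', 'my_task_2']
```

Returning the end task mirrors the diff, where the end task becomes the parent of whatever task follows in the pipeline.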


[incubator-liminal] 25/43: Format yaml in README

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 779e86e5832c45078cc06d090b05108dcd4f3b73
Author: aviemzur <av...@gmail.com>
AuthorDate: Mon Mar 23 10:32:25 2020 +0200

    Format yaml in README
---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 62036e0..90c6b78 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ Using simple YAML configuration, create your own schedule data pipelines (a sequ
 perform), application servers,  and more.
 
 ## Example YAML config file
-
+```yaml
 name: MyPipeline
 owner: Bosco Albert Baracus
 pipelines:
@@ -74,6 +74,7 @@ services:
       - endpoint: /myendpoint1
         module: myserver.my_server
         function: myendpoint1func
+```
 
 # Installation
 


[incubator-liminal] 34/43: Add short architecture description

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 3245e47d2e020bd20c802653c933b13dfa5f643f
Author: lior.schachter <li...@naturalint.com>
AuthorDate: Sat Apr 11 15:28:49 2020 +0300

    Add short architecture description
---
 rainbow-arch.md | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/rainbow-arch.md b/rainbow-arch.md
new file mode 100644
index 0000000..3e48712
--- /dev/null
+++ b/rainbow-arch.md
@@ -0,0 +1,43 @@
+# Rainbow
+Rainbow is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way. The platform provides the abstractions and declarative capabilities for data extraction & feature engineering followed by model training and serving. Apache Rainbow's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model  [...]
+
+## Motivation
+The challenges involved in operationalizing machine learning models are one of the main reasons why many machine learning projects never make it to production.
+The process involves automating and orchestrating multiple steps which run on heterogeneous infrastructure - different compute environments, data processing platforms, ML frameworks, notebooks, containers and monitoring tools.
+There are no mature standards for this workflow, and most organizations do not have the experience to build it in-house. In the best case, dev-ds-devops teams form in order to accomplish this task together; in many cases, it's the data scientists who try to deal with this themselves without the knowledge or the inclination to become infrastructure experts.
+As a result, many projects never make it through the cycle. Those that do suffer from a very long lead time from a successful experiment to an operational, refreshable, deployed and monitored model in production.
+The goal of Apache Rainbow is to simplify the creation and management of machine learning pipelines by data engineers & scientists. The platform provides declarative building blocks which define the workflow, orchestrate the underlying infrastructure, and take care of non-functional concerns, so that users can focus on business logic and algorithm code.
+Some commercial E2E solutions have started to emerge in the last few years; however, some are limited to specific parts of the workflow (such as Databricks MLflow), while others are tied to specific environments (e.g. SageMaker on AWS).
+
+## High Level Architecture
+The platform aims to provide data engineers & scientists with a solution for end-to-end flows from model training to real-time inference in production. Its architecture enables and promotes adoption of specific components in existing (non-Rainbow) frameworks, as well as seamless integration with other open source projects. Rainbow was created to enable scalability in ML efforts, after a thorough review of available solutions and frameworks, none of which met our main KPIs:
+- Provide an opinionated but customizable end-to-end workflow
+- Abstract away the complexity of underlying infrastructure
+- Support major open source tools and cloud-native infrastructure to carry out many of the steps
+- Allow teams to leverage their existing investments or bring in their tools of choice into the workflow
+
+We have found that other tech companies in the Israeli Hi-Tech ecosystem also have an interest in such a platform, and hence decided to share our work with the community. The following diagram depicts these main components and where Apache Rainbow comes in:
+
+![rainbow-arch1](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/master/images/rainbow_001.png)
+
+A classical data scientist workflow includes some base phases: 
+_Train, Deploy and Consume._
+
+**The Train phase includes the following tasks:**
+
+1. Fetch -  get the data needed to build a model - usually using SQL
+1. Clean - make sure the data is useful for building the model 
+1. Prepare - split data and encode features from the data according to model needs 
+1. Train - Build the model and tune it
+1. Evaluate - make sure the model is correct - run it on a test set, etc…
+1. Validate - make sure the model is up to the standards you need
+
+**The Deploy phase includes these tasks:**
+1. Deploy - make it available for usage in production
+1. Inference - batch or real-time - use the model to evaluate data, offline or online, from your applications
+1. Consume - the actual use of the created models by applications and ETLs, usually through APIs to the batch or real-time inference, which usually relies on model and feature stores.
+ 
+Rainbow provides its users with declarative composition capabilities to materialize these steps in a robust way, while exploiting existing frameworks and tools: for example, data science frameworks such as scikit-learn, TensorFlow and Keras for running core data science algorithms, as well as numerous core mechanisms such as data stores, processing engines, parallelism, schedulers, code deployment, and batch and real-time inference.
+Rainbow allows the creation and wiring of these kinds of functional and non-functional tasks while making the underlying infrastructure used by these tasks very easy to use, or even abstracting it away entirely. It also handles non-functional aspects such as monitoring (in a standard fashion), deployment, scheduling, resource management and execution.
+
+![rainbow-arch2](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/master/images/rainbow_002.png)
\ No newline at end of file


[incubator-liminal] 42/43: fix split list function

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 249aa74d2e038e0e6617df7503a559d7fa1a6111
Author: roei <ro...@naturalint.com>
AuthorDate: Thu Jul 2 11:36:52 2020 +0300

    fix split list function
---
 .../kubernetes_pod_operator_with_input_output.py   | 41 ++++++++-------------
 ...st_kubernetes_pod_operator_with_input_output.py | 43 ++++++++++++++++++++++
 2 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/liminal/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py b/liminal/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
index 5833550..267010e 100644
--- a/liminal/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
+++ b/liminal/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
@@ -3,16 +3,11 @@ import json
 from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
 
 
-def split_list(seq, num):
-    avg = len(seq) / float(num)
-    out = []
-    last = 0.0
-
-    while last < len(seq):
-        out.append(seq[int(last):int(last + avg)])
-        last += avg
-
-    return out
+def _split_list(seq, num):
+    k, m = divmod(len(seq), num)
+    return list(
+        (seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(num))
+    )
 
 
 _IS_SPLIT_KEY = 'is_split'
@@ -27,13 +22,9 @@ class PrepareInputOperator(KubernetesPodOperator):
                  executors=1,
                  *args,
                  **kwargs):
-        namespace = kwargs['namespace']
-        image = kwargs['image']
-        name = kwargs['name']
-
-        del kwargs['namespace']
-        del kwargs['image']
-        del kwargs['name']
+        namespace = kwargs.pop('namespace')
+        image = kwargs.pop('image')
+        name = kwargs.pop('name')
 
         super().__init__(
             namespace=namespace,
@@ -74,7 +65,11 @@ class PrepareInputOperator(KubernetesPodOperator):
             self.log.info(f'Generated input: {input_dict}')
 
             if self.split_input:
-                input_splits = split_list(input_dict, self.executors)
+                input_splits = _split_list(input_dict, self.executors)
+                numbered_splits = list(
+                    zip(range(len(input_splits)), input_splits)
+                )
+                self.log.info(numbered_splits)
 
                 ti.xcom_push(key=_IS_SPLIT_KEY, value=True)
 
@@ -100,13 +95,9 @@ class KubernetesPodOperatorWithInputAndOutput(KubernetesPodOperator):
                  input_task_id=None,
                  *args,
                  **kwargs):
-        namespace = kwargs['namespace']
-        image = kwargs['image']
-        name = kwargs['name']
-
-        del kwargs['namespace']
-        del kwargs['image']
-        del kwargs['name']
+        namespace = kwargs.pop('namespace')
+        image = kwargs.pop('image')
+        name = kwargs.pop('name')
 
         super().__init__(
             namespace=namespace,
diff --git a/tests/runners/airflow/operators/test_kubernetes_pod_operator_with_input_output.py b/tests/runners/airflow/operators/test_kubernetes_pod_operator_with_input_output.py
new file mode 100644
index 0000000..6895382
--- /dev/null
+++ b/tests/runners/airflow/operators/test_kubernetes_pod_operator_with_input_output.py
@@ -0,0 +1,43 @@
+import unittest
+from unittest import TestCase
+import itertools
+
+from liminal.runners.airflow.operators.\
+    kubernetes_pod_operator_with_input_output import _split_list
+
+
+class TestSplitList(TestCase):
+    def setUp(self) -> None:
+        self.short_seq = [{f'task_{i}': f'value_{i}'} for i in range(3)]
+        self.long_seq = [{f'task_{i}': f'value_{i}'} for i in range(10)]
+
+    def test_seq_equal_num(self):
+        num = len(self.short_seq)
+        result = _split_list(self.short_seq, num)
+        expected = [[{'task_0': 'value_0'}], [{'task_1': 'value_1'}],
+                    [{'task_2': 'value_2'}]]
+        self.assertListEqual(expected, result)
+
+    def test_seq_grater_than_num(self):
+        num = 3
+        result = _split_list(self.long_seq, num)
+        n_tasks = len(self.long_seq)
+
+        min_length = min([len(i) for i in result])
+        max_length = max([len(i) for i in result])
+        flat_results = list(itertools.chain(*result))
+
+        self.assertGreaterEqual(max_length - min_length, 1)
+        self.assertEqual(n_tasks, len(flat_results))
+        self.assertTrue(all([{f'task_{i}': f'value_{i}'} in flat_results
+                             for i in range(n_tasks)]))
+
+    def test_seq_smaller_than_num(self):
+        test_num_range = [8, 9, 10, 11, 12]
+        for num in test_num_range:
+            result = _split_list(self.short_seq, num)
+            self.assertEqual(len(result), num)
+            self.assertTrue(all([[i] in result for i in self.short_seq]))
+            self.assertEqual([[]] * (num - len(self.short_seq)),
+                             [i for i in result if i == []])
+
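
The divmod-based split introduced in this commit can be tried in isolation. The sketch below is a standalone copy under the hypothetical name `split_list_sketch`; the sample inputs are illustrative and not taken from the commit's tests.

```python
def split_list_sketch(seq, num):
    """Split seq into num contiguous chunks whose lengths differ by at most one."""
    # k is the base chunk size; the first m chunks get one extra element.
    k, m = divmod(len(seq), num)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(num)]


print(split_list_sketch(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(split_list_sketch(['a', 'b', 'c'], 5))  # [['a'], ['b'], ['c'], [], []]
```

The function always returns exactly `num` chunks, so each executor gets a slot even when there are fewer items than executors; the surplus slots are empty lists.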


[incubator-liminal] 37/43: Rainbow local mode

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 6da38b82f53a20a91bdbf3f4a4afd56c709fb206
Author: assapin <47...@users.noreply.github.com>
AuthorDate: Thu Jun 11 16:33:22 2020 +0300

    Rainbow local mode
---
 README.md                                          |  44 +++++++++--
 images/airflow.png                                 | Bin 0 -> 70049 bytes
 rainbow/build/build_rainbows.py                    |  25 ++++--
 rainbow-cli => rainbow/core/environment.py         |  32 ++++----
 rainbow/core/util/class_util.py                    |  62 +++++++++------
 rainbow/core/util/files_util.py                    |   2 +-
 .../util/files_util.py => runners/__init__.py}     |  13 ---
 rainbow/runners/airflow/config/__init__.py         |   0
 .../airflow/config/standalone_variable_backend.py  |  27 +++----
 rainbow/runners/airflow/dag/rainbow_dags.py        |  31 +++++---
 .../airflow/operators/job_status_operator.py       |   9 ++-
 .../runners/airflow/tasks/defaults/job_end.py~HEAD |  44 -----------
 .../airflow/tasks/defaults/job_start.py~HEAD       |  43 ----------
 rainbow/runners/airflow/tasks/python.py            |  11 +--
 rainbow/runners/airflow/tasks/spark.py             |   1 -
 requirements-airflow.txt                           |   5 ++
 requirements.txt                                   |   6 +-
 scripts/docker-compose.yml                         |  40 ++++++++++
 scripts/package.sh                                 |  69 ++++++++++++++++
 scripts/rainbow                                    |  87 +++++++++++++++++++++
 setup.py                                           |  48 ++++++++++++
 tests/runners/airflow/rainbow/requirements.txt     |   1 +
 tests/util/test_class_utils.py                     |  32 ++++++++
 .../util/test_pkg_1/__init__.py                    |  13 ---
 .../util/test_pkg_1/test_clazz_base.py             |  15 ++--
 .../util/test_pkg_1/test_pkg_1_1/__init__.py       |  13 ---
 .../test_pkg_1/test_pkg_1_1/test_clazz_child_1.py  |  18 ++---
 .../test_pkg_1/test_pkg_1_1/test_clazz_child_2.py  |  14 +---
 .../test_pkg_1_1/test_pkg_1_1_1/__init__.py        |  13 ---
 .../test_pkg_1_1_1/test_clazz_leaf_1.py            |  21 ++---
 .../test_pkg_1_1/test_pkg_1_1_2/__init__.py        |   0
 .../test_pkg_1_1_2/test_clazz_leaf_2.py            |   8 ++
 32 files changed, 469 insertions(+), 278 deletions(-)

diff --git a/README.md b/README.md
index 467edf2..ee2f961 100644
--- a/README.md
+++ b/README.md
@@ -76,12 +76,44 @@ services:
         function: myendpoint1func
 ```
 
-## Example repository structure
-
-[Example repository structure](
-https://github.com/Natural-Intelligence/rainbow/tree/master/tests/runners/airflow/rainbow]
-)
 
 # Installation
+1. Install this package
+```bash
+   pip install git+https://github.com/Natural-Intelligence/rainbow.git@rainbow_local_mode
+```
+2. Optional: set RAINBOW_HOME to a path of your choice (if not set, it defaults to ~/rainbow_home)
+```bash
+echo 'export RAINBOW_HOME=</path/to/some/folder>' >> ~/.bash_profile && source ~/.bash_profile
+```
+
+# Authoring pipelines
+
+This involves at minimum creating a single file called rainbow.yml as in the example above.
+
+If your pipeline requires custom Python code to implement tasks, the code should be organized
+[like this](https://github.com/Natural-Intelligence/rainbow/tree/master/tests/runners/airflow/rainbow).
+
+If your pipeline imports external packages which are not already part
+of the rainbow framework (i.e. you had to pip install them yourself), you also need to provide
+a requirements.txt in the root of your project.
+
+# Testing the pipeline locally
+
+When your pipeline code is ready, you can test it by running it locally on your machine.
+
+1. Deploy the pipeline:
+```bash
+cd </path/to/your/rainbow/code> 
+rainbow deploy
+```
+2. Make sure you have docker running
+3. Start the Server
+```bash
+rainbow start
+```
+4. Navigate to [http://localhost:8080/admin](http://localhost:8080/admin)
+5. You should see your pipeline: ![pipeline](https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/images/airflow.png)
 
-TODO: installation.
+### Running Tests (for contributors)
+When doing local development and running Rainbow unit-tests, make sure to set RAINBOW_STAND_ALONE_MODE=True
diff --git a/images/airflow.png b/images/airflow.png
new file mode 100644
index 0000000..229f8fa
Binary files /dev/null and b/images/airflow.png differ
diff --git a/rainbow/build/build_rainbows.py b/rainbow/build/build_rainbows.py
index b7ea6eb..66d27cb 100644
--- a/rainbow/build/build_rainbows.py
+++ b/rainbow/build/build_rainbows.py
@@ -74,33 +74,42 @@ def __build_image(base_path, builder_config, builder):
 
 
 def __get_task_build_class(task_type):
-    return task_build_classes[task_type] if task_type in task_build_classes else None
+    return task_build_types.get(task_type, None)
 
 
 def __get_service_build_class(service_type):
-    return service_build_classes[service_type] if service_type in service_build_classes else None
+    return service_build_types.get(service_type, None)
 
 
 print(f'Loading image builder implementations..')
 
 # TODO: add configuration for user image builders package
-image_builders_package = 'rainbow/build/image'
-user_image_builders_package = 'TODO: user_image_builders_package'
+image_builders_package = 'rainbow.build.image'
+# user_image_builders_package = 'TODO: user_image_builders_package'
 
 task_build_classes = class_util.find_subclasses_in_packages(
-    [image_builders_package, user_image_builders_package],
+    [image_builders_package],
     ImageBuilder)
 
+
+def get_types_dict(task_build_classes):
+    # keys look like 'package.module.ClassName'; use the module name as the type
+    return {x.split(".")[-2]: c for x, c in task_build_classes.items()}
+
+
+task_build_types = get_types_dict(task_build_classes)
+
 print(f'Finished loading image builder implementations: {task_build_classes}')
 
 print(f'Loading service image builder implementations..')
 
 # TODO: add configuration for user service image builders package
-service_builders_package = 'rainbow/build/service'
-user_service_builders_package = 'TODO: user_service_builders_package'
+service_builders_package = 'rainbow.build.service'
+# user_service_builders_package = 'TODO: user_service_builders_package'
 
 service_build_classes = class_util.find_subclasses_in_packages(
-    [service_builders_package, user_service_builders_package],
+    [service_builders_package],
     ServiceImageBuilderMixin)
 
+service_build_types = get_types_dict(service_build_classes)
 print(f'Finished loading service image builder implementations: {service_build_classes}')
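Note on `get_types_dict` above: `find_subclasses_in_packages` now returns keys of the form `package.module.ClassName`, and the helper keeps only the module segment so that a type name can be looked up directly. A minimal sketch of that mapping follows; `PythonImageBuilder` and the dotted path used here are illustrative assumptions, not names confirmed by this diff.

```python
def get_types_dict(build_classes):
    # Keys look like 'pkg.subpkg.module.ClassName'; index -2 is the module name.
    return {full_name.split(".")[-2]: cls for full_name, cls in build_classes.items()}


class PythonImageBuilder:  # hypothetical stand-in for a real builder class
    pass


# Hypothetical registry entry, shaped like the output of find_subclasses_in_packages.
classes = {"rainbow.build.image.python.python.PythonImageBuilder": PythonImageBuilder}
types = get_types_dict(classes)
print(types.get("python"))
print(types.get("spark"))
```

One consequence of keying by module name: two builder classes defined in the same module would map to the same key, and only one of them would survive in the dictionary.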
diff --git a/rainbow-cli b/rainbow/core/environment.py
old mode 100755
new mode 100644
similarity index 50%
copy from rainbow-cli
copy to rainbow/core/environment.py
index 4f16b4e..27f4d41
--- a/rainbow-cli
+++ b/rainbow/core/environment.py
@@ -1,5 +1,3 @@
-#!/usr/bin/env python3
-
 #
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
@@ -17,24 +15,24 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-import os
-
-import click
-
-from rainbow.build import build_rainbows
 
+import os
 
-@click.group()
-def cli():
-    pass
+DEFAULT_DAGS_ZIP_NAME = 'rainbows.zip'
+DEFAULT_RAINBOW_HOME = os.path.expanduser('~/rainbow_home')
+DEFAULT_RAINBOWS_SUBDIR = "rainbows"
+RAINBOW_HOME_PARAM_NAME = "RAINBOW_HOME"
 
 
-@cli.command()
-@click.option('--path', default=os.getcwd(), help='Build within this path.')
-def build(path):
-    click.echo(f'Building rainbows in {path}')
-    build_rainbows.build_rainbows(path)
+def get_rainbow_home():
+    if not os.environ.get(RAINBOW_HOME_PARAM_NAME):
+        print("no environment parameter called RAINBOW_HOME detected")
+        print(f"registering {DEFAULT_RAINBOW_HOME} as the RAINBOW_HOME directory")
+        os.environ[RAINBOW_HOME_PARAM_NAME] = DEFAULT_RAINBOW_HOME
+    return os.environ.get(RAINBOW_HOME_PARAM_NAME, DEFAULT_RAINBOW_HOME)
 
 
-if __name__ == '__main__':
-    cli()
+def get_dags_dir():
+    # if we are inside airflow, we will take it from the configured dags folder
+    base_dir = os.environ.get("AIRFLOW__CORE__DAGS_FOLDER", get_rainbow_home())
+    return os.path.join(base_dir, DEFAULT_RAINBOWS_SUBDIR)
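Note on `environment.py` above: the DAGs directory is resolved in two steps, first `AIRFLOW__CORE__DAGS_FOLDER` (when running inside Airflow), then `RAINBOW_HOME` or its default. The sketch below mirrors that precedence on an explicit mapping instead of `os.environ`, so it can be run without touching the real environment; passing `env` as a parameter is an assumption made for illustration only.

```python
import os

DEFAULT_RAINBOW_HOME = os.path.expanduser("~/rainbow_home")


def get_rainbow_home(env):
    # An unset or empty RAINBOW_HOME falls back to the default, as in the module above.
    return env.get("RAINBOW_HOME") or DEFAULT_RAINBOW_HOME


def get_dags_dir(env):
    # Inside Airflow the configured DAGs folder wins over RAINBOW_HOME.
    base_dir = env.get("AIRFLOW__CORE__DAGS_FOLDER", get_rainbow_home(env))
    return os.path.join(base_dir, "rainbows")


print(get_dags_dir({}))
print(get_dags_dir({"RAINBOW_HOME": "/tmp/rh"}))
```

Unlike this sketch, the real `get_rainbow_home` also writes the default back into `os.environ`, which is what lets `docker-compose.yml` expand `${RAINBOW_HOME}` later.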
diff --git a/rainbow/core/util/class_util.py b/rainbow/core/util/class_util.py
index e083477..129c61a 100644
--- a/rainbow/core/util/class_util.py
+++ b/rainbow/core/util/class_util.py
@@ -17,9 +17,7 @@
 # under the License.
 
 import importlib.util
-import inspect
-import os
-import sys
+import pkgutil
 
 
 def find_subclasses_in_packages(packages, parent_class):
@@ -27,28 +25,42 @@ def find_subclasses_in_packages(packages, parent_class):
     Finds all subclasses of given parent class within given packages
     :return: map of module ref -> class
     """
-    classes = {}
-
-    for py_path in [a for a in sys.path]:
-        for root, directories, files in os.walk(py_path):
-            if any(package in root for package in packages):
-                for file in files:
-                    file_path = os.path.join(root, file)
-                    if file.endswith('.py') and '__pycache__' not in file_path:
-                        spec = importlib.util.spec_from_file_location(file[:-3], file_path)
-                        mod = importlib.util.module_from_spec(spec)
-                        spec.loader.exec_module(mod)
-                        for name, obj in inspect.getmembers(mod):
-                            if inspect.isclass(obj) and not obj.__name__.endswith('Mixin'):
-                                module_name = mod.__name__
-                                class_name = obj.__name__
-                                parent_module = root[len(py_path) + 1:].replace('/', '.')
-                                module = parent_module.replace('airflow.dags.', '') + \
-                                         '.' + module_name
-                                clazz = __get_class(module, class_name)
-                                if issubclass(clazz, parent_class):
-                                    classes.update({module_name: clazz})
-    return classes
+    module_content = {}
+    for p in packages:
+        module_content.update(import_module(p))
+
+    subclasses = set()
+    work = [parent_class]
+    while work:
+        parent = work.pop()
+        for child in parent.__subclasses__():
+            if child not in subclasses:
+                work.append(child)
+                # verify that the found class is in the relevant module
+                for p in packages:
+                    if p in child.__module__:
+                        subclasses.add(child)
+                        break
+
+    result = {sc.__module__ + "." + sc.__name__: sc for sc in subclasses}
+    return result
+
+
+def import_module(package, recursive=True):
+    """ Import all submodules of a module, recursively, including subpackages
+    :param package: package (name or actual module)
+    :type package: str | module
+    :rtype: dict[str, types.ModuleType]
+    """
+    if isinstance(package, str):
+        package = importlib.import_module(package)
+    results = {}
+    for loader, name, is_pkg in pkgutil.walk_packages(package.__path__):
+        full_name = package.__name__ + '.' + name
+        results[full_name] = importlib.import_module(full_name)
+        if recursive and is_pkg:
+            results.update(import_module(full_name))
+    return results
 
 
 def __get_class(the_module, the_class):
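Note on the rewritten `find_subclasses_in_packages` above: instead of walking the file system, it walks the class hierarchy through `__subclasses__()` and keeps only classes whose module name contains one of the requested packages. A stripped-down sketch of that traversal is shown below; the `Base`, `Mid`, and `Leaf` classes are made up for the example.

```python
def find_subclasses(parent_class, packages):
    """Collect all transitive subclasses whose module matches one of `packages`."""
    found = set()
    work = [parent_class]
    while work:
        parent = work.pop()
        for child in parent.__subclasses__():
            if child not in found:
                # keep walking even when the child itself is filtered out
                work.append(child)
                if any(p in child.__module__ for p in packages):
                    found.add(child)
    return {f"{c.__module__}.{c.__name__}": c for c in found}


class Base:
    pass


class Mid(Base):
    pass


class Leaf(Mid):
    pass


here = [Base.__module__]  # the module this sketch runs in
result = find_subclasses(Base, here)
print(sorted(c.__name__ for c in result.values()))
```

`__subclasses__()` only reports classes whose defining modules have already been imported, which is presumably why the real function calls `import_module` on every package first.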
diff --git a/rainbow/core/util/files_util.py b/rainbow/core/util/files_util.py
index b1d1daf..4a03020 100644
--- a/rainbow/core/util/files_util.py
+++ b/rainbow/core/util/files_util.py
@@ -24,7 +24,7 @@ def find_config_files(path):
     print(path)
     for r, d, f in os.walk(path):
         for file in f:
-            print(os.path.basename(file))
             if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
+                print(os.path.join(r, file))
                 files.append(os.path.join(r, file))
     return files
diff --git a/rainbow/core/util/files_util.py b/rainbow/runners/__init__.py
similarity index 71%
copy from rainbow/core/util/files_util.py
copy to rainbow/runners/__init__.py
index b1d1daf..217e5db 100644
--- a/rainbow/core/util/files_util.py
+++ b/rainbow/runners/__init__.py
@@ -15,16 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
-import os
-
-
-def find_config_files(path):
-    files = []
-    print(path)
-    for r, d, f in os.walk(path):
-        for file in f:
-            print(os.path.basename(file))
-            if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
-                files.append(os.path.join(r, file))
-    return files
diff --git a/rainbow/runners/airflow/config/__init__.py b/rainbow/runners/airflow/config/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/rainbow-cli b/rainbow/runners/airflow/config/standalone_variable_backend.py
old mode 100755
new mode 100644
similarity index 64%
copy from rainbow-cli
copy to rainbow/runners/airflow/config/standalone_variable_backend.py
index 4f16b4e..d7df06c
--- a/rainbow-cli
+++ b/rainbow/runners/airflow/config/standalone_variable_backend.py
@@ -1,5 +1,3 @@
-#!/usr/bin/env python3
-
 #
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
@@ -18,23 +16,20 @@
 # specific language governing permissions and limitations
 # under the License.
 import os
+from os import environ
 
-import click
-
-from rainbow.build import build_rainbows
-
+from airflow.models import Variable
 
-@click.group()
-def cli():
-    pass
+RAINBOW_STAND_ALONE_MODE_KEY = "RAINBOW_STAND_ALONE_MODE"
 
 
-@cli.command()
-@click.option('--path', default=os.getcwd(), help='Build within this path.')
-def build(path):
-    click.echo(f'Building rainbows in {path}')
-    build_rainbows.build_rainbows(path)
+def get_variable(key, default_val):
+    if rainbow_local_mode():
+        return os.environ.get(key, default_val)
+    else:
+        return Variable.get(key, default_var=default_val)
 
 
-if __name__ == '__main__':
-    cli()
+def rainbow_local_mode():
+    stand_alone = environ.get(RAINBOW_STAND_ALONE_MODE_KEY, "False")
+    return stand_alone.strip().lower() == "true"
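Note on `standalone_variable_backend.py` above: the stand-alone flag is parsed case-insensitively and only the literal value `true` enables local mode. The sketch below reproduces that logic on a plain dictionary; the `env` and `airflow_lookup` parameters are hypothetical stand-ins for `os.environ` and `Variable.get`, introduced only so the example runs without Airflow installed.

```python
def rainbow_local_mode(env):
    stand_alone = env.get("RAINBOW_STAND_ALONE_MODE", "False")
    return stand_alone.strip().lower() == "true"


def get_variable(key, default_val, env, airflow_lookup):
    # airflow_lookup is a hypothetical callable standing in for Variable.get
    if rainbow_local_mode(env):
        return env.get(key, default_val)
    return airflow_lookup(key, default_val)


local_env = {"RAINBOW_STAND_ALONE_MODE": " True ", "kubernetes_namespace": "dev"}
print(get_variable("kubernetes_namespace", "default", local_env, lambda k, d: d))
```

In local mode any value that is present comes back as a string, so a flag such as `in_kubernetes_cluster` set to `"False"` in the environment would be truthy unless the caller converts it.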
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index d5e3be1..730fd03 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -17,11 +17,13 @@
 # under the License.
 
 from datetime import datetime, timedelta
+from os import environ
 
 import yaml
 from airflow import DAG
 from airflow.models import Variable
 
+from rainbow.core import environment
 from rainbow.core.util import class_util
 from rainbow.core.util import files_util
 from rainbow.runners.airflow.model.task import Task
@@ -33,13 +35,13 @@ __DEPENDS_ON_PAST = 'depends_on_past'
 
 def register_dags(configs_path):
     """
-    Registers pipelines in rainbow yml files found in given path (recursively) as airflow DAGs.
+    TODO: doc for register_dags
     """
-
+    print(f'Registering DAG from path: {configs_path}')
     config_files = files_util.find_config_files(configs_path)
 
     dags = []
-
+    print(f'found {len(config_files)} config file(s) in path: {configs_path}')
     for config_file in config_files:
         print(f'Registering DAG for file: {config_file}')
 
@@ -83,28 +85,35 @@ def register_dags(configs_path):
                 job_end_task = JobEndTask(dag, pipeline_name, parent, pipeline, 'all_done')
                 job_end_task.apply_task_to_dag()
 
-                print(f'{pipeline_name}: {dag.tasks}')
+                print(f'registered DAG {dag.dag_id}: {dag.tasks}')
 
                 globals()[pipeline_name] = dag
-
                 dags.append(dag)
 
-            return dags
+    return dags
 
 
 print(f'Loading task implementations..')
 
 # TODO: add configuration for user tasks package
-task_package = 'rainbow/runners/airflow/tasks'
+impl_packages = 'rainbow.runners.airflow.tasks'
 user_task_package = 'TODO: user_tasks_package'
 
-task_classes = class_util.find_subclasses_in_packages([task_package, user_task_package], Task)
+task_classes = class_util.find_subclasses_in_packages([impl_packages], Task)
+
+
+def tasks_by_rainbow_name(task_classes):
+    return {full_name.replace(impl_packages, '').replace(clzz.__name__, '')[1:-1]: clzz
+            for (full_name, clzz) in task_classes.items()}
+
+
+tasks_by_rainbow_name = tasks_by_rainbow_name(task_classes)
 
-print(f'Finished loading task implementations: {task_classes}')
+print(f'Finished loading task implementations: {tasks_by_rainbow_name}')
 
 
 def get_task_class(task_type):
-    return task_classes[task_type]
+    return tasks_by_rainbow_name[task_type]
 
 
-register_dags(Variable.get('rainbows_dir'))
+register_dags(environment.get_dags_dir())
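Note on `tasks_by_rainbow_name` above: the task type is derived from the fully qualified class name by stripping the implementation package prefix and the class name, then trimming the leftover dots. A small sketch of that string manipulation follows; the two dotted names are assumed examples of what the registry might contain.

```python
IMPL_PACKAGE = "rainbow.runners.airflow.tasks"


def task_type_name(full_name, class_name):
    # '.python.' -> 'python'; nested modules keep their inner dots
    return full_name.replace(IMPL_PACKAGE, "").replace(class_name, "")[1:-1]


print(task_type_name("rainbow.runners.airflow.tasks.python.PythonTask", "PythonTask"))
print(task_type_name(
    "rainbow.runners.airflow.tasks.defaults.job_end.JobEndTask", "JobEndTask"))
```

Tasks in nested modules therefore end up with dotted type names such as `defaults.job_end`, while top-level task modules map to their bare module name.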
diff --git a/rainbow/runners/airflow/operators/job_status_operator.py b/rainbow/runners/airflow/operators/job_status_operator.py
index dc318e5..ae9382a 100644
--- a/rainbow/runners/airflow/operators/job_status_operator.py
+++ b/rainbow/runners/airflow/operators/job_status_operator.py
@@ -38,7 +38,7 @@ class JobStatusOperator(BaseOperator):
             *args, **kwargs):
         super().__init__(*args, **kwargs)
         self.backends = backends
-        self.cloudwatch = CloudWatchHook()
+        self.cloudwatch = None
 
     def execute(self, context):
         for backend in self.backends:
@@ -52,12 +52,17 @@ class JobStatusOperator(BaseOperator):
         raise NotImplementedError
 
     def send_metric_to_cloudwatch(self, metric):
-        self.cloudwatch.put_metric_data(metric)
+        self.get_cloudwatch().put_metric_data(metric)
 
     report_functions = {
         'cloudwatch': send_metric_to_cloudwatch
     }
 
+    def get_cloudwatch(self):
+        if not self.cloudwatch:
+            self.cloudwatch = CloudWatchHook()
+        return self.cloudwatch
+
 
 class JobStartOperator(JobStatusOperator):
     ui_color = '#c5e5e8'
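Note on the `JobStatusOperator` change above: the CloudWatch hook is no longer created in the constructor but on first use, so building the DAG does not need a hook at all. A minimal sketch of the same lazy-initialization pattern is given below; `FakeHook` and `StatusReporter` are hypothetical classes used in place of `CloudWatchHook` and the operator.

```python
class FakeHook:
    """Hypothetical stand-in for CloudWatchHook that counts instantiations."""
    instances = 0

    def __init__(self):
        FakeHook.instances += 1
        self.sent = []

    def put_metric_data(self, metric):
        self.sent.append(metric)


class StatusReporter:
    def __init__(self):
        self.hook = None  # nothing is created at construction time

    def get_hook(self):
        if not self.hook:
            self.hook = FakeHook()
        return self.hook

    def send(self, metric):
        self.get_hook().put_metric_data(metric)


reporter = StatusReporter()
created_before = FakeHook.instances
reporter.send("job_start")
reporter.send("job_end")
print(created_before, FakeHook.instances)
```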
diff --git a/rainbow/runners/airflow/tasks/defaults/job_end.py~HEAD b/rainbow/runners/airflow/tasks/defaults/job_end.py~HEAD
deleted file mode 100644
index e177ccc..0000000
--- a/rainbow/runners/airflow/tasks/defaults/job_end.py~HEAD
+++ /dev/null
@@ -1,44 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from rainbow.runners.airflow.operators.job_status_operator import JobEndOperator
-from rainbow.runners.airflow.tasks.defaults.default_task import DefaultTask
-
-
-class JobEndTask(DefaultTask):
-    """
-      Job end task. Reports job end metrics.
-    """
-
-    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
-        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
-
-    def apply_task_to_dag(self):
-        job_end_task = JobEndOperator(
-            task_id='end',
-            namespace=self.metrics_namespace,
-            application_name=self.pipeline_name,
-            backends=self.metrics_backends,
-            dag=self.dag,
-            trigger_rule=self.trigger_rule
-        )
-
-        if self.parent:
-            self.parent.set_downstream(job_end_task)
-
-        return job_end_task
diff --git a/rainbow/runners/airflow/tasks/defaults/job_start.py~HEAD b/rainbow/runners/airflow/tasks/defaults/job_start.py~HEAD
deleted file mode 100644
index e196919..0000000
--- a/rainbow/runners/airflow/tasks/defaults/job_start.py~HEAD
+++ /dev/null
@@ -1,43 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-from rainbow.runners.airflow.operators.job_status_operator import JobStartOperator
-from rainbow.runners.airflow.tasks.defaults.default_task import DefaultTask
-
-
-class JobStartTask(DefaultTask):
-    """
-    Job start task. Reports job start metrics.
-    """
-
-    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
-        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
-
-    def apply_task_to_dag(self):
-        job_start_task = JobStartOperator(
-            task_id='start',
-            namespace=self.metrics_namespace,
-            application_name=self.pipeline_name,
-            backends=self.metrics_backends,
-            dag=self.dag,
-            trigger_rule=self.trigger_rule
-        )
-
-        if self.parent:
-            self.parent.set_downstream(job_start_task)
-
-        return job_start_task
diff --git a/rainbow/runners/airflow/tasks/python.py b/rainbow/runners/airflow/tasks/python.py
index 8bd11cf..d5d4e00 100644
--- a/rainbow/runners/airflow/tasks/python.py
+++ b/rainbow/runners/airflow/tasks/python.py
@@ -20,6 +20,7 @@ import json
 from airflow.models import Variable
 from airflow.operators.dummy_operator import DummyOperator
 
+from rainbow.runners.airflow.config.standalone_variable_backend import get_variable
 from rainbow.runners.airflow.model import task
 from rainbow.runners.airflow.operators.kubernetes_pod_operator_with_input_output import \
     KubernetesPodOperatorWithInputAndOutput, \
@@ -143,10 +144,10 @@ class PythonTask(task.Task):
 
     def __kubernetes_kwargs(self):
         kubernetes_kwargs = {
-            'namespace': Variable.get('kubernetes_namespace', default_var='default'),
+            'namespace': get_variable('kubernetes_namespace', default_val='default'),
             'name': self.task_name.replace('_', '-'),
-            'in_cluster': Variable.get('in_kubernetes_cluster', default_var=False),
-            'image_pull_policy': Variable.get('image_pull_policy', default_var='IfNotPresent'),
+            'in_cluster': get_variable('in_kubernetes_cluster', default_val=False),
+            'image_pull_policy': get_variable('image_pull_policy', default_val='IfNotPresent'),
             'get_logs': True,
             'env_vars': self.env_vars,
             'do_xcom_push': True,
@@ -162,9 +163,9 @@ class PythonTask(task.Task):
         env_vars = {}
         if 'env_vars' in self.config:
             env_vars = self.config['env_vars']
-        airflow_configuration_variable = Variable.get(
+        airflow_configuration_variable = get_variable(
             f'''{self.pipeline_name}_dag_configuration''',
-            default_var=None)
+            default_val=None)
         if airflow_configuration_variable:
             airflow_configs = json.loads(airflow_configuration_variable)
             environment_variables_key = f'''{self.pipeline_name}_environment_variables'''
diff --git a/rainbow/runners/airflow/tasks/spark.py b/rainbow/runners/airflow/tasks/spark.py
index 9a46dd4..68cfac0 100644
--- a/rainbow/runners/airflow/tasks/spark.py
+++ b/rainbow/runners/airflow/tasks/spark.py
@@ -18,7 +18,6 @@
 
 from rainbow.runners.airflow.model import task
 
-
 class SparkTask(task.Task):
     """
     Executes a Spark application.
diff --git a/requirements-airflow.txt b/requirements-airflow.txt
new file mode 100644
index 0000000..5191d2a
--- /dev/null
+++ b/requirements-airflow.txt
@@ -0,0 +1,5 @@
+click==7.1.1
+pyyaml
+boto3==1.12.10
+botocore==1.15.21
+kubernetes
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
index d7eec03..3fef3a5 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,7 +4,7 @@ docker-pycreds==0.4.0
 click==7.1.1
 Flask==1.1.1
 pyyaml
-statsd
-botocore
-boto3
+boto3==1.12.10
+botocore==1.15.21
 kubernetes
+
diff --git a/scripts/docker-compose.yml b/scripts/docker-compose.yml
new file mode 100644
index 0000000..b6a2dc3
--- /dev/null
+++ b/scripts/docker-compose.yml
@@ -0,0 +1,40 @@
+    version: '3.7'
+    services:
+        postgres:
+            image: postgres:9.6
+            environment:
+                - POSTGRES_USER=airflow
+                - POSTGRES_PASSWORD=airflow
+                - POSTGRES_DB=
+
+            ports:
+                - "5432:5432"
+            logging:
+                options:
+                    max-size: 10m
+                    max-file: "3"
+
+        webserver:
+            image: puckel/docker-airflow:1.10.9
+            restart: always
+            depends_on:
+                - postgres
+            environment:
+                - LOAD_EX=n
+                - EXECUTOR=Local
+                - AIRFLOW__CORE__DAGS_FOLDER=/usr/local/airflow/dags
+                - AIRFLOW__WEBSERVER__WORKERS=1
+            logging:
+                options:
+                    max-size: 10m
+                    max-file: "3"
+            volumes:
+                - ${RAINBOW_HOME}:/usr/local/airflow/dags
+            ports:
+                - "8080:8080"
+            command: webserver
+            healthcheck:
+                test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
+                interval: 30s
+                timeout: 30s
+                retries: 3
diff --git a/scripts/package.sh b/scripts/package.sh
new file mode 100755
index 0000000..f4083e4
--- /dev/null
+++ b/scripts/package.sh
@@ -0,0 +1,69 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+echo $1
+target_path="$1"
+
+echo "running from " $(pwd)
+echo "target path for rainbow zip file is " $target_path
+
+echo "cleaning up the temp dirs $TMPDIR/rainbow_build"
+rm -rf $TMPDIR/rainbow_build-*/
+
+tmp_dir=$(mktemp -d -t rainbow_build-)
+echo "creating temp directory $tmp_dir"
+
+docker_build_dir=$tmp_dir/docker_build
+mkdir -p $docker_build_dir
+echo "docker build directory :"$docker_build_dir
+
+mkdir $docker_build_dir/"zip_content"
+mkdir $docker_build_dir/"dags"
+
+#copy the content of the user project into the build folder
+rsync -a --exclude 'venv' $(pwd)/ $docker_build_dir/zip_content/
+
+# perform installation of external packages (framework-requirements and user-requirements)
+# this is done inside a docker to 1) avoid requiring the user to install stuff, and 2) to create a platform-compatible
+# package (install the native libraries in a flavour suitable for the docker in which airflow runs, and not user machine)
+docker stop rainbow_build
+docker rm rainbow_build
+docker run --name rainbow_build -v /private/"$docker_build_dir":/home/rainbow/tmp --entrypoint="" -u 0 \
+       puckel/docker-airflow:1.10.9 /bin/bash -c "apt-get update && apt-get install -y wget && apt-get install -y git &&
+       cd /home/rainbow/tmp/zip_content &&
+       wget https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/rainbow/runners/airflow/dag/rainbow_dags.py &&
+       wget https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/requirements-airflow.txt &&
+       wget https://raw.githubusercontent.com/Natural-Intelligence/rainbow/rainbow_local_mode/scripts/docker-compose.yml &&
+       pip install --no-deps --target=\"/home/rainbow/tmp/zip_content\" git+https://github.com/Natural-Intelligence/rainbow.git@rainbow_local_mode &&
+       pip install --target=\"/home/rainbow/tmp/zip_content\" -r /home/rainbow/tmp/zip_content/requirements-airflow.txt &&
+       pip install --target=\"/home/rainbow/tmp/zip_content\" -r /home/rainbow/tmp/zip_content/requirements.txt"
+
+docker stop rainbow_build
+docker rm rainbow_build
+
+# zip the content per https://airflow.apache.org/docs/stable/concepts.html#packaged-dags
+cd $docker_build_dir/zip_content
+mv docker-compose.yml $target_path
+rm __init__.py
+
+zip -r ../dags/rainbows.zip .
+cp ../dags/rainbows.zip $target_path
+
+
+
diff --git a/scripts/rainbow b/scripts/rainbow
new file mode 100755
index 0000000..1d5f65e
--- /dev/null
+++ b/scripts/rainbow
@@ -0,0 +1,87 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import os
+import shutil
+import site
+import sys
+
+import click
+from rainbow.build import build_rainbows
+import subprocess
+from rainbow.core import environment
+from rainbow.core.util import files_util
+
+
+@click.group()
+def cli():
+    pass
+
+
+def docker_is_running():
+    try:
+        return not subprocess.check_output("docker info >/dev/null 2>&1", shell=True)
+    except subprocess.CalledProcessError as e:
+        msg = "Docker is not running. Please start docker service on your machine\n"
+        sys.stderr.write(f"ERROR: {msg}")
+        raise RuntimeError(msg)
+
+
+@cli.command("build", short_help="builds dockers from your business logic")
+@click.option('--path', default=os.getcwd(), help='Build within this path.')
+def build(path):
+    click.echo(f'Building rainbows in {path}')
+    if docker_is_running():
+        build_rainbows.build_rainbows(path)
+
+
+def deploy_rainbow_core_internal():
+    click.echo("WARN: refreshing rainbow core package")
+    rainbow_home = environment.get_rainbow_home()
+    subprocess.call([f'package.sh {rainbow_home}'], shell=True)
+
+
+@cli.command("deploy", short_help="deploys your rainbow.yaml files to $RAINBOW_HOME folder")
+@click.option('--path', default=os.getcwd(), help="folder containing rainbow.yaml files")
+def deploy_rainbows(path):
+    click.echo("deploying rainbow yaml files")
+    rainbow_home = environment.get_rainbow_home()
+    os.makedirs(rainbow_home, exist_ok=True)
+    os.makedirs(environment.get_dags_dir(), exist_ok=True)
+    deploy_rainbow_core_internal()
+    config_files = files_util.find_config_files(path)
+    for config_file in config_files:
+        click.echo(f"deploying rainbow file: {config_file}")
+        yml_name = os.path.basename(config_file)
+        target_yml_name = os.path.join(environment.get_dags_dir(), yml_name)
+        shutil.copyfile(config_file, target_yml_name)
+
+
+@cli.command("start", short_help="starts a local airflow in docker compose. should be run after deploy. " +
+                                 "Make sure docker is running on your machine")
+def start():
+    if docker_is_running():
+        # initialize rainbow home by default
+        environment.get_rainbow_home()
+        result = subprocess.call([f'docker-compose -f "{environment.get_rainbow_home()}/docker-compose.yml" up'],
+                                 env=os.environ, shell=True)
+
+
+if __name__ == '__main__':
+    cli()
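Note on `docker_is_running` in the script above: it relies on `check_output` returning empty bytes once the command's output has been redirected, so `not b''` evaluates to `True`. A more explicit variant, offered here only as a sketch and not as the project's implementation, checks the exit code directly; `command_succeeds` is a hypothetical helper name.

```python
import subprocess
import sys


def command_succeeds(args):
    """Return True when the command exits with status 0, False otherwise."""
    try:
        result = subprocess.run(
            args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError:
        # the executable is missing or cannot be started
        return False
    return result.returncode == 0


def docker_is_running():
    return command_succeeds(["docker", "info"])


print(command_succeeds([sys.executable, "-c", "pass"]))
```

Passing the arguments as a list avoids `shell=True`, and a missing `docker` binary is reported as "not running" instead of raising.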
diff --git a/setup.py b/setup.py
new file mode 100644
index 0000000..c102ae3
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,48 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import setuptools
+from setuptools import setup
+
+with open("README.md", "r") as fh:
+    long_description = fh.read()
+
+with open('requirements.txt') as f:
+    requirements = f.read().splitlines()
+    print(requirements)
+
+setuptools.setup(
+    name="rainbow",
+    version="0.0.1",
+    author="Rainbow team",
+    description="A package for authoring and deploying machine learning workflows",
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    url="https://github.com/Natural-Intelligence/rainbow",
+    packages=setuptools.find_packages(),
+    classifiers=[
+        "Programming Language :: Python :: 3",
+        "License :: OSI Approved :: Apache Software License",
+        "Operating System :: OS Independent",
+    ],
+    python_requires='>=3.6',
+    install_requires=requirements,
+    scripts=['scripts/rainbow', 'scripts/package.sh']
+)
diff --git a/tests/runners/airflow/rainbow/requirements.txt b/tests/runners/airflow/rainbow/requirements.txt
new file mode 100644
index 0000000..037103e
--- /dev/null
+++ b/tests/runners/airflow/rainbow/requirements.txt
@@ -0,0 +1 @@
+pillow
\ No newline at end of file
diff --git a/tests/util/test_class_utils.py b/tests/util/test_class_utils.py
new file mode 100644
index 0000000..0deeff6
--- /dev/null
+++ b/tests/util/test_class_utils.py
@@ -0,0 +1,32 @@
+from unittest import TestCase
+
+from rainbow.core.util import class_util
+from tests.util.test_pkg_1.test_clazz_base import A, Z
+from tests.util.test_pkg_1.test_pkg_1_1.test_clazz_child_1 import B
+from tests.util.test_pkg_1.test_pkg_1_1.test_clazz_child_2 import C
+from tests.util.test_pkg_1.test_pkg_1_1.test_pkg_1_1_1.test_clazz_leaf_1 import F, D, E
+from tests.util.test_pkg_1.test_pkg_1_1.test_pkg_1_1_2.test_clazz_leaf_2 import G, H
+
+
+class Test(TestCase):
+    def test_find_full_hierarchy_from_root(self):
+        expected_set = set([B, C, D, E, H, Z])
+        self.hierarchy_check(A, expected_set)
+
+    def test_find_full_hierarchy_mid_tree_in_package(self):
+        expected_set = set([G])
+        self.hierarchy_check(F, expected_set)
+
+    def test_leaf_class(self):
+        expected_set = set()
+        self.hierarchy_check(G, expected_set)
+
+    def hierarchy_check(self, clazz, expected_set):
+        pkg_root = 'tests.util.test_pkg_1'
+        full_tree = class_util.find_subclasses_in_packages(
+            [pkg_root],
+            clazz)
+
+        res_set = set()
+        res_set.update(full_tree.values())
+        self.assertEqual(res_set, expected_set)
diff --git a/rainbow/core/util/files_util.py b/tests/util/test_pkg_1/__init__.py
similarity index 71%
copy from rainbow/core/util/files_util.py
copy to tests/util/test_pkg_1/__init__.py
index b1d1daf..217e5db 100644
--- a/rainbow/core/util/files_util.py
+++ b/tests/util/test_pkg_1/__init__.py
@@ -15,16 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
-import os
-
-
-def find_config_files(path):
-    files = []
-    print(path)
-    for r, d, f in os.walk(path):
-        for file in f:
-            print(os.path.basename(file))
-            if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
-                files.append(os.path.join(r, file))
-    return files
diff --git a/rainbow/core/util/files_util.py b/tests/util/test_pkg_1/test_clazz_base.py
similarity index 71%
copy from rainbow/core/util/files_util.py
copy to tests/util/test_pkg_1/test_clazz_base.py
index b1d1daf..3e7c523 100644
--- a/rainbow/core/util/files_util.py
+++ b/tests/util/test_pkg_1/test_clazz_base.py
@@ -16,15 +16,10 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import os
 
+class A:
+    pass
 
-def find_config_files(path):
-    files = []
-    print(path)
-    for r, d, f in os.walk(path):
-        for file in f:
-            print(os.path.basename(file))
-            if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
-                files.append(os.path.join(r, file))
-    return files
+
+class Z(A):
+    pass
diff --git a/rainbow/core/util/files_util.py b/tests/util/test_pkg_1/test_pkg_1_1/__init__.py
similarity index 71%
copy from rainbow/core/util/files_util.py
copy to tests/util/test_pkg_1/test_pkg_1_1/__init__.py
index b1d1daf..217e5db 100644
--- a/rainbow/core/util/files_util.py
+++ b/tests/util/test_pkg_1/test_pkg_1_1/__init__.py
@@ -15,16 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
-import os
-
-
-def find_config_files(path):
-    files = []
-    print(path)
-    for r, d, f in os.walk(path):
-        for file in f:
-            print(os.path.basename(file))
-            if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
-                files.append(os.path.join(r, file))
-    return files
diff --git a/rainbow/core/util/files_util.py b/tests/util/test_pkg_1/test_pkg_1_1/test_clazz_child_1.py
similarity index 71%
copy from rainbow/core/util/files_util.py
copy to tests/util/test_pkg_1/test_pkg_1_1/test_clazz_child_1.py
index b1d1daf..6fe2e9a 100644
--- a/rainbow/core/util/files_util.py
+++ b/tests/util/test_pkg_1/test_pkg_1_1/test_clazz_child_1.py
@@ -16,15 +16,13 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import os
 
+from tests.util.test_pkg_1.test_clazz_base import A
 
-def find_config_files(path):
-    files = []
-    print(path)
-    for r, d, f in os.walk(path):
-        for file in f:
-            print(os.path.basename(file))
-            if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
-                files.append(os.path.join(r, file))
-    return files
+
+class B(A):
+    pass
+
+
+class M:
+    pass
diff --git a/rainbow/core/util/files_util.py b/tests/util/test_pkg_1/test_pkg_1_1/test_clazz_child_2.py
similarity index 71%
copy from rainbow/core/util/files_util.py
copy to tests/util/test_pkg_1/test_pkg_1_1/test_clazz_child_2.py
index b1d1daf..e279c7a 100644
--- a/rainbow/core/util/files_util.py
+++ b/tests/util/test_pkg_1/test_pkg_1_1/test_clazz_child_2.py
@@ -16,15 +16,9 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import os
 
+from tests.util.test_pkg_1.test_clazz_base import A
 
-def find_config_files(path):
-    files = []
-    print(path)
-    for r, d, f in os.walk(path):
-        for file in f:
-            print(os.path.basename(file))
-            if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
-                files.append(os.path.join(r, file))
-    return files
+
+class C(A):
+    pass
diff --git a/rainbow/core/util/files_util.py b/tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_1/__init__.py
similarity index 71%
copy from rainbow/core/util/files_util.py
copy to tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_1/__init__.py
index b1d1daf..217e5db 100644
--- a/rainbow/core/util/files_util.py
+++ b/tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_1/__init__.py
@@ -15,16 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
-import os
-
-
-def find_config_files(path):
-    files = []
-    print(path)
-    for r, d, f in os.walk(path):
-        for file in f:
-            print(os.path.basename(file))
-            if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
-                files.append(os.path.join(r, file))
-    return files
diff --git a/rainbow-cli b/tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_1/test_clazz_leaf_1.py
old mode 100755
new mode 100644
similarity index 69%
rename from rainbow-cli
rename to tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_1/test_clazz_leaf_1.py
index 4f16b4e..2aba50e
--- a/rainbow-cli
+++ b/tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_1/test_clazz_leaf_1.py
@@ -1,5 +1,3 @@
-#!/usr/bin/env python3
-
 #
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
@@ -17,24 +15,19 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-import os
 
-import click
 
-from rainbow.build import build_rainbows
+from tests.util.test_pkg_1.test_pkg_1_1.test_clazz_child_1 import B
+from tests.util.test_pkg_1.test_pkg_1_1.test_clazz_child_2 import C
 
 
-@click.group()
-def cli():
+class D(B):
     pass
 
 
-@cli.command()
-@click.option('--path', default=os.getcwd(), help='Build within this path.')
-def build(path):
-    click.echo(f'Building rainbows in {path}')
-    build_rainbows.build_rainbows(path)
+class E(C):
+    pass
 
 
-if __name__ == '__main__':
-    cli()
+class F:
+    pass
diff --git a/tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_2/__init__.py b/tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_2/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_2/test_clazz_leaf_2.py b/tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_2/test_clazz_leaf_2.py
new file mode 100644
index 0000000..a328f33
--- /dev/null
+++ b/tests/util/test_pkg_1/test_pkg_1_1/test_pkg_1_1_2/test_clazz_leaf_2.py
@@ -0,0 +1,8 @@
+from tests.util.test_pkg_1.test_pkg_1_1.test_pkg_1_1_1.test_clazz_leaf_1 import F, E
+
+
+class G(F):
+    pass
+
+class H(E):
+    pass
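
Note on the fixtures above: the expected sets in tests/util/test_class_utils.py follow directly from the class hierarchy spread across the test_pkg_1 modules. As a cross-check, the sketch below redefines the same hierarchy in one place and walks it with `__subclasses__()`. This is a swapped-in technique for illustration only; class_util.find_subclasses_in_packages discovers classes by scanning files, and the helper name `all_subclasses` is hypothetical.

```python
# Same class names and parents as the test_pkg_1 fixtures, flattened into one module.
class A:
    pass


class Z(A):
    pass


class B(A):
    pass


class M:
    pass


class C(A):
    pass


class D(B):
    pass


class E(C):
    pass


class F:
    pass


class G(F):
    pass


class H(E):
    pass


def all_subclasses(cls):
    """Return every direct and indirect subclass of cls (hypothetical helper)."""
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found
```

Under this hierarchy, A yields B, C, D, E, H and Z; F yields only G; G yields nothing. M and F are unrelated to A, which is why they never appear in the first set.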


[incubator-liminal] 20/43: Performance improvement for class_util

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 2ef72ec83223f4a8e292d73fd7cd0782b17b968d
Author: aviemzur <av...@gmail.com>
AuthorDate: Wed Mar 18 09:52:34 2020 +0200

    Performance improvement for class_util
---
 rainbow/core/util/class_util.py | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)
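Note on the change: the diff below moves the package check out of the per-file loop, so each directory yielded by os.walk is tested once rather than once per file. A minimal sketch of that idea follows. It assumes a hypothetical helper `iter_python_files` (not part of class_util) and matches fragments against the path relative to the walk root; the real function matches against the full path and then imports each module it finds.

```python
import os
import tempfile


def iter_python_files(base_path, package_fragments):
    """Yield .py files under base_path whose directory matches a fragment."""
    for root, _dirs, files in os.walk(base_path):
        rel_root = os.path.relpath(root, base_path)
        # Test the directory once, instead of repeating the test for every file.
        if not any(fragment in rel_root for fragment in package_fragments):
            continue
        if '__pycache__' in rel_root:
            continue
        for file_name in sorted(files):
            if file_name.endswith('.py'):
                yield os.path.join(root, file_name)


def demo():
    """Build a small throwaway tree and list the files the walk selects."""
    with tempfile.TemporaryDirectory() as base:
        for rel in ('pkg_a/mod.py', 'pkg_a/notes.txt',
                    'pkg_a/__pycache__/mod.py', 'pkg_b/other.py'):
            full = os.path.join(base, rel)
            os.makedirs(os.path.dirname(full), exist_ok=True)
            open(full, 'w').close()
        found = iter_python_files(base, ['pkg_a'])
        return [os.path.relpath(path, base) for path in found]
```

One behavioural difference may be worth keeping in mind when reading the diff: a fragment that matched only the file name under the old per-file check no longer matches once the check is made on the directory.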

diff --git a/rainbow/core/util/class_util.py b/rainbow/core/util/class_util.py
index 31e1806..e083477 100644
--- a/rainbow/core/util/class_util.py
+++ b/rainbow/core/util/class_util.py
@@ -31,26 +31,23 @@ def find_subclasses_in_packages(packages, parent_class):
 
     for py_path in [a for a in sys.path]:
         for root, directories, files in os.walk(py_path):
-            for file in files:
-                file_path = os.path.join(root, file)
-                if any(p in file_path for p in packages) \
-                        and file.endswith('.py') \
-                        and '__pycache__' not in file_path:
-
-                    spec = importlib.util.spec_from_file_location(file[:-3], file_path)
-                    mod = importlib.util.module_from_spec(spec)
-                    spec.loader.exec_module(mod)
-                    for name, obj in inspect.getmembers(mod):
-                        if inspect.isclass(obj) and not obj.__name__.endswith('Mixin'):
-                            module_name = mod.__name__
-                            class_name = obj.__name__
-                            parent_module = root[len(py_path) + 1:].replace('/', '.')
-                            module = parent_module.replace('airflow.dags.', '') + \
-                                     '.' + module_name
-                            clazz = __get_class(module, class_name)
-                            if issubclass(clazz, parent_class):
-                                classes.update({module_name: clazz})
-
+            if any(package in root for package in packages):
+                for file in files:
+                    file_path = os.path.join(root, file)
+                    if file.endswith('.py') and '__pycache__' not in file_path:
+                        spec = importlib.util.spec_from_file_location(file[:-3], file_path)
+                        mod = importlib.util.module_from_spec(spec)
+                        spec.loader.exec_module(mod)
+                        for name, obj in inspect.getmembers(mod):
+                            if inspect.isclass(obj) and not obj.__name__.endswith('Mixin'):
+                                module_name = mod.__name__
+                                class_name = obj.__name__
+                                parent_module = root[len(py_path) + 1:].replace('/', '.')
+                                module = parent_module.replace('airflow.dags.', '') + \
+                                         '.' + module_name
+                                clazz = __get_class(module, class_name)
+                                if issubclass(clazz, parent_class):
+                                    classes.update({module_name: clazz})
     return classes
 
 


[incubator-liminal] 11/43: Refactor build

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 0817e973ec27f9a25a8cacc464b0a043d8d47428
Author: aviemzur <av...@gmail.com>
AuthorDate: Thu Mar 12 11:13:58 2020 +0200

    Refactor build
---
 rainbow/{runners/airflow => }/build/__init__.py    |  0
 rainbow/build/build_rainbow.py                     | 57 +++++++++++++++
 .../airflow => }/build/python/container-setup.sh   |  0
 .../build/python/container-teardown.sh             |  0
 rainbow/core/__init__.py                           |  1 -
 .../hello_world => rainbow/core/util}/__init__.py  |  0
 rainbow/core/{__init__.py => util/files_util.py}   | 12 +++-
 rainbow/docker/python/python_image.py              | 61 +++++++++-------
 rainbow/runners/airflow/build/build_rainbow.py     | 84 ----------------------
 rainbow/runners/airflow/dag/rainbow_dags.py        | 23 +++---
 rainbow/runners/airflow/model/task.py              |  6 --
 .../airflow/tasks/create_cloudformation_stack.py   |  3 -
 .../airflow/tasks/delete_cloudformation_stack.py   |  3 -
 rainbow/runners/airflow/tasks/job_end.py           |  3 -
 rainbow/runners/airflow/tasks/job_start.py         |  3 -
 rainbow/runners/airflow/tasks/python.py            |  9 ---
 rainbow/runners/airflow/tasks/spark.py             |  3 -
 rainbow/runners/airflow/tasks/sql.py               |  3 -
 requirements.txt                                   |  2 +
 .../{tasks/hello_world => build}/__init__.py       |  0
 .../hello_world => build/python}/__init__.py       |  0
 .../airflow/build/python/test_python_image.py      | 26 ++++---
 tests/runners/airflow/build/test_build_rainbow.py  |  8 +++
 tests/runners/airflow/dag/test_rainbow_dags.py     |  2 +-
 .../{tasks/hello_world => rainbow}/__init__.py     |  0
 .../{tasks => rainbow}/hello_world/__init__.py     |  0
 .../{tasks => rainbow}/hello_world/hello_world.py  |  0
 .../runners/airflow/{dag => }/rainbow/rainbow.yml  | 24 +++++--
 tests/runners/airflow/tasks/test_python.py         | 18 +----
 29 files changed, 161 insertions(+), 190 deletions(-)
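Note on the new build flow: rainbow/build/build_rainbow.py looks up a build class by task type in a plain dict and calls build() with the config file's directory, the task's source and the task's image tag. The sketch below mirrors that dispatch under stated assumptions: the config is an in-memory dict rather than parsed YAML, and `EchoImageBuilder` is a hypothetical stand-in for PythonImage that returns what it would build instead of calling Docker.

```python
import os


class EchoImageBuilder:
    """Hypothetical stand-in for PythonImage; no Docker involved."""

    def build(self, base_path, relative_source_path, tag):
        # The real class copies this directory and builds an image from it.
        return os.path.join(base_path, relative_source_path), tag


# Registry keyed by the task's `type` field, as in build_rainbow.py.
build_classes = {
    'python': EchoImageBuilder,
}


def build_from_config(config, config_file):
    """Dispatch every task in every pipeline to its registered builder."""
    base_path = os.path.dirname(config_file)
    results = []
    for pipeline in config['pipelines']:
        for task in pipeline['tasks']:
            builder = build_classes[task['type']]()
            results.append(builder.build(base_path=base_path,
                                         relative_source_path=task['source'],
                                         tag=task['image']))
    return results


example_config = {
    'pipelines': [
        {'tasks': [
            {'type': 'python', 'source': 'hello_world', 'image': 'my_image'},
        ]},
    ],
}
```

Because the source path is resolved relative to the config file, a rainbow.yml can refer to `hello_world` without knowing where the repository is checked out. An unknown task type raises KeyError in this sketch, as it would with the plain dict lookup in the diff.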

diff --git a/rainbow/runners/airflow/build/__init__.py b/rainbow/build/__init__.py
similarity index 100%
rename from rainbow/runners/airflow/build/__init__.py
rename to rainbow/build/__init__.py
diff --git a/rainbow/build/build_rainbow.py b/rainbow/build/build_rainbow.py
new file mode 100644
index 0000000..280b862
--- /dev/null
+++ b/rainbow/build/build_rainbow.py
@@ -0,0 +1,57 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import os
+
+import yaml
+
+from rainbow.core.util import files_util
+from rainbow.docker.python.python_image import PythonImage
+
+
+def build_rainbow(path):
+    """
+    TODO: doc for build_rainbow
+    """
+
+    config_files = files_util.find_config_files(path)
+
+    for config_file in config_files:
+        print(f'Building artifacts file: {config_file}')
+
+        with open(config_file) as stream:
+            # TODO: validate config
+            config = yaml.safe_load(stream)
+
+            for pipeline in config['pipelines']:
+                for task in pipeline['tasks']:
+                    task_type = task['type']
+                    task_instance = get_build_class(task_type)()
+                    task_instance.build(base_path=os.path.dirname(config_file),
+                                        relative_source_path=task['source'],
+                                        tag=task['image'])
+
+
+# TODO: task class registry
+build_classes = {
+    'python': PythonImage
+
+}
+
+
+def get_build_class(task_type):
+    return build_classes[task_type]
diff --git a/rainbow/runners/airflow/build/python/container-setup.sh b/rainbow/build/python/container-setup.sh
similarity index 100%
rename from rainbow/runners/airflow/build/python/container-setup.sh
rename to rainbow/build/python/container-setup.sh
diff --git a/rainbow/runners/airflow/build/python/container-teardown.sh b/rainbow/build/python/container-teardown.sh
similarity index 100%
rename from rainbow/runners/airflow/build/python/container-teardown.sh
rename to rainbow/build/python/container-teardown.sh
diff --git a/rainbow/core/__init__.py b/rainbow/core/__init__.py
index 2162b08..217e5db 100644
--- a/rainbow/core/__init__.py
+++ b/rainbow/core/__init__.py
@@ -15,4 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: core
diff --git a/tests/runners/airflow/tasks/hello_world/__init__.py b/rainbow/core/util/__init__.py
similarity index 100%
copy from tests/runners/airflow/tasks/hello_world/__init__.py
copy to rainbow/core/util/__init__.py
diff --git a/rainbow/core/__init__.py b/rainbow/core/util/files_util.py
similarity index 76%
copy from rainbow/core/__init__.py
copy to rainbow/core/util/files_util.py
index 2162b08..e5a8e09 100644
--- a/rainbow/core/__init__.py
+++ b/rainbow/core/util/files_util.py
@@ -15,4 +15,14 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: core
+
+import os
+
+
+def find_config_files(path):
+    files = []
+    for r, d, f in os.walk(path):
+        for file in f:
+            if file[file.rfind('.') + 1:] in ['yml', 'yaml']:
+                files.append(os.path.join(r, file))
+    return files
diff --git a/rainbow/docker/python/python_image.py b/rainbow/docker/python/python_image.py
index d66dfbe..2cd3594 100644
--- a/rainbow/docker/python/python_image.py
+++ b/rainbow/docker/python/python_image.py
@@ -18,44 +18,57 @@
 import os
 import shutil
 import tempfile
+
 import docker
 
 
-def build(source_path, tag, extra_files=None):
-    if extra_files is None:
-        extra_files = []
+class PythonImage:
+
+    def build(self, base_path, relative_source_path, tag, extra_files=None):
+        """
+        TODO: pydoc
 
-    print(f'Building image {tag}')
+        :param base_path:
+        :param relative_source_path:
+        :param tag:
+        :param extra_files:
+        :return:
+        """
 
-    temp_dir = tempfile.mkdtemp()
-    # Delete dir for shutil.copytree to work
-    os.rmdir(temp_dir)
+        if extra_files is None:
+            extra_files = []
 
-    __copy_source(source_path, temp_dir)
+        print(f'Building image {tag}')
 
-    requirements_file_path = os.path.join(temp_dir, 'requirements.txt')
-    if not os.path.exists(requirements_file_path):
-        with open(requirements_file_path, 'w'):
-            pass
+        temp_dir = tempfile.mkdtemp()
+        # Delete dir for shutil.copytree to work
+        os.rmdir(temp_dir)
 
-    dockerfile_path = os.path.join(os.path.dirname(__file__), 'Dockerfile')
+        self.__copy_source(os.path.join(base_path, relative_source_path), temp_dir)
 
-    for file in extra_files + [dockerfile_path]:
-        __copy_file(file, temp_dir)
+        requirements_file_path = os.path.join(temp_dir, 'requirements.txt')
+        if not os.path.exists(requirements_file_path):
+            with open(requirements_file_path, 'w'):
+                pass
 
-    print(temp_dir, os.listdir(temp_dir))
+        dockerfile_path = os.path.join(os.path.dirname(__file__), 'Dockerfile')
 
-    docker_client = docker.from_env()
-    docker_client.images.build(path=temp_dir, tag=tag)
+        for file in extra_files + [dockerfile_path]:
+            self.__copy_file(file, temp_dir)
 
-    docker_client.close()
+        print(temp_dir, os.listdir(temp_dir))
 
-    shutil.rmtree(temp_dir)
+        docker_client = docker.from_env()
+        docker_client.images.build(path=temp_dir, tag=tag)
 
+        docker_client.close()
 
-def __copy_source(source_path, destination_path):
-    shutil.copytree(source_path, destination_path)
+        shutil.rmtree(temp_dir)
 
+    @staticmethod
+    def __copy_source(source_path, destination_path):
+        shutil.copytree(source_path, destination_path)
 
-def __copy_file(source_file_path, destination_file_path):
-    shutil.copy2(source_file_path, destination_file_path)
+    @staticmethod
+    def __copy_file(source_file_path, destination_file_path):
+        shutil.copy2(source_file_path, destination_file_path)
diff --git a/rainbow/runners/airflow/build/build_rainbow.py b/rainbow/runners/airflow/build/build_rainbow.py
deleted file mode 100644
index 222ea5f..0000000
--- a/rainbow/runners/airflow/build/build_rainbow.py
+++ /dev/null
@@ -1,84 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import os
-import pprint
-from datetime import datetime
-
-import yaml
-from airflow import DAG
-
-from rainbow.runners.airflow.tasks.python import PythonTask
-
-
-def build_rainbow(path):
-    """
-    TODO: doc for build_rainbow
-    """
-    files = []
-    for r, d, f in os.walk(path):
-        for file in f:
-            if file[file.rfind('.') + 1:] in ['yml', 'yaml']:
-                files.append(os.path.join(r, file))
-
-    print(files)
-
-    dags = []
-
-    for config_file in files:
-        print(f'Building artifacts file: f{config_file}')
-
-        with open(config_file) as stream:
-            # TODO: validate config
-            config = yaml.safe_load(stream)
-            pp = pprint.PrettyPrinter(indent=4)
-            # pp.pprint(config)
-
-            for pipeline in config['pipelines']:
-                parent = None
-
-                default_args = {
-                    'owner': config['owner'],
-                    'start_date': datetime.combine(pipeline['start_date'], datetime.min.time())
-                }
-                # TODO: add all relevant airflow args
-                dag = DAG(
-                    dag_id='test_dag',
-                    default_args=default_args
-                )
-
-                for task in pipeline['tasks']:
-                    task_type = task['type']
-                    task_instance = get_task_class(task_type)(
-                        dag, pipeline['pipeline'], parent if parent else None, task, 'all_success'
-                    )
-                    parent = task_instance.build()
-
-
-# TODO: task class registry
-task_classes = {
-    'python': PythonTask
-}
-
-
-def get_task_class(task_type):
-    return task_classes[task_type]
-
-
-if __name__ == '__main__':
-    register_dags('')
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index c564737..639f0cc 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -16,38 +16,30 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import os
-import pprint
 from datetime import datetime
 
 import yaml
 from airflow import DAG
 
-from rainbow.runners.airflow.build import build_rainbow
+from rainbow.core.util import files_util
+from rainbow.runners.airflow.tasks.python import PythonTask
 
 
-def register_dags(path):
+def register_dags(configs_path):
     """
     TODO: doc for register_dags
     """
-    files = []
-    for r, d, f in os.walk(path):
-        for file in f:
-            if file[file.rfind('.') + 1:] in ['yml', 'yaml']:
-                files.append(os.path.join(r, file))
 
-    print(files)
+    config_files = files_util.find_config_files(configs_path)
 
     dags = []
 
-    for config_file in files:
+    for config_file in config_files:
         print(f'Registering DAG for file: f{config_file}')
 
         with open(config_file) as stream:
             # TODO: validate config
             config = yaml.safe_load(stream)
-            pp = pprint.PrettyPrinter(indent=4)
-            # pp.pprint(config)
 
             for pipeline in config['pipelines']:
                 parent = None
@@ -75,7 +67,10 @@ def register_dags(path):
     return dags
 
 
-task_classes = build_rainbow.task_classes
+# TODO: task class registry
+task_classes = {
+    'python': PythonTask
+}
 
 
 def get_task_class(task_type):
diff --git a/rainbow/runners/airflow/model/task.py b/rainbow/runners/airflow/model/task.py
index 25656ee..8163117 100644
--- a/rainbow/runners/airflow/model/task.py
+++ b/rainbow/runners/airflow/model/task.py
@@ -32,12 +32,6 @@ class Task:
         self.config = config
         self.trigger_rule = trigger_rule
 
-    def build(self):
-        """
-        Build task's artifacts.
-        """
-        raise NotImplementedError()
-
     def apply_task_to_dag(self):
         """
         Registers Airflow operator to parent task.
diff --git a/rainbow/runners/airflow/tasks/create_cloudformation_stack.py b/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
index c478dc7..ca8482a 100644
--- a/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
+++ b/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
@@ -27,8 +27,5 @@ class CreateCloudFormationStackTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def build(self):
-        pass
-
     def apply_task_to_dag(self):
         pass
diff --git a/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py b/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
index d172284..8ac4e8b 100644
--- a/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
+++ b/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
@@ -27,8 +27,5 @@ class DeleteCloudFormationStackTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def build(self):
-        pass
-
     def apply_task_to_dag(self):
         pass
diff --git a/rainbow/runners/airflow/tasks/job_end.py b/rainbow/runners/airflow/tasks/job_end.py
index a6c5ef2..53e1eef 100644
--- a/rainbow/runners/airflow/tasks/job_end.py
+++ b/rainbow/runners/airflow/tasks/job_end.py
@@ -27,8 +27,5 @@ class JobEndTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def build(self):
-        pass
-
     def apply_task_to_dag(self):
         pass
diff --git a/rainbow/runners/airflow/tasks/job_start.py b/rainbow/runners/airflow/tasks/job_start.py
index 7338363..5c82e1c 100644
--- a/rainbow/runners/airflow/tasks/job_start.py
+++ b/rainbow/runners/airflow/tasks/job_start.py
@@ -27,9 +27,6 @@ class JobStartTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def build(self):
-        pass
-
     def apply_task_to_dag(self):
         # TODO: job start task
         pass
diff --git a/rainbow/runners/airflow/tasks/python.py b/rainbow/runners/airflow/tasks/python.py
index eb00c0e..b2769c8 100644
--- a/rainbow/runners/airflow/tasks/python.py
+++ b/rainbow/runners/airflow/tasks/python.py
@@ -47,15 +47,6 @@ class PythonTask(task.Task):
         self.config_task_id = self.task_name + '_input'
         self.executors = self.__executors()
 
-    def build(self):
-        if 'source' in self.config:
-            script_dir = os.path.dirname(__file__)
-
-            python_image.build(self.config['source'], self.image, [
-                os.path.join(script_dir, '../build/python/container-setup.sh'),
-                os.path.join(script_dir, '../build/python/container-teardown.sh')
-            ])
-
     def apply_task_to_dag(self):
 
         config_task = None
diff --git a/rainbow/runners/airflow/tasks/spark.py b/rainbow/runners/airflow/tasks/spark.py
index 8846f97..5822e92 100644
--- a/rainbow/runners/airflow/tasks/spark.py
+++ b/rainbow/runners/airflow/tasks/spark.py
@@ -27,8 +27,5 @@ class SparkTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def build(self):
-        pass
-
     def apply_task_to_dag(self):
         pass
diff --git a/rainbow/runners/airflow/tasks/sql.py b/rainbow/runners/airflow/tasks/sql.py
index 23458a9..42c02ce 100644
--- a/rainbow/runners/airflow/tasks/sql.py
+++ b/rainbow/runners/airflow/tasks/sql.py
@@ -27,8 +27,5 @@ class SparkTask(task.Task):
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
-    def build(self):
-        pass
-
     def apply_task_to_dag(self):
         pass
diff --git a/requirements.txt b/requirements.txt
index e952e4c..6e05d98 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,3 +1,5 @@
+botocore
+PyYAML
 docker==4.2.0
 apache-airflow==1.10.9
 docker-pycreds==0.4.0
diff --git a/tests/runners/airflow/tasks/hello_world/__init__.py b/tests/runners/airflow/build/__init__.py
similarity index 100%
copy from tests/runners/airflow/tasks/hello_world/__init__.py
copy to tests/runners/airflow/build/__init__.py
diff --git a/tests/runners/airflow/tasks/hello_world/__init__.py b/tests/runners/airflow/build/python/__init__.py
similarity index 100%
copy from tests/runners/airflow/tasks/hello_world/__init__.py
copy to tests/runners/airflow/build/python/__init__.py
diff --git a/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py b/tests/runners/airflow/build/python/test_python_image.py
similarity index 58%
copy from rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
copy to tests/runners/airflow/build/python/test_python_image.py
index d172284..c290720 100644
--- a/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
+++ b/tests/runners/airflow/build/python/test_python_image.py
@@ -16,19 +16,23 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from rainbow.runners.airflow.model import task
+import docker
 
+from rainbow.docker.python import python_image
 
-class DeleteCloudFormationStackTask(task.Task):
-    """
-    # TODO: Deletes cloud_formation stack.
-    """
 
-    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
-        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
+def test_build(self):
+    config = self.__create_conf('my_task')
 
-    def build(self):
-        pass
+    image_name = config['image']
 
-    def apply_task_to_dag(self):
-        pass
+    python_image.PythonImage().build('tests/runners/airflow/rainbow', 'hello_world', image_name)
+
+    # TODO: elaborate test of image, validate input/output
+
+    docker_client = docker.from_env()
+    docker_client.images.get(image_name)
+    container_log = docker_client.containers.run(image_name, "python hello_world.py")
+    docker_client.close()
+
+    self.assertEqual("b'Hello world!\\n'", str(container_log))
diff --git a/tests/runners/airflow/build/test_build_rainbow.py b/tests/runners/airflow/build/test_build_rainbow.py
new file mode 100644
index 0000000..c8fec6e
--- /dev/null
+++ b/tests/runners/airflow/build/test_build_rainbow.py
@@ -0,0 +1,8 @@
+from unittest import TestCase
+
+from rainbow.build import build_rainbow
+
+
+class Test(TestCase):
+    def test_build_rainbow(self):
+        build_rainbow.build_rainbow('tests/runners/airflow/rainbow')
diff --git a/tests/runners/airflow/dag/test_rainbow_dags.py b/tests/runners/airflow/dag/test_rainbow_dags.py
index c66e3bc..2a65f31 100644
--- a/tests/runners/airflow/dag/test_rainbow_dags.py
+++ b/tests/runners/airflow/dag/test_rainbow_dags.py
@@ -6,7 +6,7 @@ import unittest
 
 class Test(TestCase):
     def test_register_dags(self):
-        dags = rainbow_dags.register_dags("tests/runners/airflow/dag/rainbow")
+        dags = rainbow_dags.register_dags('tests/runners/airflow/rainbow')
         self.assertEqual(len(dags), 1)
         # TODO: elaborate test
         pass
diff --git a/tests/runners/airflow/tasks/hello_world/__init__.py b/tests/runners/airflow/rainbow/__init__.py
similarity index 100%
copy from tests/runners/airflow/tasks/hello_world/__init__.py
copy to tests/runners/airflow/rainbow/__init__.py
diff --git a/tests/runners/airflow/tasks/hello_world/__init__.py b/tests/runners/airflow/rainbow/hello_world/__init__.py
similarity index 100%
rename from tests/runners/airflow/tasks/hello_world/__init__.py
rename to tests/runners/airflow/rainbow/hello_world/__init__.py
diff --git a/tests/runners/airflow/tasks/hello_world/hello_world.py b/tests/runners/airflow/rainbow/hello_world/hello_world.py
similarity index 100%
rename from tests/runners/airflow/tasks/hello_world/hello_world.py
rename to tests/runners/airflow/rainbow/hello_world/hello_world.py
diff --git a/tests/runners/airflow/dag/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
similarity index 59%
rename from tests/runners/airflow/dag/rainbow/rainbow.yml
rename to tests/runners/airflow/rainbow/rainbow.yml
index 07afd08..fd30028 100644
--- a/tests/runners/airflow/dag/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -1,4 +1,20 @@
-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
 ---
 name: MyPipeline
 owner: Bosco Albert Baracus
@@ -12,8 +28,8 @@ pipelines:
       - task: my_static_config_task
         type: python
         description: my 1st ds task
-        image: mytask1artifactid
-        source: mytask1folder
+        image: my_image
+        source: hello_world
         env_vars:
           env1: "a"
           env2: "b"
@@ -25,7 +41,7 @@ pipelines:
         type: python
         description: my 1st ds task
         image: mytask1artifactid
-        source: mytask1folder
+        source: hello_world
         env_vars:
           env1: "a"
           env2: "b"
diff --git a/tests/runners/airflow/tasks/test_python.py b/tests/runners/airflow/tasks/test_python.py
index 4bbbe9c..37a325a 100644
--- a/tests/runners/airflow/tasks/test_python.py
+++ b/tests/runners/airflow/tasks/test_python.py
@@ -46,29 +46,13 @@ class TestPythonTask(TestCase):
         self.assertIsInstance(dag_task0, ConfigurableKubernetesPodOperator)
         self.assertEqual(dag_task0.task_id, task_id)
 
-    def test_build(self):
-        config = self.__create_conf('my_task')
-
-        task0 = python.PythonTask(None, None, None, config, None)
-        task0.build()
-
-        # TODO: elaborate test of image, validate input/output
-        image_name = config['image']
-
-        docker_client = docker.from_env()
-        docker_client.images.get(image_name)
-        container_log = docker_client.containers.run(image_name, "python hello_world.py")
-        docker_client.close()
-
-        self.assertEqual("b'Hello world!\\n'", str(container_log))
-
     @staticmethod
     def __create_conf(task_id):
         return {
             'task': task_id,
             'cmd': 'foo bar',
             'image': 'my_image',
-            'source': 'tests/runners/airflow/tasks/hello_world',
+            'source': 'tests/runners/airflow/rainbow/hello_world',
             'input_type': 'my_input_type',
             'input_path': 'my_input',
             'output_path': '/my_output.json'


[incubator-liminal] 12/43: Elaborate build tests

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit eed1cd0b445f0878a9b078b7c206c54e47244bd6
Author: aviemzur <av...@gmail.com>
AuthorDate: Thu Mar 12 11:30:55 2020 +0200

    Elaborate build tests
---
 rainbow/build/build_rainbow.py                     |  4 +--
 rainbow/docker/python/python_image.py              |  2 --
 .../airflow/build/python/test_python_image.py      | 37 +++++++++++++++-------
 tests/runners/airflow/build/test_build_rainbow.py  | 21 +++++++++++-
 tests/runners/airflow/rainbow/rainbow.yml          |  4 +--
 tests/runners/airflow/tasks/test_python.py         |  2 +-
 6 files changed, 51 insertions(+), 19 deletions(-)

diff --git a/rainbow/build/build_rainbow.py b/rainbow/build/build_rainbow.py
index 280b862..7c03104 100644
--- a/rainbow/build/build_rainbow.py
+++ b/rainbow/build/build_rainbow.py
@@ -15,6 +15,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
 import os
 
 import yaml
@@ -46,10 +47,9 @@ def build_rainbow(path):
                                         tag=task['image'])
 
 
-# TODO: task class registry
+# TODO: build class registry
 build_classes = {
     'python': PythonImage
-
 }
 
 
diff --git a/rainbow/docker/python/python_image.py b/rainbow/docker/python/python_image.py
index 2cd3594..ae7bc23 100644
--- a/rainbow/docker/python/python_image.py
+++ b/rainbow/docker/python/python_image.py
@@ -56,8 +56,6 @@ class PythonImage:
         for file in extra_files + [dockerfile_path]:
             self.__copy_file(file, temp_dir)
 
-        print(temp_dir, os.listdir(temp_dir))
-
         docker_client = docker.from_env()
         docker_client.images.build(path=temp_dir, tag=tag)
 
diff --git a/tests/runners/airflow/build/python/test_python_image.py b/tests/runners/airflow/build/python/test_python_image.py
index c290720..a8c02b6 100644
--- a/tests/runners/airflow/build/python/test_python_image.py
+++ b/tests/runners/airflow/build/python/test_python_image.py
@@ -15,24 +15,39 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+from unittest import TestCase
 
 import docker
 
-from rainbow.docker.python import python_image
+from rainbow.docker.python.python_image import PythonImage
 
 
-def test_build(self):
-    config = self.__create_conf('my_task')
+class TestPythonImage(TestCase):
 
-    image_name = config['image']
+    def test_build(self):
+        config = self.__create_conf('my_task')
 
-    python_image.build('tests/runners/airflow/rainbow', 'hello_world', 'image_name')
+        image_name = config['image']
 
-    # TODO: elaborate test of image, validate input/output
+        PythonImage().build('tests/runners/airflow/rainbow', 'hello_world', 'image_name')
 
-    docker_client = docker.from_env()
-    docker_client.images.get(image_name)
-    container_log = docker_client.containers.run(image_name, "python hello_world.py")
-    docker_client.close()
+        # TODO: elaborate test of image, validate input/output
 
-    self.assertEqual("b'Hello world!\\n'", str(container_log))
+        docker_client = docker.from_env()
+        docker_client.images.get(image_name)
+        container_log = docker_client.containers.run(image_name, "python hello_world.py")
+        docker_client.close()
+
+        self.assertEqual("b'Hello world!\\n'", str(container_log))
+
+    @staticmethod
+    def __create_conf(task_id):
+        return {
+            'task': task_id,
+            'cmd': 'foo bar',
+            'image': 'rainbow_image',
+            'source': 'tests/runners/airflow/rainbow/hello_world',
+            'input_type': 'my_input_type',
+            'input_path': 'my_input',
+            'output_path': '/my_output.json'
+        }
diff --git a/tests/runners/airflow/build/test_build_rainbow.py b/tests/runners/airflow/build/test_build_rainbow.py
index c8fec6e..d1b28aa 100644
--- a/tests/runners/airflow/build/test_build_rainbow.py
+++ b/tests/runners/airflow/build/test_build_rainbow.py
@@ -1,8 +1,27 @@
+import unittest
 from unittest import TestCase
 
+import docker
 from rainbow.build import build_rainbow
 
 
-class Test(TestCase):
+class TestBuildRainbow(TestCase):
+
     def test_build_rainbow(self):
+        docker_client = docker.client.from_env()
+        image_names = ['rainbow_image', 'rainbow_image2']
+
+        for image_name in image_names:
+            if len(docker_client.images.list(image_name)) > 0:
+                docker_client.images.remove(image=image_name)
+
         build_rainbow.build_rainbow('tests/runners/airflow/rainbow')
+
+        for image_name in image_names:
+            docker_client.images.get(name=image_name)
+
+        docker_client.close()
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
index fd30028..1a834d7 100644
--- a/tests/runners/airflow/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -28,7 +28,7 @@ pipelines:
       - task: my_static_config_task
         type: python
         description: my 1st ds task
-        image: my_image
+        image: rainbow_image
         source: hello_world
         env_vars:
           env1: "a"
@@ -40,7 +40,7 @@ pipelines:
       - task: my_static_config_task2
         type: python
         description: my 1st ds task
-        image: mytask1artifactid
+        image: rainbow_image2
         source: hello_world
         env_vars:
           env1: "a"
diff --git a/tests/runners/airflow/tasks/test_python.py b/tests/runners/airflow/tasks/test_python.py
index 37a325a..8477c69 100644
--- a/tests/runners/airflow/tasks/test_python.py
+++ b/tests/runners/airflow/tasks/test_python.py
@@ -51,7 +51,7 @@ class TestPythonTask(TestCase):
         return {
             'task': task_id,
             'cmd': 'foo bar',
-            'image': 'my_image',
+            'image': 'rainbow_image',
             'source': 'tests/runners/airflow/rainbow/hello_world',
             'input_type': 'my_input_type',
             'input_path': 'my_input',
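
Note on the assertion in test_python_image.py in this commit: the test
compares str(container_log) against "b'Hello world!\\n'", which implies the
container output arrives as bytes and is checked through its repr. A minimal
sketch of that check next to a decode-based alternative follows. The bytes
value is hard-coded as a stand-in for real container output (an assumption, so
the snippet runs without a Docker daemon).

```python
# Stand-in for the value returned by docker_client.containers.run(...);
# hard-coded here so the snippet runs without Docker.
container_log = b'Hello world!\n'

# The comparison used in the test: str() of a bytes object yields its repr.
repr_style = str(container_log)

# An alternative that compares decoded text instead of the repr.
decoded = container_log.decode('utf-8')

print(repr_style == "b'Hello world!\\n'")
print(decoded == 'Hello world!\n')
```

Both forms check the same output. The decoded form may read more naturally
once the test is elaborated as the TODO suggests.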


[incubator-liminal] 15/43: Fix rainbow_dags python task

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 77139de6083d355b8bd80c07a3961b6d9ba8408d
Author: aviemzur <av...@gmail.com>
AuthorDate: Thu Mar 12 16:52:40 2020 +0200

    Fix rainbow_dags python task
---
 rainbow/build/build_rainbows.py                    |  2 +-
 rainbow/{docker => build}/python/Dockerfile        |  0
 rainbow/build/python/__init__.py                   |  0
 rainbow/build/python/container-setup.sh            |  2 +-
 rainbow/build/python/container-teardown.sh         |  2 +-
 rainbow/{docker => build}/python/python_image.py   | 18 ++++++++-----
 rainbow/core/util/files_util.py                    |  2 ++
 rainbow/docker/python/__init__.py                  | 17 ------------
 rainbow/runners/airflow/dag/rainbow_dags.py        | 31 +++++++++++++++-------
 rainbow/runners/airflow/tasks/python.py            |  9 +++----
 .../airflow/build/python/test_python_image.py      |  2 +-
 tests/runners/airflow/rainbow/rainbow.yml          |  6 ++---
 tests/runners/airflow/tasks/test_python.py         |  2 --
 13 files changed, 45 insertions(+), 48 deletions(-)

diff --git a/rainbow/build/build_rainbows.py b/rainbow/build/build_rainbows.py
index 1452bb8..2a9e6a3 100644
--- a/rainbow/build/build_rainbows.py
+++ b/rainbow/build/build_rainbows.py
@@ -21,7 +21,7 @@ import os
 import yaml
 
 from rainbow.core.util import files_util
-from rainbow.docker.python.python_image import PythonImage
+from rainbow.build.python.python_image import PythonImage
 
 
 def build_rainbows(path):
diff --git a/rainbow/docker/python/Dockerfile b/rainbow/build/python/Dockerfile
similarity index 100%
rename from rainbow/docker/python/Dockerfile
rename to rainbow/build/python/Dockerfile
diff --git a/rainbow/build/python/__init__.py b/rainbow/build/python/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/rainbow/build/python/container-setup.sh b/rainbow/build/python/container-setup.sh
index 6e8d242..4e20fc2 100755
--- a/rainbow/build/python/container-setup.sh
+++ b/rainbow/build/python/container-setup.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/sh
 
 echo """$RAINBOW_INPUT""" > rainbow_input.json
 
diff --git a/rainbow/build/python/container-teardown.sh b/rainbow/build/python/container-teardown.sh
index 1219407..ef213a8 100755
--- a/rainbow/build/python/container-teardown.sh
+++ b/rainbow/build/python/container-teardown.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/sh
 
 USER_CONFIG_OUTPUT_FILE=$1
 if [ "$USER_CONFIG_OUTPUT_FILE" != "" ]; then
diff --git a/rainbow/docker/python/python_image.py b/rainbow/build/python/python_image.py
similarity index 82%
rename from rainbow/docker/python/python_image.py
rename to rainbow/build/python/python_image.py
index ae7bc23..f0fb3a0 100644
--- a/rainbow/docker/python/python_image.py
+++ b/rainbow/build/python/python_image.py
@@ -15,6 +15,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
 import os
 import shutil
 import tempfile
@@ -24,7 +25,7 @@ import docker
 
 class PythonImage:
 
-    def build(self, base_path, relative_source_path, tag, extra_files=None):
+    def build(self, base_path, relative_source_path, tag):
         """
         TODO: pydoc
 
@@ -35,9 +36,6 @@ class PythonImage:
         :return:
         """
 
-        if extra_files is None:
-            extra_files = []
-
         print(f'Building image {tag}')
 
         temp_dir = tempfile.mkdtemp()
@@ -51,16 +49,24 @@ class PythonImage:
             with open(requirements_file_path, 'w'):
                 pass
 
-        dockerfile_path = os.path.join(os.path.dirname(__file__), 'Dockerfile')
+        docker_files = [
+            os.path.join(os.path.dirname(__file__), 'Dockerfile'),
+            os.path.join(os.path.dirname(__file__), 'container-setup.sh'),
+            os.path.join(os.path.dirname(__file__), 'container-teardown.sh')
+        ]
 
-        for file in extra_files + [dockerfile_path]:
+        for file in docker_files:
             self.__copy_file(file, temp_dir)
 
         docker_client = docker.from_env()
+
+        # TODO: log docker output
         docker_client.images.build(path=temp_dir, tag=tag)
 
         docker_client.close()
 
+        print(temp_dir, os.listdir(temp_dir))
+
         shutil.rmtree(temp_dir)
 
     @staticmethod
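
Note on the python_image.py change above: build() now always stages the
Dockerfile plus container-setup.sh and container-teardown.sh in a temporary
directory before handing that directory to Docker. The copy helpers are not
visible in this hunk, so the following is only a sketch of the staging step
under assumptions: stage_build_context is a hypothetical name, and the rule
"create an empty requirements.txt when none exists" is inferred from the
context lines rather than confirmed.

```python
import os
import shutil
import tempfile


def stage_build_context(source_dir, support_files):
    """Copy source_dir contents and support_files into a new temp dir."""
    temp_dir = tempfile.mkdtemp()

    # Copy the task's source files and folders into the staging directory.
    for name in os.listdir(source_dir):
        src = os.path.join(source_dir, name)
        dst = os.path.join(temp_dir, name)
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)

    # Assumed behavior: make sure a requirements.txt is present.
    requirements = os.path.join(temp_dir, 'requirements.txt')
    if not os.path.exists(requirements):
        with open(requirements, 'w'):
            pass

    # Copy the Dockerfile and helper scripts next to the source.
    for path in support_files:
        shutil.copy2(path, temp_dir)

    return temp_dir
```

The caller is expected to remove the returned directory (for example with
shutil.rmtree) once the image build has finished, as build() does.
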
diff --git a/rainbow/core/util/files_util.py b/rainbow/core/util/files_util.py
index 403fec9..b1d1daf 100644
--- a/rainbow/core/util/files_util.py
+++ b/rainbow/core/util/files_util.py
@@ -21,8 +21,10 @@ import os
 
 def find_config_files(path):
     files = []
+    print(path)
     for r, d, f in os.walk(path):
         for file in f:
+            print(os.path.basename(file))
             if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
                 files.append(os.path.join(r, file))
     return files
diff --git a/rainbow/docker/python/__init__.py b/rainbow/docker/python/__init__.py
deleted file mode 100644
index 217e5db..0000000
--- a/rainbow/docker/python/__init__.py
+++ /dev/null
@@ -1,17 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index 8557455..92b6d64 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -20,6 +20,7 @@ from datetime import datetime
 
 import yaml
 from airflow import DAG
+from airflow.models import Variable
 
 from rainbow.core.util import files_util
 from rainbow.runners.airflow.tasks.python import PythonTask
@@ -35,7 +36,7 @@ def register_dags(configs_path):
     dags = []
 
     for config_file in config_files:
-        print(f'Registering DAG for file: f{config_file}')
+        print(f'Registering DAG for file: {config_file}')
 
         with open(config_file) as stream:
             config = yaml.safe_load(stream)
@@ -43,24 +44,35 @@ def register_dags(configs_path):
             for pipeline in config['pipelines']:
                 parent = None
 
+                pipeline_name = pipeline['pipeline']
+
                 default_args = {
                     'owner': config['owner'],
-                    'start_date': datetime.combine(pipeline['start_date'], datetime.min.time())
+                    'start_date': datetime.combine(pipeline['start_date'], datetime.min.time()),
+                    'depends_on_past': False,
                 }
 
                 dag = DAG(
-                    dag_id='test_dag',
-                    default_args=default_args
+                    dag_id=pipeline_name,
+                    default_args=default_args,
+                    catchup=False
                 )
 
+                trigger_rule = 'all_success'
+                if 'always_run' in config and config['always_run']:
+                    trigger_rule = 'all_done'
+
                 for task in pipeline['tasks']:
                     task_type = task['type']
                     task_instance = get_task_class(task_type)(
-                        dag, pipeline['pipeline'], parent if parent else None, task, 'all_success'
+                        dag, pipeline['pipeline'], parent if parent else None, task, trigger_rule
                     )
+
                     parent = task_instance.apply_task_to_dag()
 
-                    print(f'{parent}{{{task_type}}}')
+                print(f'{pipeline_name}: {dag.tasks}')
+
+                globals()[pipeline_name] = dag
 
                 dags.append(dag)
     return dags
@@ -75,7 +87,6 @@ def get_task_class(task_type):
     return task_classes[task_type]
 
 
-if __name__ == '__main__':
-    # TODO: configurable yaml dir
-    path = 'tests/runners/airflow/dag/rainbow'
-    register_dags(path)
+# TODO: configurable path
+path = Variable.get('rainbows_dir')
+register_dags(path)
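
Note on the rainbow_dags.py change above: the trigger rule passed to every
task is now derived from an optional always_run key instead of being
hard-coded to 'all_success'. The selection can be sketched as a small
function. select_trigger_rule is a hypothetical name used only for
illustration; the commit keeps this logic inline in register_dags.

```python
def select_trigger_rule(config):
    """Return 'all_done' for a truthy always_run, otherwise 'all_success'."""
    # A missing key and a falsy value are treated the same way,
    # matching "'always_run' in config and config['always_run']".
    if config.get('always_run'):
        return 'all_done'
    return 'all_success'


print(select_trigger_rule({}))
print(select_trigger_rule({'always_run': True}))
```

Note that the diff reads always_run from the top-level config, so the same
rule applies to every pipeline defined in one file.
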
diff --git a/rainbow/runners/airflow/tasks/python.py b/rainbow/runners/airflow/tasks/python.py
index b2769c8..ac46d0b 100644
--- a/rainbow/runners/airflow/tasks/python.py
+++ b/rainbow/runners/airflow/tasks/python.py
@@ -16,12 +16,10 @@
 # specific language governing permissions and limitations
 # under the License.
 import json
-import os
 
 from airflow.models import Variable
 from airflow.operators.dummy_operator import DummyOperator
 
-from rainbow.docker.python import python_image
 from rainbow.runners.airflow.model import task
 from rainbow.runners.airflow.operators.kubernetes_pod_operator import \
     ConfigurableKubernetesPodOperator, \
@@ -136,10 +134,11 @@ class PythonTask(task.Task):
 
     def __kubernetes_cmds_and_arguments(self):
         cmds = ['/bin/bash', '-c']
+        output_path = self.config['output_path'] if 'output_path' in self.config else ''
         arguments = [
-            f'''sh container-setup.sh && \
-            {self.config['cmd']} && \
-            sh container-teardown.sh {self.config['output_path']}'''
+            f"sh container-setup.sh && " +
+            f"{self.config['cmd']} && " +
+            f"sh container-teardown.sh {output_path}"
         ]
         return cmds, arguments
 
diff --git a/tests/runners/airflow/build/python/test_python_image.py b/tests/runners/airflow/build/python/test_python_image.py
index a8c02b6..368b05d 100644
--- a/tests/runners/airflow/build/python/test_python_image.py
+++ b/tests/runners/airflow/build/python/test_python_image.py
@@ -19,7 +19,7 @@ from unittest import TestCase
 
 import docker
 
-from rainbow.docker.python.python_image import PythonImage
+from rainbow.build.python.python_image import PythonImage
 
 
 class TestPythonImage(TestCase):
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
index 1a834d7..3e3ec4b 100644
--- a/tests/runners/airflow/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -35,8 +35,7 @@ pipelines:
           env2: "b"
         input_type: static
         input_path: "{\"configs\": [ { \"campaign_id\": 10 }, { \"campaign_id\": 20 } ]}"
-        output_path: 'baz'
-        cmd: 'foo bar'
+        cmd: 'python hello_world.py'
       - task: my_static_config_task2
         type: python
         description: my 1st ds task
@@ -47,8 +46,7 @@ pipelines:
           env2: "b"
         input_type: static
         input_path: "{\"configs\": [ { \"campaign_id\": 10 }, { \"campaign_id\": 20 } ]}"
-        output_path: 'baz'
-        cmd: 'foo bar'
+        cmd: 'python hello_world.py'
 services:
   - service:
     name: myserver1
diff --git a/tests/runners/airflow/tasks/test_python.py b/tests/runners/airflow/tasks/test_python.py
index 8477c69..ffdcac3 100644
--- a/tests/runners/airflow/tasks/test_python.py
+++ b/tests/runners/airflow/tasks/test_python.py
@@ -19,8 +19,6 @@
 import unittest
 from unittest import TestCase
 
-import docker
-
 from rainbow.runners.airflow.operators.kubernetes_pod_operator import \
     ConfigurableKubernetesPodOperator
 from rainbow.runners.airflow.tasks import python
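
Note on the tasks/python.py change in this commit: output_path is now
optional. Before the change, a task without output_path would fail on
self.config['output_path']; after it, an empty string is passed and
container-teardown.sh skips the copy when its first argument is empty. A
minimal sketch of the command assembly follows. build_arguments is a
hypothetical free function, whereas the commit implements this as a private
method on PythonTask.

```python
def build_arguments(config):
    """Join setup, user command and teardown into one shell command line."""
    # Fall back to an empty string when the task declares no output_path.
    output_path = config.get('output_path', '')
    return [
        'sh container-setup.sh && '
        f"{config['cmd']} && "
        f'sh container-teardown.sh {output_path}'
    ]


print(build_arguments({'cmd': 'python hello_world.py'}))
```

Concatenating plain strings also avoids a side effect of the earlier
triple-quoted form, where the indentation of the continuation lines ended up
inside the command string.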


[incubator-liminal] 27/43: Update README fix task description

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 44d9c3d3f229f65d3c792db7013fbcdaddf4c7b0
Author: aviemzur <av...@gmail.com>
AuthorDate: Mon Mar 23 10:45:38 2020 +0200

    Update README fix task description
---
 README.md                                 | 8 ++++++--
 tests/runners/airflow/rainbow/rainbow.yml | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 3e46f34..467edf2 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ pipelines:
         cmd: python -u helloworld.py
       - task: my_task_output_input_task
         type: python
-        description: parallelized static input task
+        description: task with input from other task's output
         image: my_task_output_input_task_image
         source: helloworld
         env_vars:
@@ -77,7 +77,11 @@ services:
 ```
 
 ## Example repository structure
-[Example repository structure](https://github.com/Natural-Intelligence/rainbow/tree/master/tests/runners/airflow/rainbow])
+
+[Example repository structure](
+https://github.com/Natural-Intelligence/rainbow/tree/master/tests/runners/airflow/rainbow]
+)
+
 # Installation
 
 TODO: installation.
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
index 66e3dec..05c0a09 100644
--- a/tests/runners/airflow/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -53,7 +53,7 @@ pipelines:
         cmd: python -u helloworld.py
       - task: my_task_output_input_task
         type: python
-        description: parallelized static input task
+        description: task with input from other task's output
         image: my_task_output_input_task_image
         source: helloworld
         env_vars:


[incubator-liminal] 01/43: first commit

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 87da6bd2bad2c2929891ad0833fdc3c0733ebf92
Author: aviemzur <av...@gmail.com>
AuthorDate: Thu Mar 5 16:23:01 2020 +0200

    first commit
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..7168564
--- /dev/null
+++ b/README.md
@@ -0,0 +1 @@
+# rainbow


[incubator-liminal] 13/43: Add cli

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit f4bdfac225f3c7656966b0fffafd8d3e871fb7f5
Author: aviemzur <av...@gmail.com>
AuthorDate: Thu Mar 12 12:05:11 2020 +0200

    Add cli
---
 rainbow/cli/__init__.py => rainbow-cli             | 24 +++++++++++++++++++++-
 rainbow/build/__init__.py                          |  3 ---
 .../build/{build_rainbow.py => build_rainbows.py}  |  4 ++--
 rainbow/core/util/files_util.py                    |  2 +-
 requirements.txt                                   |  1 +
 tests/runners/airflow/build/test_build_rainbow.py  |  4 ++--
 6 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/rainbow/cli/__init__.py b/rainbow-cli
old mode 100644
new mode 100755
similarity index 68%
rename from rainbow/cli/__init__.py
rename to rainbow-cli
index c24b2fa..4f16b4e
--- a/rainbow/cli/__init__.py
+++ b/rainbow-cli
@@ -1,3 +1,5 @@
+#!/usr/bin/env python3
+
 #
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
@@ -15,4 +17,24 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: cli
+import os
+
+import click
+
+from rainbow.build import build_rainbows
+
+
+@click.group()
+def cli():
+    pass
+
+
+@cli.command()
+@click.option('--path', default=os.getcwd(), help='Build within this path.')
+def build(path):
+    click.echo(f'Building rainbows in {path}')
+    build_rainbows.build_rainbows(path)
+
+
+if __name__ == '__main__':
+    cli()
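
Note on the rainbow-cli script above: it defines a command group with a single
build subcommand whose --path option defaults to the current working
directory. For readers without click installed, the same command shape can be
sketched with the standard library's argparse. This is a swapped-in technique
for illustration, not what the commit uses; make_parser and main are
hypothetical names, and the sketch only reports the path instead of calling
build_rainbows.build_rainbows.

```python
import argparse
import os


def make_parser():
    """Build a parser with one 'build' subcommand and a --path option."""
    parser = argparse.ArgumentParser(prog='rainbow-cli')
    subcommands = parser.add_subparsers(dest='command', required=True)
    build = subcommands.add_parser('build', help='Build rainbows.')
    build.add_argument('--path', default=os.getcwd(),
                       help='Build within this path.')
    return parser


def main(argv=None):
    """Parse argv and return the message the build command would echo."""
    args = make_parser().parse_args(argv)
    if args.command == 'build':
        # The real command would call build_rainbows.build_rainbows(args.path).
        return f'Building rainbows in {args.path}'


print(main(['build', '--path', 'tests/runners/airflow/rainbow']))
```
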
diff --git a/rainbow/build/__init__.py b/rainbow/build/__init__.py
index 9e84106..217e5db 100644
--- a/rainbow/build/__init__.py
+++ b/rainbow/build/__init__.py
@@ -15,6 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""
-TODO: rainbow build.
-"""
diff --git a/rainbow/build/build_rainbow.py b/rainbow/build/build_rainbows.py
similarity index 95%
rename from rainbow/build/build_rainbow.py
rename to rainbow/build/build_rainbows.py
index 7c03104..d10a9bc 100644
--- a/rainbow/build/build_rainbow.py
+++ b/rainbow/build/build_rainbows.py
@@ -24,7 +24,7 @@ from rainbow.core.util import files_util
 from rainbow.docker.python.python_image import PythonImage
 
 
-def build_rainbow(path):
+def build_rainbows(path):
     """
     TODO: doc for build_rainbow
     """
@@ -32,7 +32,7 @@ def build_rainbow(path):
     config_files = files_util.find_config_files(path)
 
     for config_file in config_files:
-        print(f'Building artifacts file: f{config_file}')
+        print(f'Building artifacts for file: {config_file}')
 
         with open(config_file) as stream:
             # TODO: validate config
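
Note on the log message fix above: inside an f-string, the text f{config_file}
is a literal "f" followed by the interpolated value, so the old message
printed the path with a stray leading "f". A minimal illustration, using the
test configuration path from this repository as sample input:

```python
config_file = 'tests/runners/airflow/rainbow/rainbow.yml'

# Old message: the extra "f" is literal text, not a prefix.
before = f'Building artifacts file: f{config_file}'

# New message: only the braces are needed for interpolation.
after = f'Building artifacts for file: {config_file}'

print(before)
print(after)
```
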
diff --git a/rainbow/core/util/files_util.py b/rainbow/core/util/files_util.py
index e5a8e09..403fec9 100644
--- a/rainbow/core/util/files_util.py
+++ b/rainbow/core/util/files_util.py
@@ -23,6 +23,6 @@ def find_config_files(path):
     files = []
     for r, d, f in os.walk(path):
         for file in f:
-            if file[file.rfind('.') + 1:] in ['yml', 'yaml']:
+            if os.path.basename(file) in ['rainbow.yml', 'rainbow.yaml']:
                 files.append(os.path.join(r, file))
     return files
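
Note on the files_util.py change above: config discovery now matches on the
exact file names rainbow.yml and rainbow.yaml instead of accepting any file
with a yml or yaml extension. The sketch below reproduces the new rule and
contrasts it with the old one on a throwaway directory. extension_only_match
is a hypothetical helper added for the comparison, and docker-compose.yml is
just an example of an unrelated YAML file.

```python
import os
import tempfile


def find_config_files(path):
    """Return paths of files named rainbow.yml or rainbow.yaml under path."""
    files = []
    for root, _dirs, names in os.walk(path):
        for name in names:
            # os.walk already yields bare file names, so no basename call
            # is needed here.
            if name in ('rainbow.yml', 'rainbow.yaml'):
                files.append(os.path.join(root, name))
    return files


def extension_only_match(name):
    """The earlier rule: accept any file ending in .yml or .yaml."""
    return name[name.rfind('.') + 1:] in ('yml', 'yaml')


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'project'))
    for name in ('rainbow.yml', 'docker-compose.yml'):
        open(os.path.join(root, 'project', name), 'w').close()
    found = [os.path.basename(p) for p in find_config_files(root)]

print(found)
```

Under the old rule both files would have been parsed as pipeline
configuration; under the new one only rainbow.yml is picked up.
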
diff --git a/requirements.txt b/requirements.txt
index 6e05d98..599ab8b 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,3 +3,4 @@ PyYAML
 docker==4.2.0
 apache-airflow==1.10.9
 docker-pycreds==0.4.0
+click==7.1.1
diff --git a/tests/runners/airflow/build/test_build_rainbow.py b/tests/runners/airflow/build/test_build_rainbow.py
index d1b28aa..533848f 100644
--- a/tests/runners/airflow/build/test_build_rainbow.py
+++ b/tests/runners/airflow/build/test_build_rainbow.py
@@ -2,7 +2,7 @@ import unittest
 from unittest import TestCase
 
 import docker
-from rainbow.build import build_rainbow
+from rainbow.build import build_rainbows
 
 
 class TestBuildRainbow(TestCase):
@@ -15,7 +15,7 @@ class TestBuildRainbow(TestCase):
             if len(docker_client.images.list(image_name)) > 0:
                 docker_client.images.remove(image=image_name)
 
-        build_rainbow.build_rainbow('tests/runners/airflow/rainbow')
+        build_rainbows.build_rainbows('tests/runners/airflow/rainbow')
 
         for image_name in image_names:
             docker_client.images.get(name=image_name)


[incubator-liminal] 32/43: Use user pip conf in docker build

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 8df40b2cb91eb2428e58c58e7dad9a290f9c185e
Author: aviem-naturalint <av...@naturalint.com>
AuthorDate: Sat Apr 11 08:14:08 2020 +0300

    Use user pip conf in docker build
---
 rainbow/build/build_rainbows.py                    |  4 +-
 rainbow/build/image/python/Dockerfile              | 26 ++++++-
 rainbow/build/image/python/container-setup.sh      |  2 +
 rainbow/build/image/python/container-teardown.sh   |  4 +-
 rainbow/build/image/python/python.py               |  7 +-
 rainbow/build/{image => }/image_builder.py         | 70 ++++++++++++-----
 rainbow/build/python.py                            | 74 ++++++++++++++++++
 rainbow/build/service/python_server/Dockerfile     | 26 ++++++-
 .../build/service/python_server/python_server.py   |  8 +-
 .../kubernetes_pod_operator_with_input_output.py   |  5 +-
 run_tests.sh                                       |  4 +-
 .../python/test_python_server_image_builder.py     | 36 +++++++--
 .../build/python/test_python_image_builder.py      | 90 ++++++++++++++++++----
 tests/runners/airflow/build/test_build_rainbows.py |  2 +-
 .../airflow/rainbow/helloworld/hello_world.py      | 10 ++-
 .../{helloworld/hello_world.py => pip.conf}        | 10 ---
 tests/runners/airflow/rainbow/rainbow.yml          |  8 +-
 17 files changed, 308 insertions(+), 78 deletions(-)

diff --git a/rainbow/build/build_rainbows.py b/rainbow/build/build_rainbows.py
index 4ed5bab..b7ea6eb 100644
--- a/rainbow/build/build_rainbows.py
+++ b/rainbow/build/build_rainbows.py
@@ -20,13 +20,13 @@ import os
 
 import yaml
 
-from rainbow.build.image.image_builder import ImageBuilder, ServiceImageBuilderMixin
+from rainbow.build.image_builder import ImageBuilder, ServiceImageBuilderMixin
 from rainbow.core.util import files_util, class_util
 
 
 def build_rainbows(path):
     """
-    TODO: doc for build_rainbows
+    Build images for rainbows in path.
     """
     config_files = files_util.find_config_files(path)
 
diff --git a/rainbow/build/image/python/Dockerfile b/rainbow/build/image/python/Dockerfile
index d4e3ed2..8e4de05 100644
--- a/rainbow/build/image/python/Dockerfile
+++ b/rainbow/build/image/python/Dockerfile
@@ -1,3 +1,21 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
 # Use an official Python runtime as a parent image
 FROM python:3.7-slim
 
@@ -11,9 +29,11 @@ WORKDIR /app
 # Be careful when changing this code.                                                            !
 
 # Install any needed packages specified in requirements.txt
-COPY ./requirements.txt /app
-RUN pip install -r requirements.txt
+COPY ./requirements.txt /app/
+
+# mount the secret in the correct location, then run pip install
+RUN {{mount}} pip install -r requirements.txt
 
 # Copy the current directory contents into the container at /app
 RUN echo "Copying source code.."
-COPY . /app
+COPY . /app/
diff --git a/rainbow/build/image/python/container-setup.sh b/rainbow/build/image/python/container-setup.sh
index 883f1e1..c9e5cef 100755
--- a/rainbow/build/image/python/container-setup.sh
+++ b/rainbow/build/image/python/container-setup.sh
@@ -1,5 +1,7 @@
 #!/bin/sh
 
+echo 'Writing rainbow input..'
+
 echo """$RAINBOW_INPUT""" > /rainbow_input.json
 
 AIRFLOW_RETURN_FILE=/airflow/xcom/return.json
diff --git a/rainbow/build/image/python/container-teardown.sh b/rainbow/build/image/python/container-teardown.sh
index ef213a8..46c4426 100755
--- a/rainbow/build/image/python/container-teardown.sh
+++ b/rainbow/build/image/python/container-teardown.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
+echo 'Writing rainbow output..'
+
 USER_CONFIG_OUTPUT_FILE=$1
 if [ "$USER_CONFIG_OUTPUT_FILE" != "" ]; then
-    cp ${USER_CONFIG_OUTPUT_FILE} /airflow/xcom/return.json
+    cp "${USER_CONFIG_OUTPUT_FILE}" /airflow/xcom/return.json
 fi
diff --git a/rainbow/build/image/python/python.py b/rainbow/build/image/python/python.py
index f4fb03b..0ecec77 100644
--- a/rainbow/build/image/python/python.py
+++ b/rainbow/build/image/python/python.py
@@ -18,10 +18,10 @@
 
 import os
 
-from rainbow.build.image.image_builder import ImageBuilder
+from rainbow.build.python import BasePythonImageBuilder
 
 
-class PythonImageBuilder(ImageBuilder):
+class PythonImageBuilder(BasePythonImageBuilder):
 
     def __init__(self, config, base_path, relative_source_path, tag):
         super().__init__(config, base_path, relative_source_path, tag)
@@ -30,8 +30,7 @@ class PythonImageBuilder(ImageBuilder):
     def _dockerfile_path():
         return os.path.join(os.path.dirname(__file__), 'Dockerfile')
 
-    @staticmethod
-    def _additional_files_from_paths():
+    def _additional_files_from_paths(self):
         return [
             os.path.join(os.path.dirname(__file__), 'container-setup.sh'),
             os.path.join(os.path.dirname(__file__), 'container-teardown.sh'),
diff --git a/rainbow/build/image/image_builder.py b/rainbow/build/image_builder.py
similarity index 64%
rename from rainbow/build/image/image_builder.py
rename to rainbow/build/image_builder.py
index e716b9d..a56a22e 100644
--- a/rainbow/build/image/image_builder.py
+++ b/rainbow/build/image_builder.py
@@ -18,24 +18,23 @@
 
 import os
 import shutil
+import subprocess
 import tempfile
 
-import docker
-
 
 class ImageBuilder:
     """
     Builds an image from source code
     """
 
+    __NO_CACHE = 'no_cache'
+
     def __init__(self, config, base_path, relative_source_path, tag):
         """
-        TODO: pydoc
-
-        :param config:
-        :param base_path:
-        :param relative_source_path:
-        :param tag:
+        :param config: task/service config
+        :param base_path: directory containing rainbow yml
+        :param relative_source_path: source path relative to rainbow yml
+        :param tag: image tag
         """
         self.base_path = base_path
         self.relative_source_path = relative_source_path
@@ -51,27 +50,44 @@ class ImageBuilder:
         temp_dir = self.__temp_dir()
 
         self.__copy_source_code(temp_dir)
-        self.__write_additional_files(temp_dir)
-
-        # TODO: log docker output
-        docker_client = docker.from_env()
-        docker_client.images.build(path=temp_dir, tag=self.tag)
-        docker_client.close()
+        self._write_additional_files(temp_dir)
+
+        no_cache = ''
+        if self.__NO_CACHE in self.config and self.config[self.__NO_CACHE]:
+            no_cache = '--no-cache=true'
+
+        docker_build_command = f'docker build {no_cache} --progress=plain ' + \
+                               f'--tag {self.tag} {self._build_flags()} {temp_dir}'
+
+        if self._use_buildkit():
+            docker_build_command = f'DOCKER_BUILDKIT=1 {docker_build_command}'
+
+        print(docker_build_command)
+
+        docker_build_out = ''
+        try:
+            docker_build_out = subprocess.check_output(docker_build_command,
+                                                       shell=True, stderr=subprocess.STDOUT,
+                                                       timeout=240)
+        except subprocess.CalledProcessError as e:
+            docker_build_out = e.output
+            raise e
+        finally:
+            print('=' * 80)
+            for line in str(docker_build_out)[2:-3].split('\\n'):
+                print(line)
+            print('=' * 80)
 
         self.__remove_dir(temp_dir)
 
         print(f'[X] Building image: {self.tag} (Success).')
 
+        return docker_build_out
+
     def __copy_source_code(self, temp_dir):
         self.__copy_dir(os.path.join(self.base_path, self.relative_source_path), temp_dir)
 
-    def __write_additional_files(self, temp_dir):
-        # TODO: move requirements.txt related code to a parent class for python image builders.
-        requirements_file_path = os.path.join(temp_dir, 'requirements.txt')
-        if not os.path.exists(requirements_file_path):
-            with open(requirements_file_path, 'w'):
-                pass
-
+    def _write_additional_files(self, temp_dir):
         for file in [self._dockerfile_path()] + self._additional_files_from_paths():
             self.__copy_file(file, temp_dir)
 
@@ -117,6 +133,18 @@ class ImageBuilder:
         """
         return []
 
+    def _build_flags(self):
+        """
+        Additional build flags to add to docker build command.
+        """
+        return ''
+
+    def _use_buildkit(self):
+        """
+        Override to return True to build with Docker BuildKit.
+        """
+        return False
+
 
 class ServiceImageBuilderMixin(object):
     pass
diff --git a/rainbow/build/python.py b/rainbow/build/python.py
new file mode 100644
index 0000000..0961d2b
--- /dev/null
+++ b/rainbow/build/python.py
@@ -0,0 +1,74 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+
+from rainbow.build.image_builder import ImageBuilder
+
+
+class BasePythonImageBuilder(ImageBuilder):
+    """
+    Base class for building python images.
+    """
+
+    __PIP_CONF = 'pip_conf'
+
+    def __init__(self, config, base_path, relative_source_path, tag):
+        super().__init__(config, base_path, relative_source_path, tag)
+
+    @staticmethod
+    def _dockerfile_path():
+        raise NotImplementedError()
+
+    def _write_additional_files(self, temp_dir):
+        requirements_file_path = os.path.join(temp_dir, 'requirements.txt')
+        if not os.path.exists(requirements_file_path):
+            with open(requirements_file_path, 'w'):
+                pass
+
+        super()._write_additional_files(temp_dir)
+
+    def _additional_files_from_filename_content_pairs(self):
+        with open(self._dockerfile_path()) as original:
+            data = original.read()
+
+        data = self.__mount_pip_conf(data)
+
+        return [('Dockerfile', data)]
+
+    def __mount_pip_conf(self, data):
+        new_data = data
+
+        if self.__PIP_CONF in self.config:
+            new_data = '# syntax = docker/dockerfile:1.0-experimental\n' + data
+            new_data = new_data.replace('{{mount}}',
+                                        '--mount=type=secret,id=pip_config,dst=/etc/pip.conf \\\n')
+        else:
+            new_data = new_data.replace('{{mount}} ', '')
+
+        return new_data
+
+    def _build_flags(self):
+        if self.__PIP_CONF in self.config:
+            return f'--secret id=pip_config,src={self.config[self.__PIP_CONF]}'
+        else:
+            return ''
+
+    def _use_buildkit(self):
+        # BuildKit is needed for the secret mount; return an explicit bool.
+        return self.__PIP_CONF in self.config
diff --git a/rainbow/build/service/python_server/Dockerfile b/rainbow/build/service/python_server/Dockerfile
index 6119437..4d4254f 100644
--- a/rainbow/build/service/python_server/Dockerfile
+++ b/rainbow/build/service/python_server/Dockerfile
@@ -1,3 +1,21 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
 # Use an official Python runtime as a parent image
 FROM python:3.7-slim
 
@@ -11,14 +29,14 @@ WORKDIR /app
 # Be careful when changing this code.                                                            !
 
 # Install any needed packages specified in python_server_requirements.txt and requirements.txt
-COPY ./python_server_requirements.txt /app
+COPY ./python_server_requirements.txt /app/
 RUN pip install -r python_server_requirements.txt
 
-COPY ./requirements.txt /app
-RUN pip install -r requirements.txt
+COPY ./requirements.txt /app/
+RUN {{mount}} pip install -r requirements.txt
 
 # Copy the current directory contents into the container at /app
 RUN echo "Copying source code.."
-COPY . /app
+COPY . /app/
 
 CMD python -u rainbow_python_server.py
diff --git a/rainbow/build/service/python_server/python_server.py b/rainbow/build/service/python_server/python_server.py
index 3404abf..0b2537d 100644
--- a/rainbow/build/service/python_server/python_server.py
+++ b/rainbow/build/service/python_server/python_server.py
@@ -20,10 +20,11 @@ import os
 
 import yaml
 
-from rainbow.build.image.image_builder import ImageBuilder, ServiceImageBuilderMixin
+from rainbow.build.image_builder import ServiceImageBuilderMixin
+from rainbow.build.python import BasePythonImageBuilder
 
 
-class PythonServerImageBuilder(ImageBuilder, ServiceImageBuilderMixin):
+class PythonServerImageBuilder(BasePythonImageBuilder, ServiceImageBuilderMixin):
 
     def __init__(self, config, base_path, relative_source_path, tag):
         super().__init__(config, base_path, relative_source_path, tag)
@@ -40,4 +41,5 @@ class PythonServerImageBuilder(ImageBuilder, ServiceImageBuilderMixin):
         ]
 
     def _additional_files_from_filename_content_pairs(self):
-        return [('service.yml', yaml.safe_dump(self.config))]
+        return super()._additional_files_from_filename_content_pairs() + \
+               [('service.yml', yaml.safe_dump(self.config))]
diff --git a/rainbow/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py b/rainbow/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
index eb6fa83..c44e80b 100644
--- a/rainbow/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
+++ b/rainbow/runners/airflow/operators/kubernetes_pod_operator_with_input_output.py
@@ -67,7 +67,6 @@ class PrepareInputOperator(KubernetesPodOperator):
             else:
                 raise ValueError(f'Unknown config type: {self.input_type}')
 
-        # TODO: pass run_id as well as env var
         run_id = context['dag_run'].run_id
         print(f'run_id = {run_id}')
 
@@ -145,4 +144,8 @@ class KubernetesPodOperatorWithInputAndOutput(KubernetesPodOperator):
 
             self.log.info(f'Empty input for task {self.task_split}.')
 
+        run_id = context['dag_run'].run_id
+        print(f'run_id = {run_id}')
+
+        self.env_vars.update({'run_id': run_id})
         return super().execute(context)
diff --git a/run_tests.sh b/run_tests.sh
index 3e5cd2f..8fdae7a 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -1,3 +1,5 @@
 #!/bin/sh
 
-python -m unittest
\ No newline at end of file
+export TMPDIR=/tmp
+
+python -m unittest
diff --git a/tests/runners/airflow/build/http/python/test_python_server_image_builder.py b/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
index 63fc8fa..ecdaced 100644
--- a/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
+++ b/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
@@ -15,6 +15,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
 import os
 import threading
 import time
@@ -41,23 +42,47 @@ class TestPythonServer(TestCase):
         self.docker_client.close()
 
     def test_build_python_server(self):
+        build_out = self.__test_build_python_server()
+
+        self.assertTrue('RUN pip install -r requirements.txt' in build_out, 'Incorrect pip command')
+
+    def test_build_python_server_with_pip_conf(self):
+        build_out = self.__test_build_python_server(use_pip_conf=True)
+
+        self.assertTrue(
+            'RUN --mount=type=secret,id=pip_config,dst=/etc/pip.conf  pip insta...' in build_out,
+            'Incorrect pip command')
+
+    def __test_build_python_server(self, use_pip_conf=False):
         base_path = os.path.join(os.path.dirname(__file__), '../../../rainbow')
-        builder = PythonServerImageBuilder(config=self.config,
+
+        config = self.__create_conf('my_task')
+
+        if use_pip_conf:
+            config['pip_conf'] = os.path.join(base_path, 'pip.conf')
+
+        builder = PythonServerImageBuilder(config=config,
                                            base_path=base_path,
                                            relative_source_path='myserver',
                                            tag=self.image_name)
 
-        builder.build()
+        build_out = str(builder.build())
 
         thread = threading.Thread(target=self.__run_container, args=[self.image_name])
         thread.daemon = True
         thread.start()
 
-        time.sleep(2)
+        time.sleep(5)
+
+        print('Sending request to server')
+
+        server_response = str(urllib.request.urlopen('http://localhost:9294/myendpoint1').read())
+
+        print(f'Response from server: {server_response}')
 
-        server_response = urllib.request.urlopen("http://localhost:9294/myendpoint1").read()
+        self.assertEqual("b'1'", server_response)
 
-        self.assertEqual("b'1'", str(server_response))
+        return build_out
 
     def __remove_containers(self):
         print(f'Stopping containers with image: {self.image_name}')
@@ -92,6 +117,7 @@ class TestPythonServer(TestCase):
             'input_type': 'my_input_type',
             'input_path': 'my_input',
             'output_path': '/my_output.json',
+            'no_cache': True,
             'endpoints': [
                 {
                     'endpoint': '/myendpoint1',
diff --git a/tests/runners/airflow/build/python/test_python_image_builder.py b/tests/runners/airflow/build/python/test_python_image_builder.py
index 7376987..81b5cc3 100644
--- a/tests/runners/airflow/build/python/test_python_image_builder.py
+++ b/tests/runners/airflow/build/python/test_python_image_builder.py
@@ -16,6 +16,8 @@
 # specific language governing permissions and limitations
 # under the License.
 import os
+import shutil
+import tempfile
 from unittest import TestCase
 
 import docker
@@ -24,46 +26,106 @@ from rainbow.build.image.python.python import PythonImageBuilder
 
 
 class TestPythonImageBuilder(TestCase):
+    __IMAGE_NAME = 'rainbow_image'
+    __OUTPUT_PATH = '/mnt/vol1/my_output.json'
+
+    def setUp(self) -> None:
+        super().setUp()
+        os.environ['TMPDIR'] = '/tmp'
+        self.temp_dir = self.__temp_dir()
+        self.temp_airflow_dir = self.__temp_dir()
+
+    def tearDown(self) -> None:
+        super().tearDown()
+        self.__remove_dir(self.temp_dir)
+        self.__remove_dir(self.temp_airflow_dir)
 
     def test_build(self):
-        config = self.__create_conf('my_task')
+        build_out = self.__test_build()
+
+        self.assertTrue('RUN pip install -r requirements.txt' in build_out, 'Incorrect pip command')
+
+        self.__test_image()
 
-        image_name = config['image']
+    def test_build_with_pip_conf(self):
+        build_out = self.__test_build(use_pip_conf=True)
+
+        self.assertTrue(
+            'RUN --mount=type=secret,id=pip_config,dst=/etc/pip.conf  pip insta...' in build_out,
+            'Incorrect pip command')
+
+        self.__test_image()
+
+    def __test_build(self, use_pip_conf=False):
+        config = self.__create_conf('my_task')
 
         base_path = os.path.join(os.path.dirname(__file__), '../../rainbow')
 
+        if use_pip_conf:
+            config['pip_conf'] = os.path.join(base_path, 'pip.conf')
+
         builder = PythonImageBuilder(config=config,
                                      base_path=base_path,
                                      relative_source_path='helloworld',
-                                     tag=image_name)
+                                     tag=self.__IMAGE_NAME)
 
-        builder.build()
+        build_out = str(builder.build())
 
-        # TODO: elaborate test of image, validate input/output
+        return build_out
 
+    def __test_image(self):
         docker_client = docker.from_env()
-        docker_client.images.get(image_name)
+        docker_client.images.get(self.__IMAGE_NAME)
 
-        cmd = 'export RAINBOW_INPUT="{}" && ' + \
+        cmd = 'export RAINBOW_INPUT="{\\"x\\": 1}" && ' + \
               'sh container-setup.sh && ' + \
               'python hello_world.py && ' + \
-              'sh container-teardown.sh'
+              f'sh container-teardown.sh {self.__OUTPUT_PATH}'
         cmds = ['/bin/bash', '-c', cmd]
 
-        container_log = docker_client.containers.run(image_name, cmds)
+        container_log = docker_client.containers.run(self.__IMAGE_NAME,
+                                                     cmds,
+                                                     volumes={
+                                                         self.temp_dir: {
+                                                             'bind': '/mnt/vol1',
+                                                             'mode': 'rw'
+                                                         },
+                                                         self.temp_airflow_dir: {
+                                                             'bind': '/airflow/xcom',
+                                                             'mode': 'rw'},
+                                                     })
 
         docker_client.close()
 
-        self.assertEqual("b'Hello world!\\n\\n{}\\n'", str(container_log))
+        print(container_log)
 
-    @staticmethod
-    def __create_conf(task_id):
+        self.assertEqual(
+            "b\"Writing rainbow input..\\n" +
+            "Hello world!\\n\\n" +
+            "rainbow_input.json contents = {'x': 1}\\n" +
+            "Writing rainbow output..\\n\"",
+            str(container_log))
+
+        with open(os.path.join(self.temp_airflow_dir, 'return.json')) as file:
+            self.assertEqual(file.read(), '{"a": 1, "b": 2}')
+
+    def __create_conf(self, task_id):
         return {
             'task': task_id,
             'cmd': 'foo bar',
-            'image': 'rainbow_image',
+            'image': self.__IMAGE_NAME,
             'source': 'baz',
             'input_type': 'my_input_type',
             'input_path': 'my_input',
-            'output_path': '/my_output.json'
+            'no_cache': True,
+            'output_path': self.__OUTPUT_PATH,
         }
+
+    @staticmethod
+    def __temp_dir():
+        temp_dir = tempfile.mkdtemp()
+        return temp_dir
+
+    @staticmethod
+    def __remove_dir(temp_dir):
+        shutil.rmtree(temp_dir, ignore_errors=True)
diff --git a/tests/runners/airflow/build/test_build_rainbows.py b/tests/runners/airflow/build/test_build_rainbows.py
index c5d8ea7..7e01245 100644
--- a/tests/runners/airflow/build/test_build_rainbows.py
+++ b/tests/runners/airflow/build/test_build_rainbows.py
@@ -42,7 +42,7 @@ class TestBuildRainbows(TestCase):
     def __remove_images(self):
         for image_name in self.__image_names:
             if len(self.docker_client.images.list(image_name)) > 0:
-                self.docker_client.images.remove(image=image_name)
+                self.docker_client.images.remove(image=image_name, force=True)
 
     def test_build_rainbow(self):
         build_rainbows.build_rainbows(os.path.join(os.path.dirname(__file__), '../rainbow'))
diff --git a/tests/runners/airflow/rainbow/helloworld/hello_world.py b/tests/runners/airflow/rainbow/helloworld/hello_world.py
index 3eae465..95f4e73 100644
--- a/tests/runners/airflow/rainbow/helloworld/hello_world.py
+++ b/tests/runners/airflow/rainbow/helloworld/hello_world.py
@@ -16,12 +16,14 @@
 # specific language governing permissions and limitations
 # under the License.
 import json
+import os
 
-print('Hello world!')
-print()
+print('Hello world!\n')
 
 with open('/rainbow_input.json') as file:
-    print(json.loads(file.readline()))
+    print(f'rainbow_input.json contents = {json.loads(file.readline())}')
 
-with open('/output.json', 'w') as file:
+os.makedirs('/mnt/vol1/', exist_ok=True)
+
+with open('/mnt/vol1/my_output.json', 'w') as file:
     file.write(json.dumps({'a': 1, 'b': 2}))
diff --git a/tests/runners/airflow/rainbow/helloworld/hello_world.py b/tests/runners/airflow/rainbow/pip.conf
similarity index 78%
copy from tests/runners/airflow/rainbow/helloworld/hello_world.py
copy to tests/runners/airflow/rainbow/pip.conf
index 3eae465..217e5db 100644
--- a/tests/runners/airflow/rainbow/helloworld/hello_world.py
+++ b/tests/runners/airflow/rainbow/pip.conf
@@ -15,13 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-import json
-
-print('Hello world!')
-print()
-
-with open('/rainbow_input.json') as file:
-    print(json.loads(file.readline()))
-
-with open('/output.json', 'w') as file:
-    file.write(json.dumps({'a': 1, 'b': 2}))
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
index 0b08a1f..77af37b 100644
--- a/tests/runners/airflow/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -29,8 +29,8 @@ pipelines:
       key1: val1
       key2: val2
     metrics:
-     namespace: TestNamespace
-     backends: [ 'cloudwatch' ]
+      namespace: TestNamespace
+      backends: [ 'cloudwatch' ]
     tasks:
       - task: my_static_input_task
         type: python
@@ -42,7 +42,7 @@ pipelines:
           env2: "b"
         input_type: static
         input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
-        output_path: /output.json
+        output_path: /mnt/vol1/my_output.json
         cmd: python -u hello_world.py
       - task: my_parallelized_static_input_task
         type: python
@@ -55,7 +55,7 @@ pipelines:
         input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
         split_input: True
         executors: 2
-        cmd: python -u helloworld.py
+        cmd: python -u hello_world.py
       - task: my_task_output_input_task
         type: python
         description: task with input from other task's output
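
Note on the build output handling in `ImageBuilder.build`: the commit prints the captured output by slicing `str()` of a bytes object and splitting on the literal `\\n`. A possible alternative, shown here only as a sketch, is to decode the bytes and split on real line breaks. `run_and_echo` is a hypothetical helper, not part of this commit; it takes an argument list instead of a shell string, so a setting such as `DOCKER_BUILDKIT=1` would be passed through `env` rather than prefixed to the command.

```python
import subprocess
import sys


def run_and_echo(command, timeout=240, env=None):
    """Run `command`, echo its combined stdout/stderr, and return it as text."""
    output = b''
    try:
        output = subprocess.check_output(command, stderr=subprocess.STDOUT,
                                         timeout=timeout, env=env)
    except subprocess.CalledProcessError as e:
        # Keep whatever the failed process printed so it is still echoed.
        output = e.output
        raise
    finally:
        text = output.decode('utf-8', errors='replace')
        print('=' * 80)
        for line in text.splitlines():
            print(line)
        print('=' * 80)
    return text


if __name__ == '__main__':
    # Stand-in command so the sketch runs without Docker installed.
    run_and_echo([sys.executable, '-c', 'print("step 1"); print("step 2")'])
```

Returning decoded text would also let callers test for substrings such as the pip install line without wrapping the result in `str()`.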


[incubator-liminal] 09/43: Fix requirements.txt

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 45ceeaca954cf9eb437c937e614e588e98bb0a1f
Author: aviemzur <av...@gmail.com>
AuthorDate: Thu Mar 12 10:14:38 2020 +0200

    Fix requirements.txt
---
 requirements.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
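
Note on the fix: pip pins an exact version with `name==version`, so the `name:version` lines being replaced are not valid requirement specifiers. The check below is a deliberately simplified sketch that flags lines that are not exact pins. `PINNED` and `unpinned_lines` are hypothetical names, and the regular expression covers only plain `name==version` lines, not the full requirements file syntax (extras, markers, URLs, and options are out of scope).

```python
import re

# Simplified pattern: a package name, '==', then a version string.
PINNED = re.compile(r'^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.*+!_-]+$')


def unpinned_lines(lines):
    """Return the non-empty, non-comment lines that are not exact pins."""
    result = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue
        if not PINNED.match(stripped):
            result.append(stripped)
    return result


if __name__ == '__main__':
    before = ['docker:4.2.0', 'apache-airflow:1.10.9', 'docker-pycreds:0.4.0']
    after = ['docker==4.2.0', 'apache-airflow==1.10.9', 'docker-pycreds==0.4.0']
    print(unpinned_lines(before))
    print(unpinned_lines(after))
```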

diff --git a/requirements.txt b/requirements.txt
index f22c0a7..e952e4c 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,3 +1,3 @@
-docker:4.2.0
-apache-airflow:1.10.9
-docker-pycreds:0.4.0
+docker==4.2.0
+apache-airflow==1.10.9
+docker-pycreds==0.4.0


[incubator-liminal] 23/43: Update README

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 93d9fa2b60652af679add26c4404153d9e3086a7
Author: aviemzur <av...@gmail.com>
AuthorDate: Mon Mar 23 10:09:53 2020 +0200

    Update README
---
 README.md | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index d8b9a23..257bb9a 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,11 @@
-# rainbow
+# Rainbow
 
-```
-ln -s "/Applications/Docker.app/Contents//Resources/bin/docker-credential-desktop" "/usr/local/bin/docker-credential-desktop"
-```
\ No newline at end of file
+Rainbow is an end-to-end platform for data engineers & scientists, allowing them to build,
+train and deploy machine learning models in a robust and agile way.
+
+The platform provides the abstractions and declarative capabilities for
+data extraction & feature engineering followed by model training and serving.
+Rainbow's goal is to operationalize the machine learning process, allowing data scientists to
+quickly transition from a successful experiment to an automated pipeline of model training,
+validation, deployment and inference in production, freeing them from engineering and
+non-functional tasks, and allowing them to focus on machine learning code and artifacts.


[incubator-liminal] 22/43: Add job_start and job_end tasks

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 75965668ccf2e75b2f261ce0ca2d5d26ce117852
Author: zionrubin <zi...@naturalint.com>
AuthorDate: Sun Mar 22 10:47:52 2020 +0200

    Add job_start and job_end tasks
---
 rainbow/runners/airflow/dag/rainbow_dags.py        | 21 ++++--
 .../{ => airflow/tasks/defaults}/__init__.py       |  0
 .../tasks/{job_end.py => defaults/default_task.py} | 14 +++-
 .../airflow/tasks/{ => defaults}/job_end.py        | 21 ++++--
 .../tasks/{job_end.py => defaults/job_end.py~HEAD} | 21 ++++--
 .../airflow/tasks/{ => defaults}/job_start.py      | 20 ++++--
 .../{job_start.py => defaults/job_start.py~HEAD}   | 20 ++++--
 tests/runners/airflow/dag/test_rainbow_dags.py     | 31 +++++++--
 tests/runners/airflow/rainbow/rainbow.yml          |  8 ++-
 .../runners/airflow/tasks/defaults}/__init__.py    |  0
 .../runners/airflow/tasks/defaults/test_job_end.py | 77 ++++++++++++++++++++++
 .../airflow/tasks/defaults/test_job_start.py       | 77 ++++++++++++++++++++++
 12 files changed, 276 insertions(+), 34 deletions(-)
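
Note on the DAG wiring in this change: each pipeline now begins with a job start task, chains the configured tasks through `parent`, and finishes with a job end task whose trigger rule is `all_done`, so it runs even when an upstream task fails. In the `rainbow_dags.py` hunk below, the added lines sit one indentation level deeper than the lines they replace, which places the end task and `return dags` inside the per-task loop. The sketch assumes the intended topology is a single end task attached after the last task. `Node`, `chain_pipeline` and `linear_order` are hypothetical stand-ins that avoid an Airflow dependency, and the `'start'` task id is an assumption (only `'end'` appears in this part of the diff).

```python
class Node:
    """Hypothetical stand-in for an Airflow operator; tracks downstream links only."""

    def __init__(self, task_id, trigger_rule='all_success'):
        self.task_id = task_id
        self.trigger_rule = trigger_rule
        self.downstream = []

    def set_downstream(self, other):
        self.downstream.append(other)


def chain_pipeline(task_ids):
    """Build start -> task_1 -> ... -> task_n -> end and return the start node."""
    start = Node('start')
    parent = start
    for task_id in task_ids:
        node = Node(task_id)
        parent.set_downstream(node)
        parent = node
    # Attach the end task once, after the loop; 'all_done' lets it run on failure too.
    end = Node('end', trigger_rule='all_done')
    parent.set_downstream(end)
    return start


def linear_order(start):
    """Walk the single chain from `start` and return the task ids in order."""
    order, node = [], start
    while node is not None:
        order.append(node.task_id)
        node = node.downstream[0] if node.downstream else None
    return order


if __name__ == '__main__':
    print(linear_order(chain_pipeline(['my_static_input_task', 'my_other_task'])))
```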

diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index 15b7d9a..71d18d2 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -22,8 +22,11 @@ import yaml
 from airflow import DAG
 from airflow.models import Variable
 
-from rainbow.core.util import files_util, class_util
+from rainbow.core.util import class_util
+from rainbow.core.util import files_util
 from rainbow.runners.airflow.model.task import Task
+from rainbow.runners.airflow.tasks.defaults.job_end import JobEndTask
+from rainbow.runners.airflow.tasks.defaults.job_start import JobStartTask
 
 
 def register_dags(configs_path):
@@ -42,8 +45,6 @@ def register_dags(configs_path):
             config = yaml.safe_load(stream)
 
             for pipeline in config['pipelines']:
-                parent = None
-
                 pipeline_name = pipeline['pipeline']
 
                 default_args = {
@@ -58,6 +59,9 @@ def register_dags(configs_path):
                     catchup=False
                 )
 
+                job_start_task = JobStartTask(dag, pipeline_name, None, pipeline, 'all_success')
+                parent = job_start_task.apply_task_to_dag()
+
                 trigger_rule = 'all_success'
                 if 'always_run' in config and config['always_run']:
                     trigger_rule = 'all_done'
@@ -70,12 +74,15 @@ def register_dags(configs_path):
 
                     parent = task_instance.apply_task_to_dag()
 
-                print(f'{pipeline_name}: {dag.tasks}')
+                    job_end_task = JobEndTask(dag, pipeline_name, parent, pipeline, 'all_done')
+                    job_end_task.apply_task_to_dag()
+
+                    print(f'{pipeline_name}: {dag.tasks}')
 
-                globals()[pipeline_name] = dag
+                    globals()[pipeline_name] = dag
 
-                dags.append(dag)
-    return dags
+                    dags.append(dag)
+                    return dags
 
 
 print(f'Loading task implementations..')
diff --git a/rainbow/runners/__init__.py b/rainbow/runners/airflow/tasks/defaults/__init__.py
similarity index 100%
copy from rainbow/runners/__init__.py
copy to rainbow/runners/airflow/tasks/defaults/__init__.py
diff --git a/rainbow/runners/airflow/tasks/job_end.py b/rainbow/runners/airflow/tasks/defaults/default_task.py
similarity index 74%
copy from rainbow/runners/airflow/tasks/job_end.py
copy to rainbow/runners/airflow/tasks/defaults/default_task.py
index 42b5e7f..0e901fc 100644
--- a/rainbow/runners/airflow/tasks/job_end.py
+++ b/rainbow/runners/airflow/tasks/defaults/default_task.py
@@ -15,17 +15,25 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+"""
+Default base task.
+"""
+from abc import abstractmethod
 
-from rainbow.runners.airflow.model import task
+from rainbow.runners.airflow.model.task import Task
 
 
-class JobEndTask(task.Task):
+class DefaultTask(Task):
     """
-    Job end task. Reports job end metrics.
+    Default Base task.
     """
 
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
+        metrics = self.config.get('metrics', {})
+        self.metrics_namespace = metrics.get('namespace', '')
+        self.metrics_backends = metrics.get('backends', [])
 
+    @abstractmethod
     def apply_task_to_dag(self):
         pass
diff --git a/rainbow/runners/airflow/tasks/job_end.py b/rainbow/runners/airflow/tasks/defaults/job_end.py
similarity index 61%
copy from rainbow/runners/airflow/tasks/job_end.py
copy to rainbow/runners/airflow/tasks/defaults/job_end.py
index 42b5e7f..e177ccc 100644
--- a/rainbow/runners/airflow/tasks/job_end.py
+++ b/rainbow/runners/airflow/tasks/defaults/job_end.py
@@ -16,16 +16,29 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from rainbow.runners.airflow.model import task
+from rainbow.runners.airflow.operators.job_status_operator import JobEndOperator
+from rainbow.runners.airflow.tasks.defaults.default_task import DefaultTask
 
 
-class JobEndTask(task.Task):
+class JobEndTask(DefaultTask):
     """
-    Job end task. Reports job end metrics.
+      Job end task. Reports job end metrics.
     """
 
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
     def apply_task_to_dag(self):
-        pass
+        job_end_task = JobEndOperator(
+            task_id='end',
+            namespace=self.metrics_namespace,
+            application_name=self.pipeline_name,
+            backends=self.metrics_backends,
+            dag=self.dag,
+            trigger_rule=self.trigger_rule
+        )
+
+        if self.parent:
+            self.parent.set_downstream(job_end_task)
+
+        return job_end_task
diff --git a/rainbow/runners/airflow/tasks/job_end.py b/rainbow/runners/airflow/tasks/defaults/job_end.py~HEAD
similarity index 61%
rename from rainbow/runners/airflow/tasks/job_end.py
rename to rainbow/runners/airflow/tasks/defaults/job_end.py~HEAD
index 42b5e7f..e177ccc 100644
--- a/rainbow/runners/airflow/tasks/job_end.py
+++ b/rainbow/runners/airflow/tasks/defaults/job_end.py~HEAD
@@ -16,16 +16,29 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from rainbow.runners.airflow.model import task
+from rainbow.runners.airflow.operators.job_status_operator import JobEndOperator
+from rainbow.runners.airflow.tasks.defaults.default_task import DefaultTask
 
 
-class JobEndTask(task.Task):
+class JobEndTask(DefaultTask):
     """
-    Job end task. Reports job end metrics.
+      Job end task. Reports job end metrics.
     """
 
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
     def apply_task_to_dag(self):
-        pass
+        job_end_task = JobEndOperator(
+            task_id='end',
+            namespace=self.metrics_namespace,
+            application_name=self.pipeline_name,
+            backends=self.metrics_backends,
+            dag=self.dag,
+            trigger_rule=self.trigger_rule
+        )
+
+        if self.parent:
+            self.parent.set_downstream(job_end_task)
+
+        return job_end_task
diff --git a/rainbow/runners/airflow/tasks/job_start.py b/rainbow/runners/airflow/tasks/defaults/job_start.py
similarity index 63%
copy from rainbow/runners/airflow/tasks/job_start.py
copy to rainbow/runners/airflow/tasks/defaults/job_start.py
index 64a2f4a..e196919 100644
--- a/rainbow/runners/airflow/tasks/job_start.py
+++ b/rainbow/runners/airflow/tasks/defaults/job_start.py
@@ -15,11 +15,11 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+from rainbow.runners.airflow.operators.job_status_operator import JobStartOperator
+from rainbow.runners.airflow.tasks.defaults.default_task import DefaultTask
 
-from rainbow.runners.airflow.model import task
 
-
-class JobStartTask(task.Task):
+class JobStartTask(DefaultTask):
     """
     Job start task. Reports job start metrics.
     """
@@ -28,4 +28,16 @@ class JobStartTask(task.Task):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
     def apply_task_to_dag(self):
-        pass
+        job_start_task = JobStartOperator(
+            task_id='start',
+            namespace=self.metrics_namespace,
+            application_name=self.pipeline_name,
+            backends=self.metrics_backends,
+            dag=self.dag,
+            trigger_rule=self.trigger_rule
+        )
+
+        if self.parent:
+            self.parent.set_downstream(job_start_task)
+
+        return job_start_task
diff --git a/rainbow/runners/airflow/tasks/job_start.py b/rainbow/runners/airflow/tasks/defaults/job_start.py~HEAD
similarity index 63%
rename from rainbow/runners/airflow/tasks/job_start.py
rename to rainbow/runners/airflow/tasks/defaults/job_start.py~HEAD
index 64a2f4a..e196919 100644
--- a/rainbow/runners/airflow/tasks/job_start.py
+++ b/rainbow/runners/airflow/tasks/defaults/job_start.py~HEAD
@@ -15,11 +15,11 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+from rainbow.runners.airflow.operators.job_status_operator import JobStartOperator
+from rainbow.runners.airflow.tasks.defaults.default_task import DefaultTask
 
-from rainbow.runners.airflow.model import task
 
-
-class JobStartTask(task.Task):
+class JobStartTask(DefaultTask):
     """
     Job start task. Reports job start metrics.
     """
@@ -28,4 +28,16 @@ class JobStartTask(task.Task):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
     def apply_task_to_dag(self):
-        pass
+        job_start_task = JobStartOperator(
+            task_id='start',
+            namespace=self.metrics_namespace,
+            application_name=self.pipeline_name,
+            backends=self.metrics_backends,
+            dag=self.dag,
+            trigger_rule=self.trigger_rule
+        )
+
+        if self.parent:
+            self.parent.set_downstream(job_start_task)
+
+        return job_start_task
diff --git a/tests/runners/airflow/dag/test_rainbow_dags.py b/tests/runners/airflow/dag/test_rainbow_dags.py
index c8f2e38..d8c1afc 100644
--- a/tests/runners/airflow/dag/test_rainbow_dags.py
+++ b/tests/runners/airflow/dag/test_rainbow_dags.py
@@ -1,17 +1,38 @@
 import os
+import unittest
 from unittest import TestCase
 
 from rainbow.runners.airflow.dag import rainbow_dags
-import unittest
+from rainbow.runners.airflow.operators.job_status_operator import JobEndOperator, JobStartOperator
 
 
 class Test(TestCase):
     def test_register_dags(self):
-        base_path = os.path.join(os.path.dirname(__file__), '../rainbow')
-        dags = rainbow_dags.register_dags(base_path)
+        dags = self.get_register_dags()
+
         self.assertEqual(len(dags), 1)
-        # TODO: elaborate test
-        pass
+
+        test_pipeline = dags[0]
+        self.assertEqual(test_pipeline.dag_id, 'my_pipeline')
+
+    def test_default_start_task(self):
+        dags = self.get_register_dags()
+
+        task_dict = dags[0].task_dict
+
+        self.assertIsInstance(task_dict['start'], JobStartOperator)
+
+    def test_default_end_task(self):
+        dags = self.get_register_dags()
+
+        task_dict = dags[0].task_dict
+
+        self.assertIsInstance(task_dict['end'], JobEndOperator)
+
+    @staticmethod
+    def get_register_dags():
+        base_path = os.path.join(os.path.dirname(__file__), '../rainbow')
+        return rainbow_dags.register_dags(base_path)
 
 
 if __name__ == '__main__':
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
index e9f9045..27507fd 100644
--- a/tests/runners/airflow/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -23,7 +23,9 @@ pipelines:
     start_date: 1970-01-01
     timeout-minutes: 45
     schedule: 0 * 1 * *
-    metrics-namespace: TestNamespace
+    metrics:
+     namespace: TestNamespace
+     backends: [ 'cloudwatch' ]
     tasks:
       - task: my_static_input_task
         type: python
@@ -36,7 +38,7 @@ pipelines:
         input_type: static
         input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
         output_path: /output.json
-        cmd: python -u helloworld.py
+        cmd: python -u hello_world.py
 #      - task: my_parallelized_static_input_task
 #        type: python
 #        description: parallelized static input task
@@ -59,7 +61,7 @@ pipelines:
           env2: "b"
         input_type: task
         input_path: my_static_input_task
-        cmd: python -u helloworld.py
+        cmd: python -u hello_world.py
 services:
   - service:
     name: my_python_server
diff --git a/rainbow/runners/__init__.py b/tests/runners/airflow/tasks/defaults/__init__.py
similarity index 100%
rename from rainbow/runners/__init__.py
rename to tests/runners/airflow/tasks/defaults/__init__.py
diff --git a/tests/runners/airflow/tasks/defaults/test_job_end.py b/tests/runners/airflow/tasks/defaults/test_job_end.py
new file mode 100644
index 0000000..9a2c398
--- /dev/null
+++ b/tests/runners/airflow/tasks/defaults/test_job_end.py
@@ -0,0 +1,77 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import unittest
+from unittest import TestCase
+
+from rainbow.runners.airflow.tasks.defaults import job_end
+from tests.util import dag_test_utils
+
+
+class TestJobEndTask(TestCase):
+
+    def test_apply_task_to_dag(self):
+        conf = {
+            'pipeline': 'my_pipeline',
+            'metrics': {'namespace': 'EndJobNameSpace', 'backends': ['cloudwatch']},
+        }
+
+        dag = dag_test_utils.create_dag()
+
+        task0 = job_end.JobEndTask(dag, 'my_end_pipeline', None, conf, 'all_done')
+        task0.apply_task_to_dag()
+
+        self.assertEqual(len(dag.tasks), 1)
+        dag_task0 = dag.tasks[0]
+
+        self.assertEqual(dag_task0.namespace, 'EndJobNameSpace')
+        self.assertEqual(dag_task0.backends, ['cloudwatch'])
+
+        self.assertEqual(dag_task0.task_id, 'end')
+
+    def test_apply_task_to_dag_missing_metrics(self):
+        conf = {'pipeline': 'my_pipeline'}
+        dag = dag_test_utils.create_dag()
+
+        task0 = job_end.JobEndTask(dag, 'my_end_pipeline', None, conf, 'all_done')
+        task0.apply_task_to_dag()
+
+        self.assertEqual(len(dag.tasks), 1)
+        dag_task0 = dag.tasks[0]
+
+        self.assertEqual(dag_task0.namespace, '')
+        self.assertEqual(dag_task0.backends, [])
+        self.assertEqual(dag_task0.trigger_rule, 'all_done')
+
+    def test_apply_task_to_dag_with_partial_configuration(self):
+        conf = {'pipeline': 'my_pipeline', 'metrics': {'namespace': 'EndJobNameSpace'}}
+        dag = dag_test_utils.create_dag()
+
+        task0 = job_end.JobEndTask(dag, 'my_end_pipeline', None, conf, 'all_done')
+        task0.apply_task_to_dag()
+
+        self.assertEqual(len(dag.tasks), 1)
+        dag_task0 = dag.tasks[0]
+
+        self.assertEqual(dag_task0.namespace, 'EndJobNameSpace')
+        self.assertEqual(dag_task0.backends, [])
+        self.assertEqual(dag_task0.trigger_rule, 'all_done')
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/tests/runners/airflow/tasks/defaults/test_job_start.py b/tests/runners/airflow/tasks/defaults/test_job_start.py
new file mode 100644
index 0000000..d07cf4b
--- /dev/null
+++ b/tests/runners/airflow/tasks/defaults/test_job_start.py
@@ -0,0 +1,77 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import unittest
+from unittest import TestCase
+
+from rainbow.runners.airflow.tasks.defaults import job_end, job_start
+from tests.util import dag_test_utils
+
+
+class TestJobStartTask(TestCase):
+
+    def test_apply_task_to_dag(self):
+        conf = {
+            'pipeline': 'my_pipeline',
+            'metrics': {'namespace': 'StartJobNameSpace', 'backends': ['cloudwatch']},
+        }
+
+        dag = dag_test_utils.create_dag()
+
+        task0 = job_start.JobStartTask(dag, 'my_start_pipeline', None, conf, 'all_success')
+        task0.apply_task_to_dag()
+
+        self.assertEqual(len(dag.tasks), 1)
+        dag_task0 = dag.tasks[0]
+
+        self.assertEqual(dag_task0.namespace, 'StartJobNameSpace')
+        self.assertEqual(dag_task0.backends, ['cloudwatch'])
+
+        self.assertEqual(dag_task0.task_id, 'start')
+
+    def test_apply_task_to_dag_missing_metrics(self):
+        conf = {'pipeline': 'my_pipeline'}
+
+        dag = dag_test_utils.create_dag()
+
+        task0 = job_start.JobStartTask(dag, 'my_end_pipeline', None, conf, 'all_success')
+        task0.apply_task_to_dag()
+
+        self.assertEqual(len(dag.tasks), 1)
+        dag_task0 = dag.tasks[0]
+
+        self.assertEqual(dag_task0.namespace, '')
+        self.assertEqual(dag_task0.backends, [])
+        self.assertEqual(dag_task0.trigger_rule, 'all_success')
+
+    def test_apply_task_to_dag_with_partial_configuration(self):
+        conf = {'pipeline': 'my_pipeline', 'metrics': {'namespace': 'StartJobNameSpace'}}
+        dag = dag_test_utils.create_dag()
+
+        task0 = job_start.JobStartTask(dag, 'my_start_pipeline', None, conf, 'all_success')
+        task0.apply_task_to_dag()
+
+        self.assertEqual(len(dag.tasks), 1)
+        dag_task0 = dag.tasks[0]
+
+        self.assertEqual(dag_task0.namespace, 'StartJobNameSpace')
+        self.assertEqual(dag_task0.backends, [])
+
+
+if __name__ == '__main__':
+    unittest.main()
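
The metrics handling that DefaultTask gains in this commit can be sketched in isolation. The snippet below is a minimal illustration, not the committed code: read_metrics_config is a hypothetical helper name, and it assumes the pipeline config is a plain dict shaped like the `metrics:` block in the test rainbow.yml. It shows why the "missing metrics" and "partial configuration" tests above expect an empty namespace and an empty backends list.

```python
def read_metrics_config(config):
    """Return (namespace, backends) from a pipeline config dict.

    Hypothetical helper mirroring the lookups in DefaultTask.__init__:
    a missing 'metrics' section, or a missing key inside it, falls back
    to an empty string / empty list instead of raising KeyError.
    """
    metrics = config.get('metrics', {})
    namespace = metrics.get('namespace', '')
    backends = metrics.get('backends', [])
    return namespace, backends


# Full configuration, as in tests/runners/airflow/rainbow/rainbow.yml.
full = read_metrics_config(
    {'metrics': {'namespace': 'TestNamespace', 'backends': ['cloudwatch']}})

# No 'metrics' section at all.
missing = read_metrics_config({'pipeline': 'my_pipeline'})

# Namespace only; backends falls back to an empty list.
partial = read_metrics_config({'metrics': {'namespace': 'EndJobNameSpace'}})
```

Because both lookups use dict.get with a default, the job start and job end operators can always be constructed, even for pipelines that do not configure metrics.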


[incubator-liminal] 07/43: Add LICENSE

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 79c903171e6c2e95ef6c290ae0b29f69e4f3c97f
Author: aviemzur <av...@gmail.com>
AuthorDate: Wed Mar 11 14:15:30 2020 +0200

    Add LICENSE
---
 LICENSE | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/LICENSE b/LICENSE
index 8f1552e..d9f1716 100644
--- a/LICENSE
+++ b/LICENSE
@@ -201,13 +201,14 @@ See the License for the specific language governing permissions and
 limitations under the License.
 
 ============================================================================
-   APACHE AIRFLOW SUBCOMPONENTS:
+   APACHE RAINBOW SUBCOMPONENTS:
 
-   The Apache Airflow project contains subcomponents with separate copyright
+   The Apache Rainbow project contains subcomponents with separate copyright
    notices and license terms. Your use of the source code for the these
    subcomponents is subject to the terms and conditions of the following
    licenses.
 
+TODO: list all dependencies
 
 ========================================================================
 Third party Apache 2.0 licenses
@@ -218,9 +219,6 @@ See project link for details. The text of each license is also included
 at licenses/LICENSE-[project].txt.
 
     (ALv2 License) hue v4.3.0 (https://github.com/cloudera/hue/)
-    (ALv2 License) jqclock v2.3.0 (https://github.com/JohnRDOrazio/jQuery-Clock-Plugin)
-    (ALv2 License) bootstrap3-typeahead v4.0.2 (https://github.com/bassjobsen/Bootstrap-3-Typeahead)
-    (ALv2 License) airflow.contrib.auth.backends.github_enterprise_auth
 
 ========================================================================
 MIT licenses
@@ -230,16 +228,6 @@ The following components are provided under the MIT License. See project link fo
 The text of each license is also included at licenses/LICENSE-[project].txt.
 
     (MIT License) jquery v3.4.1 (https://jquery.org/license/)
-    (MIT License) dagre-d3 v0.6.4 (https://github.com/cpettitt/dagre-d3)
-    (MIT License) bootstrap v3.2 (https://github.com/twbs/bootstrap/)
-    (MIT License) d3-tip v0.9.1 (https://github.com/Caged/d3-tip)
-    (MIT License) dataTables v1.10.20 (https://datatables.net)
-    (MIT License) Bootstrap Toggle v2.2.2 (http://www.bootstraptoggle.com)
-    (MIT License) normalize.css v3.0.2 (http://necolas.github.io/normalize.css/)
-    (MIT License) ElasticMock v1.3.2 (https://github.com/vrcmarcos/elasticmock)
-    (MIT License) MomentJS v2.24.0 (http://momentjs.com/)
-    (MIT License) python-slugify v2.0.1 (https://github.com/un33k/python-slugify)
-    (MIT License) python-nvd3 v0.15.0 (https://github.com/areski/python-nvd3)
 
 ========================================================================
 BSD 3-Clause licenses


[incubator-liminal] 18/43: Class registries

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 38152de3915fff4cc7c647fe2dcabf32ba38331f
Author: aviemzur <av...@gmail.com>
AuthorDate: Mon Mar 16 17:13:09 2020 +0200

    Class registries
---
 rainbow/build/build_rainbows.py                    | 61 +++++++++++++++-------
 rainbow/build/{http/python => image}/__init__.py   |  0
 rainbow/build/{ => image}/image_builder.py         |  4 ++
 rainbow/build/{ => image}/python/Dockerfile        |  0
 rainbow/build/{http => image/python}/__init__.py   |  0
 .../build/{ => image}/python/container-setup.sh    |  0
 .../build/{ => image}/python/container-teardown.sh |  0
 .../python_image.py => image/python/python.py}     |  2 +-
 rainbow/build/python/__init__.py                   |  0
 rainbow/build/{http/python => service}/__init__.py |  0
 .../python => service/python_server}/Dockerfile    |  0
 .../python => service/python_server}/__init__.py   |  0
 .../python_server/python_server.py}                |  5 +-
 .../python_server}/python_server_requirements.txt  |  0
 .../python_server}/rainbow_python_server.py        |  0
 .../service_image_builder.py}                      |  1 +
 rainbow/core/util/class_util.py                    | 58 ++++++++++++++++++++
 rainbow/runners/airflow/dag/rainbow_dags.py        | 20 ++++---
 ...mage.py => test_python_server_image_builder.py} |  2 +-
 ...ython_image.py => test_python_image_builder.py} |  4 +-
 20 files changed, 124 insertions(+), 33 deletions(-)

diff --git a/rainbow/build/build_rainbows.py b/rainbow/build/build_rainbows.py
index c4a14c7..fa3a922 100644
--- a/rainbow/build/build_rainbows.py
+++ b/rainbow/build/build_rainbows.py
@@ -20,9 +20,8 @@ import os
 
 import yaml
 
-from rainbow.build.http.python.python_server_image import PythonServerImageBuilder
-from rainbow.build.python.python_image import PythonImageBuilder
-from rainbow.core.util import files_util
+from rainbow.build.image.image_builder import ImageBuilder, ServiceImageBuilderMixin
+from rainbow.core.util import files_util, class_util
 
 
 def build_rainbows(path):
@@ -41,38 +40,62 @@ def build_rainbows(path):
 
             for pipeline in rainbow_config['pipelines']:
                 for task in pipeline['tasks']:
-                    builder_class = __get_task_build_class(task['type'])
-                    __build_image(base_path, task, builder_class)
+                    task_type = task['type']
+                    builder_class = __get_task_build_class(task_type)
+                    if builder_class:
+                        __build_image(base_path, task, builder_class)
+                    else:
+                        raise ValueError(f'No such task type: {task_type}')
 
                 for service in rainbow_config['services']:
-                    builder_class = __get_service_build_class(service['type'])
-                    __build_image(base_path, service, builder_class)
+                    service_type = service['type']
+                    builder_class = __get_service_build_class(service_type)
+                    if builder_class:
+                        __build_image(base_path, service, builder_class)
+                    else:
+                        raise ValueError(f'No such service type: {service_type}')
 
 
 def __build_image(base_path, builder_config, builder):
     if 'source' in builder_config:
-        server_builder_instance = builder(
+        builder_instance = builder(
             config=builder_config,
             base_path=base_path,
             relative_source_path=builder_config['source'],
             tag=builder_config['image'])
-        server_builder_instance.build()
+        builder_instance.build()
     else:
         print(f"No source provided for {builder_config['name']}, skipping.")
 
 
-__task_build_classes = {
-    'python': PythonImageBuilder,
-}
+def __get_task_build_class(task_type):
+    return task_build_classes[task_type] if task_type in task_build_classes else None
 
-__service_build_classes = {
-    'python_server': PythonServerImageBuilder
-}
 
+def __get_service_build_class(service_type):
+    return service_build_classes[service_type] if service_type in service_build_classes else None
 
-def __get_task_build_class(task_type):
-    return __task_build_classes[task_type]
 
+print(f'Loading image builder implementations..')
+
+# TODO: add configuration for user image builders package
+image_builders_package = 'rainbow/build/image'
+user_image_builders_package = 'TODO: user_image_builders_package'
+
+task_build_classes = class_util.find_subclasses_in_packages(
+    [image_builders_package, user_image_builders_package],
+    ImageBuilder)
+
+print(f'Finished loading image builder implementations: {task_build_classes}')
+
+print(f'Loading service image builder implementations..')
+
+# TODO: add configuration for user service image builders package
+service_builders_package = 'rainbow/build/service'
+user_service_builders_package = 'TODO: user_service_builders_package'
+
+service_build_classes = class_util.find_subclasses_in_packages(
+    [service_builders_package, user_service_builders_package],
+    ServiceImageBuilderMixin)
 
-def __get_service_build_class(task_type):
-    return __service_build_classes[task_type]
+print(f'Finished loading service image builder implementations: {service_build_classes}')
diff --git a/rainbow/build/http/python/__init__.py b/rainbow/build/image/__init__.py
similarity index 100%
copy from rainbow/build/http/python/__init__.py
copy to rainbow/build/image/__init__.py
diff --git a/rainbow/build/image_builder.py b/rainbow/build/image/image_builder.py
similarity index 98%
rename from rainbow/build/image_builder.py
rename to rainbow/build/image/image_builder.py
index b54dc00..e716b9d 100644
--- a/rainbow/build/image_builder.py
+++ b/rainbow/build/image/image_builder.py
@@ -116,3 +116,7 @@ class ImageBuilder:
         File name and content pairs to create files from
         """
         return []
+
+
+class ServiceImageBuilderMixin(object):
+    pass
diff --git a/rainbow/build/python/Dockerfile b/rainbow/build/image/python/Dockerfile
similarity index 100%
rename from rainbow/build/python/Dockerfile
rename to rainbow/build/image/python/Dockerfile
diff --git a/rainbow/build/http/__init__.py b/rainbow/build/image/python/__init__.py
similarity index 100%
copy from rainbow/build/http/__init__.py
copy to rainbow/build/image/python/__init__.py
diff --git a/rainbow/build/python/container-setup.sh b/rainbow/build/image/python/container-setup.sh
similarity index 100%
rename from rainbow/build/python/container-setup.sh
rename to rainbow/build/image/python/container-setup.sh
diff --git a/rainbow/build/python/container-teardown.sh b/rainbow/build/image/python/container-teardown.sh
similarity index 100%
rename from rainbow/build/python/container-teardown.sh
rename to rainbow/build/image/python/container-teardown.sh
diff --git a/rainbow/build/python/python_image.py b/rainbow/build/image/python/python.py
similarity index 95%
rename from rainbow/build/python/python_image.py
rename to rainbow/build/image/python/python.py
index d856b8c..f4fb03b 100644
--- a/rainbow/build/python/python_image.py
+++ b/rainbow/build/image/python/python.py
@@ -18,7 +18,7 @@
 
 import os
 
-from rainbow.build.image_builder import ImageBuilder
+from rainbow.build.image.image_builder import ImageBuilder
 
 
 class PythonImageBuilder(ImageBuilder):
diff --git a/rainbow/build/python/__init__.py b/rainbow/build/python/__init__.py
deleted file mode 100644
index e69de29..0000000
diff --git a/rainbow/build/http/python/__init__.py b/rainbow/build/service/__init__.py
similarity index 100%
copy from rainbow/build/http/python/__init__.py
copy to rainbow/build/service/__init__.py
diff --git a/rainbow/build/http/python/Dockerfile b/rainbow/build/service/python_server/Dockerfile
similarity index 100%
rename from rainbow/build/http/python/Dockerfile
rename to rainbow/build/service/python_server/Dockerfile
diff --git a/rainbow/build/http/python/__init__.py b/rainbow/build/service/python_server/__init__.py
similarity index 100%
rename from rainbow/build/http/python/__init__.py
rename to rainbow/build/service/python_server/__init__.py
diff --git a/rainbow/build/http/python/python_server_image.py b/rainbow/build/service/python_server/python_server.py
similarity index 90%
rename from rainbow/build/http/python/python_server_image.py
rename to rainbow/build/service/python_server/python_server.py
index 9a65477..3404abf 100644
--- a/rainbow/build/http/python/python_server_image.py
+++ b/rainbow/build/service/python_server/python_server.py
@@ -18,11 +18,12 @@
 
 import os
 
-from rainbow.build.image_builder import ImageBuilder
 import yaml
 
+from rainbow.build.image.image_builder import ImageBuilder, ServiceImageBuilderMixin
 
-class PythonServerImageBuilder(ImageBuilder):
+
+class PythonServerImageBuilder(ImageBuilder, ServiceImageBuilderMixin):
 
     def __init__(self, config, base_path, relative_source_path, tag):
         super().__init__(config, base_path, relative_source_path, tag)
diff --git a/rainbow/build/http/python/python_server_requirements.txt b/rainbow/build/service/python_server/python_server_requirements.txt
similarity index 100%
rename from rainbow/build/http/python/python_server_requirements.txt
rename to rainbow/build/service/python_server/python_server_requirements.txt
diff --git a/rainbow/build/http/python/rainbow_python_server.py b/rainbow/build/service/python_server/rainbow_python_server.py
similarity index 100%
rename from rainbow/build/http/python/rainbow_python_server.py
rename to rainbow/build/service/python_server/rainbow_python_server.py
diff --git a/rainbow/build/http/__init__.py b/rainbow/build/service/service_image_builder.py
similarity index 99%
rename from rainbow/build/http/__init__.py
rename to rainbow/build/service/service_image_builder.py
index 217e5db..3742bcc 100644
--- a/rainbow/build/http/__init__.py
+++ b/rainbow/build/service/service_image_builder.py
@@ -15,3 +15,4 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
diff --git a/rainbow/core/util/class_util.py b/rainbow/core/util/class_util.py
new file mode 100644
index 0000000..59b8543
--- /dev/null
+++ b/rainbow/core/util/class_util.py
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import importlib.util
+import inspect
+import os
+import sys
+
+
+def find_subclasses_in_packages(packages, parent_class):
+    """
+    Finds all subclasses of given parent class within given packages
+    :return: map of module ref -> class
+    """
+    classes = {}
+
+    for package in [a for a in sys.path]:
+        for root, directories, files in os.walk(package):
+            for file in files:
+                file_path = os.path.join(root, file)
+                if any(p in file_path for p in packages) \
+                        and file.endswith('.py') \
+                        and '__pycache__' not in file_path:
+                    spec = importlib.util.spec_from_file_location(file[:-3], file_path)
+                    mod = importlib.util.module_from_spec(spec)
+                    spec.loader.exec_module(mod)
+                    for name, obj in inspect.getmembers(mod):
+                        if inspect.isclass(obj) and not obj.__name__.endswith('Mixin'):
+                            module_name = mod.__name__
+                            class_name = obj.__name__
+                            module = root[len(package) + 1:].replace('/', '.') + '.' + module_name
+                            clazz = __get_class(module, class_name)
+                            if issubclass(clazz, parent_class):
+                                classes.update({module_name: clazz})
+
+    return classes
+
+
+def __get_class(the_module, the_class):
+    m = __import__(the_module)
+    for comp in the_module.split('.')[1:] + [the_class]:
+        m = getattr(m, comp)
+    return m
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index 92b6d64..15b7d9a 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -22,8 +22,8 @@ import yaml
 from airflow import DAG
 from airflow.models import Variable
 
-from rainbow.core.util import files_util
-from rainbow.runners.airflow.tasks.python import PythonTask
+from rainbow.core.util import files_util, class_util
+from rainbow.runners.airflow.model.task import Task
 
 
 def register_dags(configs_path):
@@ -78,15 +78,19 @@ def register_dags(configs_path):
     return dags
 
 
-task_classes = {
-    'python': PythonTask
-}
+print(f'Loading task implementations..')
+
+# TODO: add configuration for user tasks package
+task_package = 'rainbow/runners/airflow/tasks'
+user_task_package = 'TODO: user_tasks_package'
+
+task_classes = class_util.find_subclasses_in_packages([task_package, user_task_package], Task)
+
+print(f'Finished loading task implementations: {task_classes}')
 
 
 def get_task_class(task_type):
     return task_classes[task_type]
 
 
-# TODO: configurable path
-path = Variable.get('rainbows_dir')
-register_dags(path)
+register_dags(Variable.get('rainbows_dir'))
diff --git a/tests/runners/airflow/build/http/python/test_python_server_image.py b/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
similarity index 97%
rename from tests/runners/airflow/build/http/python/test_python_server_image.py
rename to tests/runners/airflow/build/http/python/test_python_server_image_builder.py
index fd38c80..3423976 100644
--- a/tests/runners/airflow/build/http/python/test_python_server_image.py
+++ b/tests/runners/airflow/build/http/python/test_python_server_image_builder.py
@@ -24,7 +24,7 @@ from unittest import TestCase
 
 import docker
 
-from rainbow.build.http.python.python_server_image import PythonServerImageBuilder
+from rainbow.build.service.python_server.python_server import PythonServerImageBuilder
 
 
 class TestPythonServer(TestCase):
diff --git a/tests/runners/airflow/build/python/test_python_image.py b/tests/runners/airflow/build/python/test_python_image_builder.py
similarity index 95%
rename from tests/runners/airflow/build/python/test_python_image.py
rename to tests/runners/airflow/build/python/test_python_image_builder.py
index ff4555d..c8328da 100644
--- a/tests/runners/airflow/build/python/test_python_image.py
+++ b/tests/runners/airflow/build/python/test_python_image_builder.py
@@ -19,10 +19,10 @@ from unittest import TestCase
 
 import docker
 
-from rainbow.build.python.python_image import PythonImageBuilder
+from rainbow.build.image.python.python import PythonImageBuilder
 
 
-class TestPythonImage(TestCase):
+class TestPythonImageBuilder(TestCase):
 
     def test_build(self):
         config = self.__create_conf('my_task')
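
Note on the class registry in the commit above: the hard-coded `task_classes` dict is replaced by discovery of `Task` subclasses. The sketch below illustrates the two mechanisms involved, using only the standard library: resolving a dotted module path to a class, and filtering a module's classes with `issubclass`. The names `get_class` and `find_subclasses` are hypothetical and are not the `class_util` API. `importlib.import_module` is swapped in here for the `__import__` plus `getattr` walk used in the commit.

```python
import importlib
import inspect


def get_class(module_name, class_name):
    """Resolve a class from a dotted module path (hypothetical helper)."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def find_subclasses(module_name, parent_class):
    """Map class name -> class for every strict subclass of parent_class
    that is visible in the given module (hypothetical helper)."""
    module = importlib.import_module(module_name)
    return {
        name: obj
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, parent_class) and obj is not parent_class
    }


# __import__ returns the top-level package, which is why the commit walks
# the remaining dotted components with getattr.
top_level = __import__('os.path')
```

In the commit the registry appears to be keyed by module name rather than class name (`classes.update({module_name: clazz})`), which would let the `python` task type resolve to the class defined in `python.py`.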


[incubator-liminal] 28/43: Missing requirements

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit e1767df0d3d0a00b1fcf166d7d3b61a5cf635269
Author: aviemzur <av...@gmail.com>
AuthorDate: Thu Apr 2 14:48:42 2020 +0300

    Missing requirements
---
 requirements.txt | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index dd1e232..d7eec03 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,7 +1,10 @@
-botocore==1.15.21
 docker==4.2.0
 apache-airflow==1.10.9
 docker-pycreds==0.4.0
 click==7.1.1
-Flask=1.1.1
-pyyaml
\ No newline at end of file
+Flask==1.1.1
+pyyaml
+statsd
+botocore
+boto3
+kubernetes
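
Note on the `Flask=1.1.1` fix above: a single `=` is not a valid version specifier, so the line is corrected to `==`. A rough check for this kind of typo could look like the sketch below. The helper name `suspicious_lines` is hypothetical, and the regex is a heuristic rather than a full requirements parser (it would also flag option lines that contain `=`).

```python
import re

# A lone '=' that is not part of ==, >=, <=, ~=, != or ===.
SINGLE_EQUALS = re.compile(r'(?<![=<>!~])=(?!=)')


def suspicious_lines(requirements_text):
    """Return requirement lines that contain a lone '=' (hypothetical check)."""
    flagged = []
    for line in requirements_text.splitlines():
        spec = line.split('#', 1)[0].strip()  # ignore trailing comments
        if spec and SINGLE_EQUALS.search(spec):
            flagged.append(spec)
    return flagged
```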


[incubator-liminal] 04/43: Tasks stubs

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 9e646df8d50cd5a9b9b2d966d79ce08df71efebb
Author: aviemzur <av...@gmail.com>
AuthorDate: Tue Mar 10 12:16:41 2020 +0200

    Tasks stubs
---
 rainbow/runners/airflow/dag/rainbow_dags.py        |  4 +-
 rainbow/runners/airflow/model/task.py              |  7 ++
 .../create_cloudformation_stack.py}                | 22 +++---
 .../delete_cloudformation_stack.py}                | 22 +++---
 .../airflow/{model/task.py => tasks/job_end.py}    | 22 +++---
 .../airflow/{model/task.py => tasks/job_start.py}  | 23 +++---
 rainbow/runners/airflow/tasks/python.py            | 81 ++++++++++------------
 .../airflow/{model/task.py => tasks/spark.py}      | 22 +++---
 .../airflow/{model/task.py => tasks/sql.py}        | 22 +++---
 rainbow/sql/__init__.py                            |  1 +
 10 files changed, 101 insertions(+), 125 deletions(-)

diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index 577da07..6bdf66b 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -15,17 +15,15 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: Iterate over each pipeline and create a DAG for it. \
-#  Within every pipeline iterate over tasks and apply them to DAG.
 
 import os
 import pprint
+from datetime import datetime
 
 import yaml
 from airflow import DAG
 
 from rainbow.runners.airflow.tasks.python import PythonTask
-from datetime import datetime
 
 
 def register_dags(path):
diff --git a/rainbow/runners/airflow/model/task.py b/rainbow/runners/airflow/model/task.py
index e74085d..2650aa1 100644
--- a/rainbow/runners/airflow/model/task.py
+++ b/rainbow/runners/airflow/model/task.py
@@ -25,6 +25,13 @@ class Task:
     Task.
     """
 
+    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
+        self.dag = dag
+        self.pipeline_name = pipeline_name
+        self.parent = parent
+        self.config = config
+        self.trigger_rule = trigger_rule
+
     def setup(self):
         """
         Setup method for task.
diff --git a/rainbow/runners/airflow/model/task.py b/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
similarity index 73%
copy from rainbow/runners/airflow/model/task.py
copy to rainbow/runners/airflow/tasks/create_cloudformation_stack.py
index e74085d..9304167 100644
--- a/rainbow/runners/airflow/model/task.py
+++ b/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
@@ -15,24 +15,20 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""
-Base task.
-"""
 
+from rainbow.runners.airflow.model import task
 
-class Task:
+
+class CreateCloudFormationStackTask(task.Task):
     """
-    Task.
+    # TODO: Creates cloud_formation stack.
     """
 
+    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
+        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
+
     def setup(self):
-        """
-        Setup method for task.
-        """
-        raise NotImplementedError()
+        pass
 
     def apply_task_to_dag(self):
-        """
-        Registers Airflow operator to parent task.
-        """
-        raise NotImplementedError()
+        pass
diff --git a/rainbow/runners/airflow/model/task.py b/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
similarity index 73%
copy from rainbow/runners/airflow/model/task.py
copy to rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
index e74085d..66d5783 100644
--- a/rainbow/runners/airflow/model/task.py
+++ b/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
@@ -15,24 +15,20 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""
-Base task.
-"""
 
+from rainbow.runners.airflow.model import task
 
-class Task:
+
+class DeleteCloudFormationStackTask(task.Task):
     """
-    Task.
+    # TODO: Deletes cloud_formation stack.
     """
 
+    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
+        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
+
     def setup(self):
-        """
-        Setup method for task.
-        """
-        raise NotImplementedError()
+        pass
 
     def apply_task_to_dag(self):
-        """
-        Registers Airflow operator to parent task.
-        """
-        raise NotImplementedError()
+        pass
diff --git a/rainbow/runners/airflow/model/task.py b/rainbow/runners/airflow/tasks/job_end.py
similarity index 73%
copy from rainbow/runners/airflow/model/task.py
copy to rainbow/runners/airflow/tasks/job_end.py
index e74085d..b3244c4 100644
--- a/rainbow/runners/airflow/model/task.py
+++ b/rainbow/runners/airflow/tasks/job_end.py
@@ -15,24 +15,20 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""
-Base task.
-"""
 
+from rainbow.runners.airflow.model import task
 
-class Task:
+
+class JobEndTask(task.Task):
     """
-    Task.
+    # TODO: Job end task. Reports job end metrics.
     """
 
+    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
+        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
+
     def setup(self):
-        """
-        Setup method for task.
-        """
-        raise NotImplementedError()
+        pass
 
     def apply_task_to_dag(self):
-        """
-        Registers Airflow operator to parent task.
-        """
-        raise NotImplementedError()
+        pass
diff --git a/rainbow/runners/airflow/model/task.py b/rainbow/runners/airflow/tasks/job_start.py
similarity index 71%
copy from rainbow/runners/airflow/model/task.py
copy to rainbow/runners/airflow/tasks/job_start.py
index e74085d..f794e09 100644
--- a/rainbow/runners/airflow/model/task.py
+++ b/rainbow/runners/airflow/tasks/job_start.py
@@ -15,24 +15,21 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""
-Base task.
-"""
 
+from rainbow.runners.airflow.model import task
 
-class Task:
+
+class JobStartTask(task.Task):
     """
-    Task.
+    # TODO: Job start task. Reports job start metrics.
     """
 
+    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
+        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
+
     def setup(self):
-        """
-        Setup method for task.
-        """
-        raise NotImplementedError()
+        pass
 
     def apply_task_to_dag(self):
-        """
-        Registers Airflow operator to parent task.
-        """
-        raise NotImplementedError()
+        # TODO: job start task
+        pass
diff --git a/rainbow/runners/airflow/tasks/python.py b/rainbow/runners/airflow/tasks/python.py
index 727e11c..983ce0c 100644
--- a/rainbow/runners/airflow/tasks/python.py
+++ b/rainbow/runners/airflow/tasks/python.py
@@ -32,22 +32,18 @@ class PythonTask(task.Task):
     """
 
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
-        self.dag = dag
-        self.parent = parent
-        self.config = config
-        self.trigger_rule = trigger_rule
-        self.input_type = config['input_type']
-        self.input_path = config['input_path']
-        self.task_name = config['task']
+        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
+
+        self.input_type = self.config['input_type']
+        self.input_path = self.config['input_path']
+        self.task_name = self.config['task']
         self.image = self.config['image']
-        self.resources = self.__resources_config(config)
-        self.env_vars = self.__env_vars(pipeline_name, config)
-        self.kubernetes_kwargs = self.__kubernetes_kwargs(
-            dag, self.env_vars, self.resources, self.task_name
-        )
-        self.cmds, self.arguments = self.__kubernetes_cmds_and_arguments(config)
+        self.resources = self.__kubernetes_resources()
+        self.env_vars = self.__env_vars()
+        self.kubernetes_kwargs = self.__kubernetes_kwargs()
+        self.cmds, self.arguments = self.__kubernetes_cmds_and_arguments()
         self.config_task_id = self.task_name + '_input'
-        self.executors = self.__executors(config)
+        self.executors = self.__executors()
 
     def setup(self):
         # TODO: build docker image if needed.
@@ -126,65 +122,62 @@ class PythonTask(task.Task):
 
             return end_task
 
-    @staticmethod
-    def __executors(config):
+    def __executors(self):
         executors = 1
-        if 'executors' in config:
-            executors = config['executors']
+        if 'executors' in self.config:
+            executors = self.config['executors']
         return executors
 
-    @staticmethod
-    def __kubernetes_cmds_and_arguments(config):
+    def __kubernetes_cmds_and_arguments(self):
         cmds = ['/bin/bash', '-c']
         arguments = [
             f'''sh container-setup.sh && \
-            {config['cmd']} && \
-            sh container-teardown.sh {config['output_path']}'''
+            {self.config['cmd']} && \
+            sh container-teardown.sh {self.config['output_path']}'''
         ]
         return cmds, arguments
 
-    @staticmethod
-    def __kubernetes_kwargs(dag, env_vars, resources, task_name):
+    def __kubernetes_kwargs(self):
         kubernetes_kwargs = {
             'namespace': Variable.get('kubernetes_namespace', default_var='default'),
-            'name': task_name.replace('_', '-'),
+            'name': self.task_name.replace('_', '-'),
             'in_cluster': Variable.get('in_kubernetes_cluster', default_var=False),
             'image_pull_policy': Variable.get('image_pull_policy', default_var='IfNotPresent'),
             'get_logs': True,
-            'env_vars': env_vars,
+            'env_vars': self.env_vars,
             'do_xcom_push': True,
             'is_delete_operator_pod': True,
             'startup_timeout_seconds': 300,
             'image_pull_secrets': 'regcred',
-            'resources': resources,
-            'dag': dag
+            'resources': self.resources,
+            'dag': self.dag
         }
         return kubernetes_kwargs
 
-    @staticmethod
-    def __env_vars(pipeline_name, config):
+    def __env_vars(self):
         env_vars = {}
-        if 'env_vars' in config:
-            env_vars = config['env_vars']
+        if 'env_vars' in self.config:
+            env_vars = self.config['env_vars']
         airflow_configuration_variable = Variable.get(
-            f'''{pipeline_name}_dag_configuration''',
+            f'''{self.pipeline_name}_dag_configuration''',
             default_var=None)
         if airflow_configuration_variable:
             airflow_configs = json.loads(airflow_configuration_variable)
-            environment_variables_key = f'''{self.pipeline}_environment_variables'''
+            environment_variables_key = f'''{self.pipeline_name}_environment_variables'''
             if environment_variables_key in airflow_configs:
                 env_vars = airflow_configs[environment_variables_key]
         return env_vars
 
-    @staticmethod
-    def __resources_config(config):
+    def __kubernetes_resources(self):
         resources = {}
-        if 'request_cpu' in config:
-            resources['request_cpu'] = config['request_cpu']
-        if 'request_memory' in config:
-            resources['request_memory'] = config['request_memory']
-        if 'limit_cpu' in config:
-            resources['limit_cpu'] = config['limit_cpu']
-        if 'limit_memory' in config:
-            resources['limit_memory'] = config['limit_memory']
+
+        if 'request_cpu' in self.config:
+            resources['request_cpu'] = self.config['request_cpu']
+        if 'request_memory' in self.config:
+            resources['request_memory'] = self.config['request_memory']
+        if 'limit_cpu' in self.config:
+            resources['limit_cpu'] = self.config['limit_cpu']
+        if 'limit_memory' in self.config:
+            resources['limit_memory'] = self.config['limit_memory']
+
         return resources
diff --git a/rainbow/runners/airflow/model/task.py b/rainbow/runners/airflow/tasks/spark.py
similarity index 74%
copy from rainbow/runners/airflow/model/task.py
copy to rainbow/runners/airflow/tasks/spark.py
index e74085d..ebae64e 100644
--- a/rainbow/runners/airflow/model/task.py
+++ b/rainbow/runners/airflow/tasks/spark.py
@@ -15,24 +15,20 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""
-Base task.
-"""
 
+from rainbow.runners.airflow.model import task
 
-class Task:
+
+class SparkTask(task.Task):
     """
-    Task.
+    # TODO: Executes a Spark application.
     """
 
+    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
+        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
+
     def setup(self):
-        """
-        Setup method for task.
-        """
-        raise NotImplementedError()
+        pass
 
     def apply_task_to_dag(self):
-        """
-        Registers Airflow operator to parent task.
-        """
-        raise NotImplementedError()
+        pass
diff --git a/rainbow/runners/airflow/model/task.py b/rainbow/runners/airflow/tasks/sql.py
similarity index 74%
copy from rainbow/runners/airflow/model/task.py
copy to rainbow/runners/airflow/tasks/sql.py
index e74085d..6dfc0f1 100644
--- a/rainbow/runners/airflow/model/task.py
+++ b/rainbow/runners/airflow/tasks/sql.py
@@ -15,24 +15,20 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""
-Base task.
-"""
 
+from rainbow.runners.airflow.model import task
 
-class Task:
+
+class SparkTask(task.Task):
     """
-    Task.
+    # TODO: Executes an SQL application.
     """
 
+    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
+        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
+
     def setup(self):
-        """
-        Setup method for task.
-        """
-        raise NotImplementedError()
+        pass
 
     def apply_task_to_dag(self):
-        """
-        Registers Airflow operator to parent task.
-        """
-        raise NotImplementedError()
+        pass
diff --git a/rainbow/sql/__init__.py b/rainbow/sql/__init__.py
index 217e5db..495bf9c 100644
--- a/rainbow/sql/__init__.py
+++ b/rainbow/sql/__init__.py
@@ -15,3 +15,4 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+# TODO: SQL (Scala? Python?)
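
Note on the refactoring in the commit above: shared state moves into the base `Task` constructor, each task type calls `super().__init__`, and the former `@staticmethod` helpers become instance methods that read `self.config`. A minimal sketch of that pattern follows, with Airflow left out. `EchoTask` and its behavior are hypothetical and only illustrate how a subclass builds on the base class.

```python
class Task:
    """Base task holding the state shared by all task types."""

    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
        self.dag = dag
        self.pipeline_name = pipeline_name
        self.parent = parent
        self.config = config
        self.trigger_rule = trigger_rule

    def setup(self):
        raise NotImplementedError()

    def apply_task_to_dag(self):
        raise NotImplementedError()


class EchoTask(Task):
    """Hypothetical task type used only for illustration."""

    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
        super().__init__(dag, pipeline_name, parent, config, trigger_rule)
        self.task_name = self.config['task']
        self.executors = self.__executors()

    def __executors(self):
        # Instance method: reads self.config instead of taking config as an
        # argument, mirroring the move away from @staticmethod in the commit.
        return self.config.get('executors', 1)

    def setup(self):
        pass

    def apply_task_to_dag(self):
        # A real task would register an Airflow operator here; this sketch
        # returns itself so a caller can chain tasks.
        return self


task = EchoTask(dag=None, pipeline_name='my_pipeline', parent=None,
                config={'task': 'my_task'}, trigger_rule='all_success')
```

Because the helpers are instance methods, a reference such as `self.pipeline_name` resolves correctly, which was not possible in the earlier `@staticmethod` version of `__env_vars` that referred to `self.pipeline`.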


[incubator-liminal] 36/43: Upgrade the quality of the diagram

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 269459e7036625beb6f911b10f8e957275855d58
Author: lior.schachter <li...@naturalint.com>
AuthorDate: Sat Apr 11 18:02:22 2020 +0300

    Upgrade the quality of the diagram
---
 images/rainbow_002.png | Bin 61815 -> 49192 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/images/rainbow_002.png b/images/rainbow_002.png
index 8cb0a92..ac04a6a 100644
Binary files a/images/rainbow_002.png and b/images/rainbow_002.png differ


[incubator-liminal] 14/43: Remove TODOs with GitHub issues

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit ae36a5ee373caf9b264ac7bad68bf14f0875c6d1
Author: aviemzur <av...@gmail.com>
AuthorDate: Thu Mar 12 12:14:25 2020 +0200

    Remove TODOs with GitHub issues
---
 rainbow/build/build_rainbows.py                              | 2 --
 rainbow/http/__init__.py                                     | 1 -
 rainbow/monitoring/__init__.py                               | 1 -
 rainbow/runners/airflow/dag/rainbow_dags.py                  | 4 +---
 rainbow/runners/airflow/tasks/create_cloudformation_stack.py | 2 +-
 rainbow/runners/airflow/tasks/delete_cloudformation_stack.py | 2 +-
 rainbow/runners/airflow/tasks/job_end.py                     | 2 +-
 rainbow/runners/airflow/tasks/job_start.py                   | 3 +--
 rainbow/runners/airflow/tasks/spark.py                       | 2 +-
 rainbow/runners/airflow/tasks/sql.py                         | 2 +-
 rainbow/sql/__init__.py                                      | 1 -
 11 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/rainbow/build/build_rainbows.py b/rainbow/build/build_rainbows.py
index d10a9bc..1452bb8 100644
--- a/rainbow/build/build_rainbows.py
+++ b/rainbow/build/build_rainbows.py
@@ -35,7 +35,6 @@ def build_rainbows(path):
         print(f'Building artifacts for file: {config_file}')
 
         with open(config_file) as stream:
-            # TODO: validate config
             config = yaml.safe_load(stream)
 
             for pipeline in config['pipelines']:
@@ -47,7 +46,6 @@ def build_rainbows(path):
                                         tag=task['image'])
 
 
-# TODO: build class registry
 build_classes = {
     'python': PythonImage
 }
diff --git a/rainbow/http/__init__.py b/rainbow/http/__init__.py
index d723ae2..217e5db 100644
--- a/rainbow/http/__init__.py
+++ b/rainbow/http/__init__.py
@@ -15,4 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: http
diff --git a/rainbow/monitoring/__init__.py b/rainbow/monitoring/__init__.py
index 8df8694..217e5db 100644
--- a/rainbow/monitoring/__init__.py
+++ b/rainbow/monitoring/__init__.py
@@ -15,4 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: monitoring
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index 639f0cc..8557455 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -38,7 +38,6 @@ def register_dags(configs_path):
         print(f'Registering DAG for file: f{config_file}')
 
         with open(config_file) as stream:
-            # TODO: validate config
             config = yaml.safe_load(stream)
 
             for pipeline in config['pipelines']:
@@ -48,7 +47,7 @@ def register_dags(configs_path):
                     'owner': config['owner'],
                     'start_date': datetime.combine(pipeline['start_date'], datetime.min.time())
                 }
-                # TODO: add all relevant airflow args
+
                 dag = DAG(
                     dag_id='test_dag',
                     default_args=default_args
@@ -67,7 +66,6 @@ def register_dags(configs_path):
     return dags
 
 
-# TODO: task class registry
 task_classes = {
     'python': PythonTask
 }
diff --git a/rainbow/runners/airflow/tasks/create_cloudformation_stack.py b/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
index ca8482a..8f069f3 100644
--- a/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
+++ b/rainbow/runners/airflow/tasks/create_cloudformation_stack.py
@@ -21,7 +21,7 @@ from rainbow.runners.airflow.model import task
 
 class CreateCloudFormationStackTask(task.Task):
     """
-    # TODO: Creates cloud_formation stack.
+    Creates cloud_formation stack.
     """
 
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
diff --git a/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py b/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
index 8ac4e8b..ab99101 100644
--- a/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
+++ b/rainbow/runners/airflow/tasks/delete_cloudformation_stack.py
@@ -21,7 +21,7 @@ from rainbow.runners.airflow.model import task
 
 class DeleteCloudFormationStackTask(task.Task):
     """
-    # TODO: Deletes cloud_formation stack.
+    Deletes cloud_formation stack.
     """
 
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
diff --git a/rainbow/runners/airflow/tasks/job_end.py b/rainbow/runners/airflow/tasks/job_end.py
index 53e1eef..42b5e7f 100644
--- a/rainbow/runners/airflow/tasks/job_end.py
+++ b/rainbow/runners/airflow/tasks/job_end.py
@@ -21,7 +21,7 @@ from rainbow.runners.airflow.model import task
 
 class JobEndTask(task.Task):
     """
-    # TODO: Job end task. Reports job end metrics.
+    Job end task. Reports job end metrics.
     """
 
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
diff --git a/rainbow/runners/airflow/tasks/job_start.py b/rainbow/runners/airflow/tasks/job_start.py
index 5c82e1c..64a2f4a 100644
--- a/rainbow/runners/airflow/tasks/job_start.py
+++ b/rainbow/runners/airflow/tasks/job_start.py
@@ -21,12 +21,11 @@ from rainbow.runners.airflow.model import task
 
 class JobStartTask(task.Task):
     """
-    # TODO: Job start task. Reports job start metrics.
+    Job start task. Reports job start metrics.
     """
 
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
         super().__init__(dag, pipeline_name, parent, config, trigger_rule)
 
     def apply_task_to_dag(self):
-        # TODO: job start task
         pass
diff --git a/rainbow/runners/airflow/tasks/spark.py b/rainbow/runners/airflow/tasks/spark.py
index 5822e92..9a46dd4 100644
--- a/rainbow/runners/airflow/tasks/spark.py
+++ b/rainbow/runners/airflow/tasks/spark.py
@@ -21,7 +21,7 @@ from rainbow.runners.airflow.model import task
 
 class SparkTask(task.Task):
     """
-    # TODO: Executes a Spark application.
+    Executes a Spark application.
     """
 
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
diff --git a/rainbow/runners/airflow/tasks/sql.py b/rainbow/runners/airflow/tasks/sql.py
index 42c02ce..7ae3b9f 100644
--- a/rainbow/runners/airflow/tasks/sql.py
+++ b/rainbow/runners/airflow/tasks/sql.py
@@ -21,7 +21,7 @@ from rainbow.runners.airflow.model import task
 
 class SparkTask(task.Task):
     """
-    # TODO: Executes an SQL application.
+    Executes an SQL application.
     """
 
     def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
diff --git a/rainbow/sql/__init__.py b/rainbow/sql/__init__.py
index 495bf9c..217e5db 100644
--- a/rainbow/sql/__init__.py
+++ b/rainbow/sql/__init__.py
@@ -15,4 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-# TODO: SQL (Scala? Python?)


[incubator-liminal] 39/43: perform pip upgrade when building python images

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 324f717eca40e5a8bace61de392ecf1be1d619c4
Author: roei <ro...@naturalint.com>
AuthorDate: Mon Jun 22 15:09:20 2020 +0300

    perform pip upgrade when building python images
---
 rainbow/build/image/python/Dockerfile          | 1 +
 rainbow/build/service/python_server/Dockerfile | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/rainbow/build/image/python/Dockerfile b/rainbow/build/image/python/Dockerfile
index 8e4de05..4a91690 100644
--- a/rainbow/build/image/python/Dockerfile
+++ b/rainbow/build/image/python/Dockerfile
@@ -32,6 +32,7 @@ WORKDIR /app
 COPY ./requirements.txt /app/
 
 # mount the secret in the correct location, then run pip install
+RUN pip install --upgrade pip
 RUN {{mount}} pip install -r requirements.txt
 
 # Copy the current directory contents into the container at /app
diff --git a/rainbow/build/service/python_server/Dockerfile b/rainbow/build/service/python_server/Dockerfile
index 4d4254f..e329738 100644
--- a/rainbow/build/service/python_server/Dockerfile
+++ b/rainbow/build/service/python_server/Dockerfile
@@ -29,6 +29,8 @@ WORKDIR /app
 # Be careful when changing this code.                                                            !
 
 # Install any needed packages specified in python_server_requirements.txt and requirements.txt
+RUN pip install --upgrade pip
+
 COPY ./python_server_requirements.txt /app/
 RUN pip install -r python_server_requirements.txt
 


[incubator-liminal] 03/43: rainbow_dags dag creation + python task

Posted by jb...@apache.org.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit aa63e16674d6bfa904a2b6764dd2257d2f3827de
Author: aviemzur <av...@gmail.com>
AuthorDate: Tue Mar 10 11:57:25 2020 +0200

    rainbow_dags dag creation + python task
---
 rainbow/cli/__init__.py                            |   1 +
 rainbow/core/__init__.py                           |   1 +
 rainbow/docker/__init__.py                         |   1 +
 rainbow/http/__init__.py                           |   1 +
 rainbow/monitoring/__init__.py                     |   1 +
 .../runners/airflow/compiler/rainbow_compiler.py   |   8 +-
 rainbow/runners/airflow/dag/rainbow_dags.py        |  90 ++++++++++
 .../airflow/model}/__init__.py                     |   0
 .../rainbow_compiler.py => model/task.py}          |  22 ++-
 .../airflow/tasks}/__init__.py                     |   0
 rainbow/runners/airflow/tasks/python.py            | 190 +++++++++++++++++++++
 tests/runners/airflow/compiler/rainbow.yml         | 115 -------------
 .../airflow/compiler/test_rainbow_compiler.py      |  33 ----
 .../runners/airflow/dag}/__init__.py               |   0
 tests/runners/airflow/dag/rainbow/rainbow.yml      |  51 ++++++
 tests/runners/airflow/dag/test_rainbow_dags.py     |  11 ++
 .../runners/airflow/tasks}/__init__.py             |   0
 tests/runners/airflow/tasks/test_python.py         |  50 ++++++
 {rainbow/monitoring => tests/util}/__init__.py     |   0
 .../util/dag_test_utils.py                         |  21 ++-
 20 files changed, 429 insertions(+), 167 deletions(-)

diff --git a/rainbow/cli/__init__.py b/rainbow/cli/__init__.py
index 217e5db..c24b2fa 100644
--- a/rainbow/cli/__init__.py
+++ b/rainbow/cli/__init__.py
@@ -15,3 +15,4 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+# TODO: cli
diff --git a/rainbow/core/__init__.py b/rainbow/core/__init__.py
index 217e5db..2162b08 100644
--- a/rainbow/core/__init__.py
+++ b/rainbow/core/__init__.py
@@ -15,3 +15,4 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+# TODO: core
diff --git a/rainbow/docker/__init__.py b/rainbow/docker/__init__.py
index 217e5db..8bb1ec2 100644
--- a/rainbow/docker/__init__.py
+++ b/rainbow/docker/__init__.py
@@ -15,3 +15,4 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+# TODO: docker
diff --git a/rainbow/http/__init__.py b/rainbow/http/__init__.py
index 217e5db..d723ae2 100644
--- a/rainbow/http/__init__.py
+++ b/rainbow/http/__init__.py
@@ -15,3 +15,4 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+# TODO: http
diff --git a/rainbow/monitoring/__init__.py b/rainbow/monitoring/__init__.py
index 217e5db..8df8694 100644
--- a/rainbow/monitoring/__init__.py
+++ b/rainbow/monitoring/__init__.py
@@ -15,3 +15,4 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+# TODO: monitoring
diff --git a/rainbow/runners/airflow/compiler/rainbow_compiler.py b/rainbow/runners/airflow/compiler/rainbow_compiler.py
index 818fdc5..bed1efd 100644
--- a/rainbow/runners/airflow/compiler/rainbow_compiler.py
+++ b/rainbow/runners/airflow/compiler/rainbow_compiler.py
@@ -16,11 +16,5 @@
 # specific language governing permissions and limitations
 # under the License.
 """
-Compiler for rainbows.
+TODO: compiler for rainbows.
 """
-import yaml
-
-
-def parse_yaml(path):
-    with open(path, 'r') as stream:
-        return yaml.safe_load(stream)
diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
new file mode 100644
index 0000000..577da07
--- /dev/null
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -0,0 +1,90 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# TODO: Iterate over each pipeline and create a DAG for it. \
+#  Within every pipeline iterate over tasks and apply them to DAG.
+
+import os
+import pprint
+
+import yaml
+from airflow import DAG
+
+from rainbow.runners.airflow.tasks.python import PythonTask
+from datetime import datetime
+
+
+def register_dags(path):
+    files = []
+    for r, d, f in os.walk(path):
+        for file in f:
+            if os.path.splitext(file)[1] in ['.yml', '.yaml']:
+                files.append(os.path.join(r, file))
+
+    print(files)
+
+    dags = []
+
+    for config_file in files:
+        print(f'Registering DAG for file: {config_file}')
+
+        with open(config_file) as stream:
+            # TODO: validate config
+            config = yaml.safe_load(stream)
+            pp = pprint.PrettyPrinter(indent=4)
+            # pp.pprint(config)
+
+            for pipeline in config['pipelines']:
+                parent = None
+
+                default_args = {
+                    'owner': config['owner'],
+                    'start_date': datetime.combine(pipeline['start_date'], datetime.min.time())
+                }
+                # TODO: add all relevant airflow args
+                dag = DAG(
+                    dag_id='test_dag',
+                    default_args=default_args
+                )
+
+                for task in pipeline['tasks']:
+                    task_type = task['type']
+                    task_instance = get_task_class(task_type)(
+                        dag, pipeline['pipeline'], parent, task, 'all_success'
+                    )
+                    parent = task_instance.apply_task_to_dag()
+
+                    print(f'{parent}{{{task_type}}}')
+
+                dags.append(dag)
+    return dags
+
+
+# TODO: task class registry
+task_classes = {
+    'python': PythonTask
+}
+
+
+def get_task_class(task_type):
+    return task_classes[task_type]
+
+
+if __name__ == '__main__':
+    # TODO: configurable yaml dir
+    path = 'tests/runners/airflow/dag/rainbow'
+    register_dags(path)
diff --git a/rainbow/monitoring/__init__.py b/rainbow/runners/airflow/model/__init__.py
similarity index 100%
copy from rainbow/monitoring/__init__.py
copy to rainbow/runners/airflow/model/__init__.py
diff --git a/rainbow/runners/airflow/compiler/rainbow_compiler.py b/rainbow/runners/airflow/model/task.py
similarity index 72%
copy from rainbow/runners/airflow/compiler/rainbow_compiler.py
copy to rainbow/runners/airflow/model/task.py
index 818fdc5..e74085d 100644
--- a/rainbow/runners/airflow/compiler/rainbow_compiler.py
+++ b/rainbow/runners/airflow/model/task.py
@@ -16,11 +16,23 @@
 # specific language governing permissions and limitations
 # under the License.
 """
-Compiler for rainbows.
+Base task.
 """
-import yaml
 
 
-def parse_yaml(path):
-    with open(path, 'r') as stream:
-        return yaml.safe_load(stream)
+class Task:
+    """
+    Task.
+    """
+
+    def setup(self):
+        """
+        Setup method for task.
+        """
+        raise NotImplementedError()
+
+    def apply_task_to_dag(self):
+        """
+        Registers Airflow operator to parent task.
+        """
+        raise NotImplementedError()
diff --git a/rainbow/monitoring/__init__.py b/rainbow/runners/airflow/tasks/__init__.py
similarity index 100%
copy from rainbow/monitoring/__init__.py
copy to rainbow/runners/airflow/tasks/__init__.py
diff --git a/rainbow/runners/airflow/tasks/python.py b/rainbow/runners/airflow/tasks/python.py
new file mode 100644
index 0000000..727e11c
--- /dev/null
+++ b/rainbow/runners/airflow/tasks/python.py
@@ -0,0 +1,190 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import json
+
+from airflow.models import Variable
+from airflow.operators.dummy_operator import DummyOperator
+
+from rainbow.runners.airflow.model import task
+from rainbow.runners.airflow.operators.kubernetes_pod_operator import \
+    ConfigurableKubernetesPodOperator, \
+    ConfigureParallelExecutionOperator
+
+
+class PythonTask(task.Task):
+    """
+    Python task.
+    """
+
+    def __init__(self, dag, pipeline_name, parent, config, trigger_rule):
+        self.dag = dag
+        self.parent = parent
+        self.config = config
+        self.trigger_rule = trigger_rule
+        self.input_type = config['input_type']
+        self.input_path = config['input_path']
+        self.task_name = config['task']
+        self.image = self.config['image']
+        self.resources = self.__resources_config(config)
+        self.env_vars = self.__env_vars(pipeline_name, config)
+        self.kubernetes_kwargs = self.__kubernetes_kwargs(
+            dag, self.env_vars, self.resources, self.task_name
+        )
+        self.cmds, self.arguments = self.__kubernetes_cmds_and_arguments(config)
+        self.config_task_id = self.task_name + '_input'
+        self.executors = self.__executors(config)
+
+    def setup(self):
+        # TODO: build docker image if needed.
+        pass
+
+    def apply_task_to_dag(self):
+
+        def create_pod_operator(task_id, task_split, image):
+            return ConfigurableKubernetesPodOperator(
+                task_id=task_id,
+                config_task_id=self.config_task_id,
+                task_split=task_split,
+                image=image,
+                cmds=self.cmds,
+                arguments=self.arguments,
+                **self.kubernetes_kwargs
+            )
+
+        config_task = None
+
+        if self.input_type in ['static', 'task']:
+            self.env_vars.update({'DATA_PIPELINE_INPUT': self.input_path})
+
+            config_task = ConfigureParallelExecutionOperator(
+                task_id=self.config_task_id,
+                image=self.image,
+                config_type=self.input_type,
+                config_path=self.input_path,
+                executors=self.executors,
+                **self.kubernetes_kwargs
+            )
+
+        if self.executors == 1:
+            pod_task = create_pod_operator(
+                task_id=f'{self.task_name}',
+                task_split=0,
+                image=f'''{self.image}'''
+            )
+
+            first_task = pod_task
+
+            if config_task:
+                first_task = config_task
+                first_task.set_downstream(pod_task)
+
+            if self.parent:
+                self.parent.set_downstream(first_task)
+
+            return pod_task
+        else:
+            if not config_task:
+                config_task = DummyOperator(
+                    task_id=self.config_task_id,
+                    trigger_rule=self.trigger_rule,
+                    dag=self.dag
+                )
+
+            end_task = DummyOperator(
+                task_id=self.task_name,
+                dag=self.dag
+            )
+
+            if self.parent:
+                self.parent.set_downstream(config_task)
+
+            for i in range(self.executors):
+                split_task = create_pod_operator(
+                    task_id=f'''{self.task_name}_{i}''',
+                    task_split=i,
+                    image=self.image
+                )
+
+                config_task.set_downstream(split_task)
+
+                split_task.set_downstream(end_task)
+
+            return end_task
+
+    @staticmethod
+    def __executors(config):
+        executors = 1
+        if 'executors' in config:
+            executors = config['executors']
+        return executors
+
+    @staticmethod
+    def __kubernetes_cmds_and_arguments(config):
+        cmds = ['/bin/bash', '-c']
+        arguments = [
+            f'''sh container-setup.sh && \
+            {config['cmd']} && \
+            sh container-teardown.sh {config['output_path']}'''
+        ]
+        return cmds, arguments
+
+    @staticmethod
+    def __kubernetes_kwargs(dag, env_vars, resources, task_name):
+        kubernetes_kwargs = {
+            'namespace': Variable.get('kubernetes_namespace', default_var='default'),
+            'name': task_name.replace('_', '-'),
+            'in_cluster': Variable.get('in_kubernetes_cluster', default_var=False),
+            'image_pull_policy': Variable.get('image_pull_policy', default_var='IfNotPresent'),
+            'get_logs': True,
+            'env_vars': env_vars,
+            'do_xcom_push': True,
+            'is_delete_operator_pod': True,
+            'startup_timeout_seconds': 300,
+            'image_pull_secrets': 'regcred',
+            'resources': resources,
+            'dag': dag
+        }
+        return kubernetes_kwargs
+
+    @staticmethod
+    def __env_vars(pipeline_name, config):
+        env_vars = {}
+        if 'env_vars' in config:
+            env_vars = config['env_vars']
+        airflow_configuration_variable = Variable.get(
+            f'''{pipeline_name}_dag_configuration''',
+            default_var=None)
+        if airflow_configuration_variable:
+            airflow_configs = json.loads(airflow_configuration_variable)
+            environment_variables_key = f'''{pipeline_name}_environment_variables'''
+            if environment_variables_key in airflow_configs:
+                env_vars = airflow_configs[environment_variables_key]
+        return env_vars
+
+    @staticmethod
+    def __resources_config(config):
+        resources = {}
+        if 'request_cpu' in config:
+            resources['request_cpu'] = config['request_cpu']
+        if 'request_memory' in config:
+            resources['request_memory'] = config['request_memory']
+        if 'limit_cpu' in config:
+            resources['limit_cpu'] = config['limit_cpu']
+        if 'limit_memory' in config:
+            resources['limit_memory'] = config['limit_memory']
+        return resources
diff --git a/tests/runners/airflow/compiler/rainbow.yml b/tests/runners/airflow/compiler/rainbow.yml
deleted file mode 100644
index 45333a8..0000000
--- a/tests/runners/airflow/compiler/rainbow.yml
+++ /dev/null
@@ -1,115 +0,0 @@
-
----
-name: MyPipeline
-owner: Bosco Albert Baracus
-pipeline:
-  timeout-minutes: 45
-  schedule: 0 * 1 * *
-  metrics-namespace: TestNamespace
-  tasks:
-    - name: mytask1
-      type: sql
-      description: mytask1 is cool
-      query: "select * from mytable"
-      overrides:
-        - prod:
-          partition-columns: dt
-          output-table: test.test_impression_prod
-          output-path: s3://mybucket/myproject-test/impression
-          emr-cluster-name: spark-playground-prod
-        - stg:
-          query: "select * from mytable"
-          partition-columns: dt
-          output-table: test.test_impression_stg
-          output-path: s3://mybucket/haya-test/impression
-          emr-cluster-name: spark-playground-staging
-      tasks:
-        - name: my_static_config_task
-          type: python
-          description: my 1st ds task
-          artifact-id: mytask1artifactid
-          source: mytask1folder
-          env-vars:
-            env1: "a"
-            env2: "b"
-          config-type: static
-          config-path: "{\"configs\": [ { \"campaign_id\": 10 }, { \"campaign_id\": 20 } ]}"
-          cmd: python -u my_app.py
-        - task:
-          name: my_no_config_task
-          type: python
-          description: my 2nd ds task
-          artifact-id: mytask1artifactid
-          env-vars:
-            env1: "a"
-            env2: "b"
-          request-cpu: 100m
-          request-memory: 65M
-          cmd: python -u my_app.py foo bar
-        - task:
-          name: my_create_custom_config_task
-          type: python
-          description: my 2nd ds task
-          artifact-id: myconftask
-          source: myconftask
-          output-config-path: /my_conf.json
-          env-vars:
-            env1: "a"
-            env2: "b"
-          cmd: python -u my_app.py foo bar
-        - task:
-          name: my_custom_config_task
-          type: python
-          description: my 2nd ds task
-          artifact-id: mytask1artifactid
-          config-type: task
-          config-path: my_create_custom_config_task
-          env-vars:
-            env1: "a"
-            env2: "b"
-          cmd: python -u my_app.py foo bar
-        - task:
-          name: my_parallelized_static_config_task
-          type: python
-          description: my 3rd ds task
-          artifact-id: mytask1artifactid
-          executors: 5
-          env-vars:
-            env1: "x"
-            env2: "y"
-            myconf: $CONFIG_FILE
-          config-type: static
-          config-path: "{\"configs\": [ { \"campaign_id\": 10 }, { \"campaign_id\": 20 }, { \"campaign_id\": 30 }, { \"campaign_id\": 40 }, { \"campaign_id\": 50 }, { \"campaign_id\": 60 }, { \"campaign_id\": 70 }, { \"campaign_id\": 80 } ]}"
-          cmd: python -u my_app.py $CONFIG_FILE
-        - task:
-          name: my_parallelized_custom_config_task
-          type: python
-          description: my 4th ds task
-          artifact-id: mytask1artifactid
-          executors: 5
-          config-type: task
-          config-path: my_create_custom_config_task
-          cmd: python -u my_app.py
-        - task:
-          name: my_parallelized_no_config_task
-          type: python
-          description: my 4th ds task
-          artifact-id: mytask1artifactid
-          executors: 5
-          cmd: python -u my_app.py
-services:
-  - service:
-    name: myserver1
-    type: python-server
-    description: my python server
-    artifact-id: myserver1artifactid
-    source: myserver1logicfolder
-    endpoints:
-      - endpoint:
-        path: /myendpoint1
-        module: mymodule1
-        function: myfun1
-      - endpoint:
-        path: /myendpoint2
-        module: mymodule2
-        function: myfun2
diff --git a/tests/runners/airflow/compiler/test_rainbow_compiler.py b/tests/runners/airflow/compiler/test_rainbow_compiler.py
deleted file mode 100644
index 6e73d8f..0000000
--- a/tests/runners/airflow/compiler/test_rainbow_compiler.py
+++ /dev/null
@@ -1,33 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import unittest
-
-from rainbow.runners.airflow.compiler import rainbow_compiler
-
-
-class TestRainbowCompiler(unittest.TestCase):
-
-    def test_parse(self):
-        expected = {'name': 'MyPipeline', 'owner': 'Bosco Albert Baracus', 'pipeline': {'timeout-minutes': 45, 'schedule': '0 * 1 * *', 'metrics-namespace': 'TestNamespace', 'tasks': [{'name': 'mytask1', 'type': 'sql', 'description': 'mytask1 is cool', 'query': 'select * from mytable', 'overrides': [{'prod': None, 'partition-columns': 'dt', 'output-table': 'test.test_impression_prod', 'output-path': 's3://mybucket/myproject-test/impression', 'emr-cluster-name': 'spark-playground-prod'},  [...]
-        actual = rainbow_compiler.parse_yaml('tests/runners/airflow/compiler/rainbow.yml')
-        self.assertEqual(expected, actual)
-
-
-if __name__ == '__main__':
-    unittest.main()
diff --git a/rainbow/monitoring/__init__.py b/tests/runners/airflow/dag/__init__.py
similarity index 100%
copy from rainbow/monitoring/__init__.py
copy to tests/runners/airflow/dag/__init__.py
diff --git a/tests/runners/airflow/dag/rainbow/rainbow.yml b/tests/runners/airflow/dag/rainbow/rainbow.yml
new file mode 100644
index 0000000..07afd08
--- /dev/null
+++ b/tests/runners/airflow/dag/rainbow/rainbow.yml
@@ -0,0 +1,51 @@
+
+---
+name: MyPipeline
+owner: Bosco Albert Baracus
+pipelines:
+  - pipeline: my_pipeline
+    start_date: 1970-01-01
+    timeout-minutes: 45
+    schedule: 0 * 1 * *
+    metrics-namespace: TestNamespace
+    tasks:
+      - task: my_static_config_task
+        type: python
+        description: my 1st ds task
+        image: mytask1artifactid
+        source: mytask1folder
+        env_vars:
+          env1: "a"
+          env2: "b"
+        input_type: static
+        input_path: "{\"configs\": [ { \"campaign_id\": 10 }, { \"campaign_id\": 20 } ]}"
+        output_path: 'baz'
+        cmd: 'foo bar'
+      - task: my_static_config_task2
+        type: python
+        description: my 1st ds task
+        image: mytask1artifactid
+        source: mytask1folder
+        env_vars:
+          env1: "a"
+          env2: "b"
+        input_type: static
+        input_path: "{\"configs\": [ { \"campaign_id\": 10 }, { \"campaign_id\": 20 } ]}"
+        output_path: 'baz'
+        cmd: 'foo bar'
+services:
+  - service:
+    name: myserver1
+    type: python-server
+    description: my python server
+    artifact-id: myserver1artifactid
+    source: myserver1logicfolder
+    endpoints:
+      - endpoint:
+        path: /myendpoint1
+        module: mymodule1
+        function: myfun1
+      - endpoint:
+        path: /myendpoint2
+        module: mymodule2
+        function: myfun2
diff --git a/tests/runners/airflow/dag/test_rainbow_dags.py b/tests/runners/airflow/dag/test_rainbow_dags.py
new file mode 100644
index 0000000..41bea09
--- /dev/null
+++ b/tests/runners/airflow/dag/test_rainbow_dags.py
@@ -0,0 +1,11 @@
+from unittest import TestCase
+
+from rainbow.runners.airflow.dag import rainbow_dags
+
+
+class Test(TestCase):
+    def test_register_dags(self):
+        dags = rainbow_dags.register_dags("tests/runners/airflow/dag/rainbow")
+        self.assertEqual(len(dags), 1)
+        # TODO: elaborate test
+        pass
diff --git a/rainbow/monitoring/__init__.py b/tests/runners/airflow/tasks/__init__.py
similarity index 100%
copy from rainbow/monitoring/__init__.py
copy to tests/runners/airflow/tasks/__init__.py
diff --git a/tests/runners/airflow/tasks/test_python.py b/tests/runners/airflow/tasks/test_python.py
new file mode 100644
index 0000000..4f5808b
--- /dev/null
+++ b/tests/runners/airflow/tasks/test_python.py
@@ -0,0 +1,50 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from unittest import TestCase
+
+from rainbow.runners.airflow.operators.kubernetes_pod_operator import \
+    ConfigurableKubernetesPodOperator
+from rainbow.runners.airflow.tasks import python
+from tests.util import dag_test_utils
+
+
+class TestPythonTask(TestCase):
+    def test_apply_task_to_dag(self):
+        # TODO: elaborate tests
+        dag = dag_test_utils.create_dag()
+
+        task_id = 'my_task'
+
+        config = {
+            'task': task_id,
+            'cmd': 'foo bar',
+            'image': 'my_image',
+            'input_type': 'my_input_type',
+            'input_path': 'my_input',
+            'output_path': '/my_output.json'
+        }
+
+        task0 = python.PythonTask(dag, 'my_pipeline', None, config, 'all_success')
+        task0.apply_task_to_dag()
+
+        self.assertEqual(len(dag.tasks), 1)
+        dag_task0 = dag.tasks[0]
+
+        self.assertIsInstance(dag_task0, ConfigurableKubernetesPodOperator)
+        self.assertEqual(dag_task0.task_id, task_id)
diff --git a/rainbow/monitoring/__init__.py b/tests/util/__init__.py
similarity index 100%
copy from rainbow/monitoring/__init__.py
copy to tests/util/__init__.py
diff --git a/rainbow/runners/airflow/compiler/rainbow_compiler.py b/tests/util/dag_test_utils.py
similarity index 76%
copy from rainbow/runners/airflow/compiler/rainbow_compiler.py
copy to tests/util/dag_test_utils.py
index 818fdc5..b1fbcab 100644
--- a/rainbow/runners/airflow/compiler/rainbow_compiler.py
+++ b/tests/util/dag_test_utils.py
@@ -15,12 +15,19 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""
-Compiler for rainbows.
-"""
-import yaml
 
 
-def parse_yaml(path):
-    with open(path, 'r') as stream:
-        return yaml.safe_load(stream)
+from datetime import datetime
+
+from airflow import DAG
+
+
+def create_dag():
+    """
+    Test util to create a basic DAG for testing.
+    """
+
+    return DAG(
+        dag_id='test_dag',
+        default_args={'start_date': datetime(1970, 1, 1)}
+    )


[incubator-liminal] 17/43: Add python_server service

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 14716fcab2ad7f1140d14333e6435bc158e779e8
Author: aviemzur <av...@gmail.com>
AuthorDate: Mon Mar 16 10:51:56 2020 +0200

    Add python_server service
---
 rainbow/build/build_rainbows.py                    |  51 ++++++---
 .../hello_world => rainbow/build/http}/__init__.py |   0
 rainbow/build/http/python/Dockerfile               |  24 +++++
 rainbow/{http => build/http/python}/__init__.py    |   0
 .../build/http/python/python_server_image.py       |  29 +++--
 .../http/python/python_server_requirements.txt     |   2 +
 rainbow/build/http/python/rainbow_python_server.py |  62 +++++++++++
 rainbow/build/image_builder.py                     | 118 +++++++++++++++++++++
 rainbow/build/python/python_image.py               |  62 ++---------
 requirements.txt                                   |   5 +-
 .../hello_world => build/http}/__init__.py         |   0
 .../hello_world => build/http/python}/__init__.py  |   0
 .../build/http/python/test_python_server_image.py  | 105 ++++++++++++++++++
 .../airflow/build/python/test_python_image.py      |  11 +-
 tests/runners/airflow/build/test_build_rainbow.py  |  27 -----
 tests/runners/airflow/build/test_build_rainbows.py |  56 ++++++++++
 .../{hello_world => helloworld}/__init__.py        |   0
 .../{hello_world => helloworld}/hello_world.py     |   0
 .../rainbow/{hello_world => myserver}/__init__.py  |   0
 .../__init__.py => myserver/my_server.py}          |   4 +
 tests/runners/airflow/rainbow/rainbow.yml          |  29 +++--
 tests/runners/airflow/tasks/test_python.py         |   2 +-
 22 files changed, 465 insertions(+), 122 deletions(-)
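
Editor's note: the python_server service added in this commit turns each configured endpoint into a URL rule by importing a module from its dotted name and looking up a function on it. The sketch below shows that lookup in isolation with importlib, without Flask. resolve_function and build_routes are hypothetical helper names, and the 'endpoint', 'module' and 'function' keys are assumed to match what __start_server reads from service.yml.

```python
import importlib


def resolve_function(module_path, function_name):
    """Import a module by dotted path and return one of its attributes."""
    # import_module returns the leaf module itself, unlike __import__,
    # which returns the top-level package.
    module = importlib.import_module(module_path)
    return getattr(module, function_name)


def build_routes(endpoints):
    """Map each endpoint rule to the callable that should serve it."""
    return {
        config['endpoint']: resolve_function(config['module'],
                                             config['function'])
        for config in endpoints
    }
```

For example, build_routes([{'endpoint': '/name', 'module': 'os.path', 'function': 'basename'}]) returns a dict whose '/name' entry is os.path.basename. In the server itself each such pair is handed to app.add_url_rule instead of being stored in a dict.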

diff --git a/rainbow/build/build_rainbows.py b/rainbow/build/build_rainbows.py
index 2a9e6a3..c4a14c7 100644
--- a/rainbow/build/build_rainbows.py
+++ b/rainbow/build/build_rainbows.py
@@ -20,36 +20,59 @@ import os
 
 import yaml
 
+from rainbow.build.http.python.python_server_image import PythonServerImageBuilder
+from rainbow.build.python.python_image import PythonImageBuilder
 from rainbow.core.util import files_util
-from rainbow.build.python.python_image import PythonImage
 
 
 def build_rainbows(path):
     """
-    TODO: doc for build_rainbow
+    TODO: doc for build_rainbows
     """
-
     config_files = files_util.find_config_files(path)
 
     for config_file in config_files:
         print(f'Building artifacts for file: {config_file}')
 
+        base_path = os.path.dirname(config_file)
+
         with open(config_file) as stream:
-            config = yaml.safe_load(stream)
+            rainbow_config = yaml.safe_load(stream)
 
-            for pipeline in config['pipelines']:
+            for pipeline in rainbow_config['pipelines']:
                 for task in pipeline['tasks']:
-                    task_type = task['type']
-                    task_instance = get_build_class(task_type)()
-                    task_instance.build(base_path=os.path.dirname(config_file),
-                                        relative_source_path=task['source'],
-                                        tag=task['image'])
+                    builder_class = __get_task_build_class(task['type'])
+                    __build_image(base_path, task, builder_class)
+
+            for service in rainbow_config['services']:
+                builder_class = __get_service_build_class(service['type'])
+                __build_image(base_path, service, builder_class)
+
+
+def __build_image(base_path, builder_config, builder):
+    if 'source' in builder_config:
+        server_builder_instance = builder(
+            config=builder_config,
+            base_path=base_path,
+            relative_source_path=builder_config['source'],
+            tag=builder_config['image'])
+        server_builder_instance.build()
+    else:
+        print(f"No source provided for {builder_config['name']}, skipping.")
 
 
-build_classes = {
-    'python': PythonImage
+__task_build_classes = {
+    'python': PythonImageBuilder,
 }
 
+__service_build_classes = {
+    'python_server': PythonServerImageBuilder
+}
+
+
+def __get_task_build_class(task_type):
+    return __task_build_classes[task_type]
+
 
-def get_build_class(task_type):
-    return build_classes[task_type]
+def __get_service_build_class(service_type):
+    return __service_build_classes[service_type]
diff --git a/tests/runners/airflow/rainbow/hello_world/__init__.py b/rainbow/build/http/__init__.py
similarity index 100%
copy from tests/runners/airflow/rainbow/hello_world/__init__.py
copy to rainbow/build/http/__init__.py
diff --git a/rainbow/build/http/python/Dockerfile b/rainbow/build/http/python/Dockerfile
new file mode 100644
index 0000000..6119437
--- /dev/null
+++ b/rainbow/build/http/python/Dockerfile
@@ -0,0 +1,24 @@
+# Use an official Python runtime as a parent image
+FROM python:3.7-slim
+
+# Install aptitude build-essential
+#RUN apt-get install -y --reinstall build-essential
+
+# Set the working directory to /app
+WORKDIR /app
+
+# Order of operations is important here for docker's caching & incremental build performance.    !
+# Be careful when changing this code.                                                            !
+
+# Install any needed packages specified in python_server_requirements.txt and requirements.txt
+COPY ./python_server_requirements.txt /app
+RUN pip install -r python_server_requirements.txt
+
+COPY ./requirements.txt /app
+RUN pip install -r requirements.txt
+
+# Copy the current directory contents into the container at /app
+RUN echo "Copying source code.."
+COPY . /app
+
+CMD python -u rainbow_python_server.py
diff --git a/rainbow/http/__init__.py b/rainbow/build/http/python/__init__.py
similarity index 100%
rename from rainbow/http/__init__.py
rename to rainbow/build/http/python/__init__.py
diff --git a/tests/runners/airflow/rainbow/hello_world/hello_world.py b/rainbow/build/http/python/python_server_image.py
similarity index 51%
copy from tests/runners/airflow/rainbow/hello_world/hello_world.py
copy to rainbow/build/http/python/python_server_image.py
index 3eae465..9a65477 100644
--- a/tests/runners/airflow/rainbow/hello_world/hello_world.py
+++ b/rainbow/build/http/python/python_server_image.py
@@ -15,13 +15,28 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-import json
 
-print('Hello world!')
-print()
+import os
 
-with open('/rainbow_input.json') as file:
-    print(json.loads(file.readline()))
+from rainbow.build.image_builder import ImageBuilder
+import yaml
 
-with open('/output.json', 'w') as file:
-    file.write(json.dumps({'a': 1, 'b': 2}))
+
+class PythonServerImageBuilder(ImageBuilder):
+
+    def __init__(self, config, base_path, relative_source_path, tag):
+        super().__init__(config, base_path, relative_source_path, tag)
+
+    @staticmethod
+    def _dockerfile_path():
+        return os.path.join(os.path.dirname(__file__), 'Dockerfile')
+
+    @staticmethod
+    def _additional_files_from_paths():
+        return [
+            os.path.join(os.path.dirname(__file__), 'rainbow_python_server.py'),
+            os.path.join(os.path.dirname(__file__), 'python_server_requirements.txt')
+        ]
+
+    def _additional_files_from_filename_content_pairs(self):
+        return [('service.yml', yaml.safe_dump(self.config))]
diff --git a/rainbow/build/http/python/python_server_requirements.txt b/rainbow/build/http/python/python_server_requirements.txt
new file mode 100644
index 0000000..c395de6
--- /dev/null
+++ b/rainbow/build/http/python/python_server_requirements.txt
@@ -0,0 +1,2 @@
+Flask==1.1.1
+pyyaml
diff --git a/rainbow/build/http/python/rainbow_python_server.py b/rainbow/build/http/python/rainbow_python_server.py
new file mode 100644
index 0000000..66aab27
--- /dev/null
+++ b/rainbow/build/http/python/rainbow_python_server.py
@@ -0,0 +1,62 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import yaml
+from flask import Flask
+
+app = Flask(__name__)
+
+
+def start_server(yml_path):
+    with open(yml_path) as stream:
+        __start_server(yaml.safe_load(stream))
+
+
+def __start_server(config):
+    endpoints = config['endpoints']
+
+    for endpoint_config in endpoints:
+        print(f'Registering endpoint: {endpoint_config}')
+        endpoint = endpoint_config['endpoint']
+
+        print(endpoint_config['module'])
+
+        module = __get_module(endpoint_config['module'])
+        function = getattr(module, endpoint_config['function'])
+
+        app.add_url_rule(rule=endpoint,
+                         endpoint=endpoint,
+                         view_func=function,
+                         methods=['GET', 'POST'])
+
+    print('Starting python server')
+
+    app.run(host='0.0.0.0', threaded=False, port=80)
+
+
+def __get_module(kls):
+    parts = kls.split('.')
+    module = ".".join(parts)
+    m = __import__(module)
+    for comp in parts[1:]:
+        m = getattr(m, comp)
+    return m
+
+
+if __name__ == "__main__":
+    start_server('service.yml')
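
The `__get_module` helper above depends on `__import__` returning the top-level package for a dotted name, then walks the remaining components with `getattr`. A minimal sketch of the same lookup with `importlib.import_module`, which returns the leaf module directly, might look like the following. `resolve_view` is a hypothetical name that does not appear in this commit, and the standard-library module `os.path` stands in for a user module such as `myserver.my_server`.

```python
import importlib


def resolve_view(module_path, function_name):
    """Return the callable `function_name` defined in the dotted `module_path`."""
    # import_module returns the leaf module (e.g. os.path), whereas __import__
    # returns the top-level package (os) when given a dotted name.
    module = importlib.import_module(module_path)
    return getattr(module, function_name)


# Stand-in for a config entry like
# {'module': 'myserver.my_server', 'function': 'myendpoint1func'}.
join = resolve_view('os.path', 'join')
print(join('a', 'b'))
```

The resolved callable is what would be passed as `view_func` to `app.add_url_rule` in the server above.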
diff --git a/rainbow/build/image_builder.py b/rainbow/build/image_builder.py
new file mode 100644
index 0000000..b54dc00
--- /dev/null
+++ b/rainbow/build/image_builder.py
@@ -0,0 +1,118 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+import shutil
+import tempfile
+
+import docker
+
+
+class ImageBuilder:
+    """
+    Builds an image from source code
+    """
+
+    def __init__(self, config, base_path, relative_source_path, tag):
+        """
+        TODO: pydoc
+
+        :param config:
+        :param base_path:
+        :param relative_source_path:
+        :param tag:
+        """
+        self.base_path = base_path
+        self.relative_source_path = relative_source_path
+        self.tag = tag
+        self.config = config
+
+    def build(self):
+        """
+        Builds source code into an image.
+        """
+        print(f'[ ] Building image: {self.tag}')
+
+        temp_dir = self.__temp_dir()
+
+        self.__copy_source_code(temp_dir)
+        self.__write_additional_files(temp_dir)
+
+        # TODO: log docker output
+        docker_client = docker.from_env()
+        docker_client.images.build(path=temp_dir, tag=self.tag)
+        docker_client.close()
+
+        self.__remove_dir(temp_dir)
+
+        print(f'[X] Building image: {self.tag} (Success).')
+
+    def __copy_source_code(self, temp_dir):
+        self.__copy_dir(os.path.join(self.base_path, self.relative_source_path), temp_dir)
+
+    def __write_additional_files(self, temp_dir):
+        # TODO: move requirements.txt related code to a parent class for python image builders.
+        requirements_file_path = os.path.join(temp_dir, 'requirements.txt')
+        if not os.path.exists(requirements_file_path):
+            with open(requirements_file_path, 'w'):
+                pass
+
+        for file in [self._dockerfile_path()] + self._additional_files_from_paths():
+            self.__copy_file(file, temp_dir)
+
+        for filename, content in self._additional_files_from_filename_content_pairs():
+            with open(os.path.join(temp_dir, filename), 'w') as file:
+                file.write(content)
+
+    def __temp_dir(self):
+        temp_dir = tempfile.mkdtemp()
+        # Delete dir for shutil.copytree to work
+        self.__remove_dir(temp_dir)
+        return temp_dir
+
+    @staticmethod
+    def __remove_dir(temp_dir):
+        shutil.rmtree(temp_dir)
+
+    @staticmethod
+    def __copy_dir(source_path, destination_path):
+        shutil.copytree(source_path, destination_path)
+
+    @staticmethod
+    def __copy_file(source_file_path, destination_file_path):
+        shutil.copy2(source_file_path, destination_file_path)
+
+    @staticmethod
+    def _dockerfile_path():
+        """
+        Path to Dockerfile
+        """
+        raise NotImplementedError()
+
+    @staticmethod
+    def _additional_files_from_paths():
+        """
+        List of paths to additional files
+        """
+        return []
+
+    def _additional_files_from_filename_content_pairs(self):
+        """
+        File name and content pairs to create files from
+        """
+        return []
diff --git a/rainbow/build/python/python_image.py b/rainbow/build/python/python_image.py
index f0fb3a0..d856b8c 100644
--- a/rainbow/build/python/python_image.py
+++ b/rainbow/build/python/python_image.py
@@ -17,62 +17,22 @@
 # under the License.
 
 import os
-import shutil
-import tempfile
 
-import docker
+from rainbow.build.image_builder import ImageBuilder
 
 
-class PythonImage:
+class PythonImageBuilder(ImageBuilder):
 
-    def build(self, base_path, relative_source_path, tag):
-        """
-        TODO: pydoc
-
-        :param base_path:
-        :param relative_source_path:
-        :param tag:
-        :param extra_files:
-        :return:
-        """
-
-        print(f'Building image {tag}')
-
-        temp_dir = tempfile.mkdtemp()
-        # Delete dir for shutil.copytree to work
-        os.rmdir(temp_dir)
-
-        self.__copy_source(os.path.join(base_path, relative_source_path), temp_dir)
-
-        requirements_file_path = os.path.join(temp_dir, 'requirements.txt')
-        if not os.path.exists(requirements_file_path):
-            with open(requirements_file_path, 'w'):
-                pass
-
-        docker_files = [
-            os.path.join(os.path.dirname(__file__), 'Dockerfile'),
-            os.path.join(os.path.dirname(__file__), 'container-setup.sh'),
-            os.path.join(os.path.dirname(__file__), 'container-teardown.sh')
-        ]
-
-        for file in docker_files:
-            self.__copy_file(file, temp_dir)
-
-        docker_client = docker.from_env()
-
-        # TODO: log docker output
-        docker_client.images.build(path=temp_dir, tag=tag)
-
-        docker_client.close()
-
-        print(temp_dir, os.listdir(temp_dir))
-
-        shutil.rmtree(temp_dir)
+    def __init__(self, config, base_path, relative_source_path, tag):
+        super().__init__(config, base_path, relative_source_path, tag)
 
     @staticmethod
-    def __copy_source(source_path, destination_path):
-        shutil.copytree(source_path, destination_path)
+    def _dockerfile_path():
+        return os.path.join(os.path.dirname(__file__), 'Dockerfile')
 
     @staticmethod
-    def __copy_file(source_file_path, destination_file_path):
-        shutil.copy2(source_file_path, destination_file_path)
+    def _additional_files_from_paths():
+        return [
+            os.path.join(os.path.dirname(__file__), 'container-setup.sh'),
+            os.path.join(os.path.dirname(__file__), 'container-teardown.sh'),
+        ]
diff --git a/requirements.txt b/requirements.txt
index 599ab8b..dd1e232 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,6 +1,7 @@
-botocore
-PyYAML
+botocore==1.15.21
 docker==4.2.0
 apache-airflow==1.10.9
 docker-pycreds==0.4.0
 click==7.1.1
+Flask==1.1.1
+pyyaml
\ No newline at end of file
diff --git a/tests/runners/airflow/rainbow/hello_world/__init__.py b/tests/runners/airflow/build/http/__init__.py
similarity index 100%
copy from tests/runners/airflow/rainbow/hello_world/__init__.py
copy to tests/runners/airflow/build/http/__init__.py
diff --git a/tests/runners/airflow/rainbow/hello_world/__init__.py b/tests/runners/airflow/build/http/python/__init__.py
similarity index 100%
copy from tests/runners/airflow/rainbow/hello_world/__init__.py
copy to tests/runners/airflow/build/http/python/__init__.py
diff --git a/tests/runners/airflow/build/http/python/test_python_server_image.py b/tests/runners/airflow/build/http/python/test_python_server_image.py
new file mode 100644
index 0000000..fd38c80
--- /dev/null
+++ b/tests/runners/airflow/build/http/python/test_python_server_image.py
@@ -0,0 +1,105 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import threading
+import time
+import unittest
+import urllib.request
+from unittest import TestCase
+
+import docker
+
+from rainbow.build.http.python.python_server_image import PythonServerImageBuilder
+
+
+class TestPythonServer(TestCase):
+
+    def setUp(self) -> None:
+        super().setUp()
+        self.docker_client = docker.from_env()
+        self.config = self.__create_conf('my_task')
+        self.image_name = self.config['image']
+        self.__remove_containers()
+
+    def tearDown(self) -> None:
+        self.__remove_containers()
+        self.docker_client.close()
+
+    def test_build_python_server(self):
+        builder = PythonServerImageBuilder(config=self.config,
+                                           base_path='tests/runners/airflow/rainbow',
+                                           relative_source_path='myserver',
+                                           tag=self.image_name)
+
+        builder.build()
+
+        thread = threading.Thread(target=self.__run_container, args=[self.image_name])
+        thread.daemon = True
+        thread.start()
+
+        time.sleep(2)
+
+        server_response = urllib.request.urlopen("http://localhost:9294/myendpoint1").read()
+
+        self.assertEqual(b'1', server_response)
+
+    def __remove_containers(self):
+        print(f'Stopping containers with image: {self.image_name}')
+
+        all_containers = self.docker_client.containers
+        matching_containers = all_containers.list(filters={'ancestor': self.image_name})
+
+        for container in matching_containers:
+            container_id = container.id
+            print(f'Stopping container {container_id}')
+            self.docker_client.api.stop(container_id)
+            print(f'Removing container {container_id}')
+            self.docker_client.api.remove_container(container_id)
+
+        self.docker_client.containers.prune()
+
+    def __run_container(self, image_name):
+        try:
+            print(f'Running container for image: {image_name}')
+            self.docker_client.containers.run(image_name, ports={'80/tcp': 9294})
+        except Exception as err:
+            print(err)
+            pass
+
+    @staticmethod
+    def __create_conf(task_id):
+        return {
+            'task': task_id,
+            'cmd': 'foo bar',
+            'image': 'rainbow_server_image',
+            'source': 'tests/runners/airflow/rainbow/myserver',
+            'input_type': 'my_input_type',
+            'input_path': 'my_input',
+            'output_path': '/my_output.json',
+            'endpoints': [
+                {
+                    'endpoint': '/myendpoint1',
+                    'module': 'my_server',
+                    'function': 'myendpoint1func'
+                }
+            ]
+        }
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/tests/runners/airflow/build/python/test_python_image.py b/tests/runners/airflow/build/python/test_python_image.py
index d190fba..ff4555d 100644
--- a/tests/runners/airflow/build/python/test_python_image.py
+++ b/tests/runners/airflow/build/python/test_python_image.py
@@ -19,7 +19,7 @@ from unittest import TestCase
 
 import docker
 
-from rainbow.build.python.python_image import PythonImage
+from rainbow.build.python.python_image import PythonImageBuilder
 
 
 class TestPythonImage(TestCase):
@@ -29,7 +29,12 @@ class TestPythonImage(TestCase):
 
         image_name = config['image']
 
-        PythonImage().build('tests/runners/airflow/rainbow', 'hello_world', image_name)
+        builder = PythonImageBuilder(config=config,
+                                     base_path='tests/runners/airflow/rainbow',
+                                     relative_source_path='helloworld',
+                                     tag=image_name)
+
+        builder.build()
 
         # TODO: elaborate test of image, validate input/output
 
@@ -54,7 +59,7 @@ class TestPythonImage(TestCase):
             'task': task_id,
             'cmd': 'foo bar',
             'image': 'rainbow_image',
-            'source': 'tests/runners/airflow/rainbow/hello_world',
+            'source': 'tests/runners/airflow/rainbow/helloworld',
             'input_type': 'my_input_type',
             'input_path': 'my_input',
             'output_path': '/my_output.json'
diff --git a/tests/runners/airflow/build/test_build_rainbow.py b/tests/runners/airflow/build/test_build_rainbow.py
deleted file mode 100644
index 0817d6c..0000000
--- a/tests/runners/airflow/build/test_build_rainbow.py
+++ /dev/null
@@ -1,27 +0,0 @@
-import unittest
-from unittest import TestCase
-
-import docker
-from rainbow.build import build_rainbows
-
-
-class TestBuildRainbow(TestCase):
-
-    def test_build_rainbow(self):
-        docker_client = docker.client.from_env()
-        image_names = ['my_static_input_task_image', 'my_task_output_input_task_image']
-
-        for image_name in image_names:
-            if len(docker_client.images.list(image_name)) > 0:
-                docker_client.images.remove(image=image_name)
-
-        build_rainbows.build_rainbows('tests/runners/airflow/rainbow')
-
-        for image_name in image_names:
-            docker_client.images.get(name=image_name)
-
-        docker_client.close()
-
-
-if __name__ == '__main__':
-    unittest.main()
diff --git a/tests/runners/airflow/build/test_build_rainbows.py b/tests/runners/airflow/build/test_build_rainbows.py
new file mode 100644
index 0000000..9a4d31c
--- /dev/null
+++ b/tests/runners/airflow/build/test_build_rainbows.py
@@ -0,0 +1,56 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import unittest
+from unittest import TestCase
+
+import docker
+
+from rainbow.build import build_rainbows
+
+
+class TestBuildRainbows(TestCase):
+
+    __image_names = [
+        'my_static_input_task_image',
+        'my_task_output_input_task_image',
+        'my_server_image'
+    ]
+
+    def setUp(self) -> None:
+        self.docker_client = docker.client.from_env()
+        self.__remove_images()
+
+    def tearDown(self) -> None:
+        self.__remove_images()
+        self.docker_client.close()
+
+    def __remove_images(self):
+        for image_name in self.__image_names:
+            if len(self.docker_client.images.list(image_name)) > 0:
+                self.docker_client.images.remove(image=image_name)
+
+    def test_build_rainbow(self):
+        build_rainbows.build_rainbows('tests/runners/airflow/rainbow')
+
+        for image in self.__image_names:
+            self.docker_client.images.get(image)
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/tests/runners/airflow/rainbow/hello_world/__init__.py b/tests/runners/airflow/rainbow/helloworld/__init__.py
similarity index 100%
copy from tests/runners/airflow/rainbow/hello_world/__init__.py
copy to tests/runners/airflow/rainbow/helloworld/__init__.py
diff --git a/tests/runners/airflow/rainbow/hello_world/hello_world.py b/tests/runners/airflow/rainbow/helloworld/hello_world.py
similarity index 100%
rename from tests/runners/airflow/rainbow/hello_world/hello_world.py
rename to tests/runners/airflow/rainbow/helloworld/hello_world.py
diff --git a/tests/runners/airflow/rainbow/hello_world/__init__.py b/tests/runners/airflow/rainbow/myserver/__init__.py
similarity index 100%
copy from tests/runners/airflow/rainbow/hello_world/__init__.py
copy to tests/runners/airflow/rainbow/myserver/__init__.py
diff --git a/tests/runners/airflow/rainbow/hello_world/__init__.py b/tests/runners/airflow/rainbow/myserver/my_server.py
similarity index 95%
rename from tests/runners/airflow/rainbow/hello_world/__init__.py
rename to tests/runners/airflow/rainbow/myserver/my_server.py
index 217e5db..a3f0f2c 100644
--- a/tests/runners/airflow/rainbow/hello_world/__init__.py
+++ b/tests/runners/airflow/rainbow/myserver/my_server.py
@@ -15,3 +15,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
+
+def myendpoint1func():
+    return '1'
diff --git a/tests/runners/airflow/rainbow/rainbow.yml b/tests/runners/airflow/rainbow/rainbow.yml
index 2000621..e9f9045 100644
--- a/tests/runners/airflow/rainbow/rainbow.yml
+++ b/tests/runners/airflow/rainbow/rainbow.yml
@@ -29,14 +29,14 @@ pipelines:
         type: python
         description: static input task
         image: my_static_input_task_image
-        source: hello_world
+        source: helloworld
         env_vars:
           env1: "a"
           env2: "b"
         input_type: static
         input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
         output_path: /output.json
-        cmd: python hello_world.py
+        cmd: python -u hello_world.py
 #      - task: my_parallelized_static_input_task
 #        type: python
 #        description: parallelized static input task
@@ -48,31 +48,26 @@ pipelines:
 #        input_path: '[ { "foo": "bar" }, { "foo": "baz" } ]'
 #        split_input: True
 #        executors: 2
-#        cmd: python hello_world.py
+#        cmd: python -u hello_world.py
       - task: my_task_output_input_task
         type: python
         description: parallelized static input task
         image: my_task_output_input_task_image
-        source: hello_world
+        source: helloworld
         env_vars:
           env1: "a"
           env2: "b"
         input_type: task
         input_path: my_static_input_task
-        cmd: python hello_world.py
+        cmd: python -u hello_world.py
 services:
   - service:
-    name: myserver1
-    type: python-server
+    name: my_python_server
+    type: python_server
     description: my python server
-    artifact-id: myserver1artifactid
-    source: myserver1logicfolder
+    image: my_server_image
+    source: myserver
     endpoints:
-      - endpoint:
-        path: /myendpoint1
-        module: mymodule1
-        function: myfun1
-      - endpoint:
-        path: /myendpoint2
-        module: mymodule2
-        function: myfun2
+      - endpoint: /myendpoint1
+        module: myserver.my_server
+        function: myendpoint1func
diff --git a/tests/runners/airflow/tasks/test_python.py b/tests/runners/airflow/tasks/test_python.py
index 260f71d..18e6c1a 100644
--- a/tests/runners/airflow/tasks/test_python.py
+++ b/tests/runners/airflow/tasks/test_python.py
@@ -50,7 +50,7 @@ class TestPythonTask(TestCase):
             'task': task_id,
             'cmd': 'foo bar',
             'image': 'rainbow_image',
-            'source': 'tests/runners/airflow/rainbow/hello_world',
+            'source': 'tests/runners/airflow/rainbow/helloworld',
             'input_type': 'my_input_type',
             'input_path': 'my_input',
             'output_path': '/my_output.json'
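
`ImageBuilder.__temp_dir` in this commit creates a temporary directory and deletes it straight away so that `shutil.copytree` can create the destination itself. A hedged alternative sketch, assuming Python 3.8 or later, passes `dirs_exist_ok=True` and lets `tempfile.TemporaryDirectory` clean up even if the build raises. `stage_build_context` and its parameters are hypothetical names, and the Docker build is replaced by a caller-supplied callback so the example runs without a Docker daemon.

```python
import os
import shutil
import tempfile


def stage_build_context(source_dir, extra_files, build):
    """Copy sources and extra files into a temp dir, then call build(temp_dir)."""
    with tempfile.TemporaryDirectory() as temp_dir:
        # dirs_exist_ok (Python 3.8+) lets copytree write into the existing dir.
        shutil.copytree(source_dir, temp_dir, dirs_exist_ok=True)

        # Mirror the commit's behaviour: make sure a requirements.txt exists.
        requirements = os.path.join(temp_dir, 'requirements.txt')
        if not os.path.exists(requirements):
            open(requirements, 'w').close()

        # Equivalent of _additional_files_from_filename_content_pairs().
        for name, content in extra_files:
            with open(os.path.join(temp_dir, name), 'w') as file:
                file.write(content)

        # The temp dir is removed when the with-block exits, even on error.
        return build(temp_dir)
```

In the real builder the callback would wrap `docker_client.images.build(path=temp_dir, tag=tag)`.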


[incubator-liminal] 06/43: Add run_tests script

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit c74af6c3a3496073c93cc9ca41ccfcd5a9c2a1cc
Author: aviemzur <av...@gmail.com>
AuthorDate: Wed Mar 11 14:13:47 2020 +0200

    Add run_tests script
---
 run_tests.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/run_tests.sh b/run_tests.sh
new file mode 100755
index 0000000..3e5cd2f
--- /dev/null
+++ b/run_tests.sh
@@ -0,0 +1,3 @@
+#!/bin/sh
+
+python -m unittest
\ No newline at end of file


[incubator-liminal] 40/43: fix jobEndStatus tasks state check

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit fac89af6108d5d426ed415a997eab88d5ada761a
Author: zionrubin <zi...@naturalint.com>
AuthorDate: Tue Jun 23 15:16:55 2020 +0300

    fix jobEndStatus tasks state check
---
 .../airflow/operators/job_status_operator.py       | 38 +++++++++++++++++++---
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/rainbow/runners/airflow/operators/job_status_operator.py b/rainbow/runners/airflow/operators/job_status_operator.py
index ae9382a..8ea997d 100644
--- a/rainbow/runners/airflow/operators/job_status_operator.py
+++ b/rainbow/runners/airflow/operators/job_status_operator.py
@@ -17,12 +17,16 @@
 # specific language governing permissions and limitations
 # under the License.
 from datetime import datetime
+from typing import Any
 
 import pytz
 from airflow.contrib.hooks.aws_hook import AwsHook
 from airflow.exceptions import AirflowException
+from airflow.lineage import apply_lineage
 from airflow.models import BaseOperator
+from airflow.utils.db import provide_session
 from airflow.utils.decorators import apply_defaults
+from airflow.utils.state import State
 
 
 class JobStatusOperator(BaseOperator):
@@ -94,6 +98,7 @@ class JobEndOperator(JobStatusOperator):
         super().__init__(backends=backends, *args, **kwargs)
         self.namespace = namespace
         self.application_name = application_name
+        self.__job_result = 0
 
     def metrics(self, context):
         duration = round((pytz.utc.localize(datetime.utcnow()) - context[
@@ -102,19 +107,40 @@ class JobEndOperator(JobStatusOperator):
         self.log.info('Elapsed time: %s' % duration)
 
         task_instances = context['dag_run'].get_task_instances()
-        task_states = [task_instance.state for task_instance in task_instances[:-1]]
 
-        job_result = 0
-        if all(state == 'success' for state in task_states):
-            job_result = 1
+        task_states = [self.__log_and_get_state(task_instance)
+                       for task_instance in task_instances
+                       if task_instance.task_id != context['task_instance'].task_id]
+
+        if all(state in (State.SUCCESS, State.SKIPPED) for state in task_states):
+            self.__job_result = 1
 
         return [
-            Metric(self.namespace, 'JobResult', job_result,
+            Metric(self.namespace, 'JobResult', self.__job_result,
                    [Tag('ApplicationName', self.application_name)]),
             Metric(self.namespace, 'JobDuration', duration,
                    [Tag('ApplicationName', self.application_name)])
         ]
 
+    def __log_and_get_state(self, task_instance):
+        state = task_instance.state
+
+        self.log.info(f'{task_instance.task_id} finished with state: {state}')
+
+        return state
+
+    @apply_lineage
+    @provide_session
+    def post_execute(self, context: Any, result: Any = None, session=None):
+        if self.__job_result == 0:
+            self.log.info("Failing this DAG run due to task failure.")
+
+            dag_run = context['ti'].get_dagrun()
+            dag_run.end_date = datetime.utcnow()
+            dag_run.state = State.FAILED
+
+            session.merge(dag_run)
+
 
 # noinspection PyAbstractClass
 class CloudWatchHook(AwsHook):
@@ -150,6 +176,8 @@ class CloudWatchHook(AwsHook):
             ]
         )
 
+        self.log.info(f'Published metric: {metric.name} with value: {value}')
+
 
 class Metric:
     """

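The fix above replaces the positional `task_instances[:-1]` slice with a filter on task id and counts skipped tasks as acceptable. A minimal sketch of that rule, outside Airflow, is shown below. `job_result` and the dictionary input are hypothetical; the plain strings stand in for `State.SUCCESS` and `State.SKIPPED`.

```python
SUCCESS = 'success'
SKIPPED = 'skipped'


def job_result(task_states, own_task_id):
    """Return 1 if every task except `own_task_id` succeeded or was skipped, else 0.

    `task_states` maps a task id to its final state string.
    """
    # Exclude the reporting task by id rather than by position, since the
    # order of task instances is not guaranteed.
    others = [state for task_id, state in task_states.items()
              if task_id != own_task_id]
    return 1 if all(state in (SUCCESS, SKIPPED) for state in others) else 0


print(job_result({'extract': 'success', 'load': 'skipped', 'end': 'running'}, 'end'))
print(job_result({'extract': 'failed', 'end': 'running'}, 'end'))
```

Filtering by id avoids assuming that the end task is always last in the list returned by `get_task_instances()`.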

[incubator-liminal] 05/43: Add 'build' package

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 1c204b6cb2b5a90dbe2723f01726dee67e2f8e56
Author: aviemzur <av...@gmail.com>
AuthorDate: Wed Mar 11 14:11:54 2020 +0200

    Add 'build' package
---
 .../compiler/rainbow_compiler.py => build/__init__.py}  |  2 +-
 rainbow/runners/airflow/compiler/__init__.py            | 17 -----------------
 2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/rainbow/runners/airflow/compiler/rainbow_compiler.py b/rainbow/build/__init__.py
similarity index 96%
rename from rainbow/runners/airflow/compiler/rainbow_compiler.py
rename to rainbow/build/__init__.py
index bed1efd..9e84106 100644
--- a/rainbow/runners/airflow/compiler/rainbow_compiler.py
+++ b/rainbow/build/__init__.py
@@ -16,5 +16,5 @@
 # specific language governing permissions and limitations
 # under the License.
 """
-TODO: compiler for rainbows.
+TODO: rainbow build.
 """
diff --git a/rainbow/runners/airflow/compiler/__init__.py b/rainbow/runners/airflow/compiler/__init__.py
deleted file mode 100644
index 217e5db..0000000
--- a/rainbow/runners/airflow/compiler/__init__.py
+++ /dev/null
@@ -1,17 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.


[incubator-liminal] 26/43: Add example repository structure to README

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit 2ce98f48a4be9b2da826432f8fbeddf7d33ae293
Author: aviemzur <av...@gmail.com>
AuthorDate: Mon Mar 23 10:39:17 2020 +0200

    Add example repository structure to README
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 90c6b78..3e46f34 100644
--- a/README.md
+++ b/README.md
@@ -76,6 +76,8 @@ services:
         function: myendpoint1func
 ```
 
+## Example repository structure
+[Example repository structure](https://github.com/Natural-Intelligence/rainbow/tree/master/tests/runners/airflow/rainbow)
 # Installation
 
 TODO: installation.


[incubator-liminal] 30/43: Fix missing tasks/dags bug

Posted by jb...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git

commit c21bfdf492e712315becd97b24843fdb3800f68c
Author: aviemzur <av...@gmail.com>
AuthorDate: Tue Apr 7 14:48:48 2020 +0300

    Fix missing tasks/dags bug
---
 rainbow/runners/airflow/dag/rainbow_dags.py | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/rainbow/runners/airflow/dag/rainbow_dags.py b/rainbow/runners/airflow/dag/rainbow_dags.py
index 17fd8d9..6b071fd 100644
--- a/rainbow/runners/airflow/dag/rainbow_dags.py
+++ b/rainbow/runners/airflow/dag/rainbow_dags.py
@@ -31,7 +31,7 @@ from rainbow.runners.airflow.tasks.defaults.job_start import JobStartTask
 
 def register_dags(configs_path):
     """
-    TODO: doc for register_dags
+    Registers pipelines in rainbow yml files found in given path (recursively) as airflow DAGs.
     """
 
     config_files = files_util.find_config_files(configs_path)
@@ -75,15 +75,16 @@ def register_dags(configs_path):
 
                     parent = task_instance.apply_task_to_dag()
 
-                    job_end_task = JobEndTask(dag, pipeline_name, parent, pipeline, 'all_done')
-                    job_end_task.apply_task_to_dag()
+                job_end_task = JobEndTask(dag, pipeline_name, parent, pipeline, 'all_done')
+                job_end_task.apply_task_to_dag()
 
-                    print(f'{pipeline_name}: {dag.tasks}')
+                print(f'{pipeline_name}: {dag.tasks}')
 
-                    globals()[pipeline_name] = dag
+                globals()[pipeline_name] = dag
 
-                    dags.append(dag)
-                    return dags
+                dags.append(dag)
+
+            return dags
 
 
 print(f'Loading task implementations..')
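
The bug fixed in this commit is an indentation problem: a `return` placed inside a loop body ends the function after the first iteration. The following is a simplified illustration of that shape, not the actual `register_dags` code; both function names and the string inputs are hypothetical.

```python
def collect_early_return(pipelines):
    """Buggy shape: `return` sits inside the loop, so only the first item is kept."""
    dags = []
    for pipeline in pipelines:
        dags.append(pipeline.upper())
        return dags
    return dags  # reached only when `pipelines` is empty


def collect_all(pipelines):
    """Fixed shape: `return` runs once, after every item has been processed."""
    dags = []
    for pipeline in pipelines:
        dags.append(pipeline.upper())
    return dags


print(collect_early_return(['a', 'b']))
print(collect_all(['a', 'b']))
```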