Posted to commits@liminal.apache.org by li...@apache.org on 2021/07/26 05:50:00 UTC

[incubator-liminal] branch master updated: iris getting started (#57)

This is an automated email from the ASF dual-hosted git repository.

lior pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-liminal.git


The following commit(s) were added to refs/heads/master by this push:
     new ec63063  iris getting started (#57)
ec63063 is described below

commit ec63063aa1360cd83fcb4d1b2e5d6e03f8c692e5
Author: Lidor Ettinger <li...@gmail.com>
AuthorDate: Mon Jul 26 08:49:42 2021 +0300

    iris getting started (#57)
    
    * iris getting started
    
    * update readme
    
    * rewrite desc
    
    * fix link
    
    * cleanup desc
    
    * redefine desc
    
    * redefine desc
    
    * redefine funcs
    
    * adding manifest
    
    * move manifest
    
    * update manifests
---
 docs/README.md                                     |   5 +-
 .../hello_world.md}                                |  57 ++--
 docs/getting-started/iris_classification.md        | 296 +++++++++++++++++++++
 docs/nstatic/{ => hello-world}/airflow_main.png    | Bin
 .../nstatic/{ => hello-world}/airflow_task_log.png | Bin
 .../nstatic/{ => hello-world}/airflow_view_dag.png | Bin
 .../nstatic/{ => hello-world}/airflow_view_log.png | Bin
 docs/nstatic/iris-classification/airflow_main.png  | Bin 0 -> 207885 bytes
 .../iris-classification/airflow_task_log.png       | Bin 0 -> 625315 bytes
 .../iris-classification/airflow_view_dag.png       | Bin 0 -> 250899 bytes
 .../iris-classification/airflow_view_log.png       | Bin 0 -> 359040 bytes
 .../aws-ml-app-demo/manifests/aws-ml-app-demo.yaml |  23 ++
 12 files changed, 355 insertions(+), 26 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index 588af11..c5aa8ed 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -35,10 +35,11 @@ Using simple YAML configuration, create your own schedule data pipelines (a sequ
 perform), application servers,  and more.
 
 ## Getting Started
-A simple getting stated guide for Liminal can be found [here](getting_started.md)
+A simple hello world guide for Liminal can be found [here](getting-started/hello_world.md) \
+A more advanced example demonstrating a simple data-science workflow can be found [here](getting-started/iris_classification.md)
 
 ## Apache Liminal Documentation
 Full documentation of Apache Liminal can be found [here](liminal)
 
 ## High Level Architecture
-High level architecture documentation can be found [here](architecture.md)
+High level architecture documentation can be found [here](architecture.md)
\ No newline at end of file
diff --git a/docs/getting_started.md b/docs/getting-started/hello_world.md
similarity index 78%
rename from docs/getting_started.md
rename to docs/getting-started/hello_world.md
index 5e5a991..f036d5a 100644
--- a/docs/getting_started.md
+++ b/docs/getting-started/hello_world.md
@@ -17,7 +17,7 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Getting started
+# Getting started / ***Hello World***
 
 This guide will help you set up your first Apache Liminal environment and create
 some simple ML pipelines. These will be very similar to the ones you are going to build for real
@@ -32,45 +32,48 @@ Python 3 (3.6 and up)
 *Note: Make sure a Kubernetes cluster is running in Docker Desktop (or in a custom Kubernetes
 installation on your machine).*
 
-## Hello World
+## Deploying the Example
 
 In this tutorial, we will go through setting up Liminal for the first time on your local machine.
 
-First, let’s build our examples project:
+### First, let’s build our example project:
 
 In the dev folder, just clone the example code from liminal:
 
 
-```
+```BASH
 git clone https://github.com/apache/incubator-liminal
 ```
 ***Note:*** *You just cloned the entire Liminal project; you actually only need the examples folder.*
 
 Create a python virtual environment to isolate your runs:
 
-```
+```BASH
 cd incubator-liminal/examples/liminal-getting-started
 python3 -m venv env
 ```
 
 Activate your virtual environment:
 
-```
+```BASH
 source env/bin/activate
 ```
 
 Now we are ready to install liminal:
 
-```
+```BASH
 pip install apache-liminal
 ```
 Let's build the images you need for the example:
-```
+```BASH
 liminal build
 ```
-The build will create docker images based on the liminal.yml file in the `images` section.
-
+##### The build will create docker images based on the liminal.yml file in the `images` section.
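+
+For reference, the `images` section of a liminal.yml file looks roughly like this (a minimal sketch; the image name is illustrative and the exact fields come from the example's own liminal.yml):
+```YAML
+images:
+  # each entry is built into a docker image that tasks can reference
+  - image: myorg/mydatascienceapp
+    type: python
+    source: .
+```
+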
+Create a local Kubernetes volume:
+```BASH
+liminal create
 ```
+```BASH
 liminal deploy --clean  
 ```
 The deploy command deploys a liminal server and deploys any liminal.yml files in your working
@@ -81,7 +84,7 @@ If the LIMINAL_HOME environemnet variable is not defined, home directory default
 ~/liminal_home directory.*
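+
+You can confirm what was deployed by listing the liminal home directory (default location assumed):
+```BASH
+ls ~/liminal_home
+```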
 
 Now let's run liminal:
-```
+```BASH
 liminal start
 ```
 The start command spins up the liminal server containers which will run pipelines based on your
@@ -96,26 +99,29 @@ Once liminal server has completed starting up, you can navigate to admin UI in y
 By default, the liminal server starts Apache Airflow servers and the admin UI will be that of Apache Airflow.
 
 
-![](nstatic/airflow_main.png)
+![](../nstatic/hello-world/airflow_main.png)
 
 ***Important:** Set the off/on toggle to activate your pipeline (DAG); nothing will happen otherwise!*
 
-You can go to tree view to see all the tasks configured in the liminal.yml file: 
-[http://localhost:8080/admin/airflow/tree?dag_id=example_pipeline](
-http://localhost:8080/admin/airflow/tree?dag_id=example_pipeline
+You can go to graph view to see all the tasks configured in the liminal.yml file: 
+[http://localhost:8080/admin/airflow/graph?dag_id=example_pipeline](
+http://localhost:8080/admin/airflow/graph?dag_id=example_pipeline
 )
 
-Now lets see what actually happened to our task:
+#### Now let's see what actually happened to our task:
+
+![](../nstatic/hello-world/airflow_view_dag.png)
+
 
-![](nstatic/airflow_view_dag.png)
+#### Click on “hello_world_example” and you will get this popup:
 
-Click on “hello_world_example” and you will get this popup: \
+![](../nstatic/hello-world/airflow_view_log.png)
 
-![](nstatic/airflow_view_log.png) \
-Click on “view log” button and you can see the log of the current task run: \
 
+#### Click on the “view log” button to see the log of the current task run:
+
+![](../nstatic/hello-world/airflow_task_log.png)
 
-![](nstatic/airflow_task_log.png)
 
 ## Mounted volumes
 All tasks use a mounted volume as defined in the pipeline YAML:
@@ -123,10 +129,11 @@ All tasks use a mounted volume as defined in the pipeline YAML:
 name: GettingStartedPipeline
 volumes:
   - volume: gettingstartedvol
+    claim_name: gettingstartedvol-pvc
     local:
-      path: ./
+      path: .
 ```
-In our case the mounted volume will point to the liminal hello world example.
+In our case the mounted volume will point to the liminal hello world example. \
 The hello world task will read the **hello_world.json** file from the mounted volume and will write
 the **hello_world_output.json** to it.
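+
+Since the volume is mounted from the local directory, you can inspect the result after a run (file names as described above):
+```BASH
+cat hello_world_output.json
+```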
 
@@ -142,16 +149,18 @@ described under the task section in the yml:
        path: /mnt/vol1
 ```
 
+
 ## Here is the entire list of commands, if you want to start from scratch:
 
 ```
 git clone https://github.com/apache/incubator-liminal
-cd examples
+cd incubator-liminal/examples/liminal-getting-started
 python3 -m venv env
 source env/bin/activate
 pip uninstall apache-liminal
 pip install apache-liminal
 liminal build
+liminal create
 liminal deploy --clean
 liminal start
 ```
diff --git a/docs/getting-started/iris_classification.md b/docs/getting-started/iris_classification.md
new file mode 100644
index 0000000..1e9ce74
--- /dev/null
+++ b/docs/getting-started/iris_classification.md
@@ -0,0 +1,296 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Getting started / ***Iris Classification***
+
+* [Setup your local environment](#setup-your-local-environment)
+* [Setup liminal](#setup-liminal)
+    * [Liminal build](#liminal-build)
+    * [Liminal create](#liminal-create)
+    * [Liminal deploy](#liminal-deploy)
+    * [Liminal start](#liminal-start)
+* [Liminal YAML walkthrough](#liminal-yaml-walkthrough)
+* [Evaluate the Iris Classification model](#evaluate-the-iris-classification-model)
+* [Debugging Kubernetes Deployments](#debugging-kubernetes-deployments)
+* [Closing up](#closing-up)
+
+In this tutorial, we will guide you through setting up Apache Liminal on your local machine and running a simple machine-learning workflow, based on the classic Iris dataset classification example. \
+More details can be found at this [link](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html).
+
+#### Prerequisites
+
+* [Python 3 (3.6 and up)](https://www.python.org/downloads)
+* [Python Virtual Environments](https://pypi.org/project/virtualenv)
+* [Docker Desktop](https://www.docker.com/products/docker-desktop)
+* [Kubernetes CLI (kubectl)](https://kubernetes.io/docs/tasks/tools/install-kubectl-macos)
+
+*Note: Make sure a Kubernetes cluster is running in Docker Desktop.*
+
+We will define the following steps and services to implement the Iris classification example: \
+Train, Validate & Deploy - training and validation execution is managed by the Liminal Airflow extension. The training task trains a regression model using a public dataset. \
+We then validate the model and deploy it to a model store on the mounted volume. \
+Inference - online inference is done using a Python Flask service running on the local Kubernetes cluster in Docker Desktop. The service exposes the `/predict` endpoint. It reads the model stored on the mounted volume and uses it to evaluate the request.
+## Setup your local environment
+
+In the dev folder, clone the example code from liminal:
+
+
+```BASH
+git clone https://github.com/apache/incubator-liminal
+```
+***Note:*** *You just cloned the entire Liminal project; you actually only need the examples folder.*
+
+
+
+Create a python virtual environment to isolate your runs:
+
+```BASH
+cd incubator-liminal/examples/aws-ml-app-demo
+python3 -m venv env
+```
+
+Activate your virtual environment:
+
+```BASH
+source env/bin/activate
+```
+
+Now we are ready to install liminal:
+
+```BASH
+pip install apache-liminal
+```
+
+## Setup liminal
+### Liminal build
+The build will create docker images based on the liminal.yml file in the `images` section.
+```BASH
+liminal build
+```
+
+### Liminal create
+All tasks use a mounted volume as defined in the pipeline YAML. \
+In our case the mounted volume will point to the liminal Iris Classification example. \
+The training task trains a regression model using a public dataset; we then validate the model and deploy it to a model store on the mounted volume.
+
+Create a local Kubernetes volume:
+```BASH
+liminal create
+```
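+
+You can verify that the volume claim was created (the claim name comes from the volumes section shown later in this guide):
+```BASH
+kubectl get pvc gettingstartedvol-pvc --namespace=default
+```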
+
+### Liminal deploy
+The deploy command deploys a liminal server and any liminal.yml files in your working directory or its subdirectories to your liminal home directory.
+```BASH
+liminal deploy --clean  
+```
+
+*Note: the liminal home directory is located at the path defined by the LIMINAL_HOME environment
+variable. If LIMINAL_HOME is not defined, the home directory defaults to the
+~/liminal_home directory.*
+
+### Liminal start
+The start command spins up three containers that load the Apache Airflow stack. Liminal's Airflow extension is responsible for executing the workflows defined in the liminal.yml file as standard Airflow DAGs.
+```BASH
+liminal start
+```
+
+It runs the following three containers: 
+* liminal-postgress
+* liminal-webserver
+* liminal-scheduler
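+
+You can list them once they are up (a quick sanity check; filters on the container name prefix):
+```BASH
+docker ps --filter name=liminal
+```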
+
+Once the liminal server has completed starting up, you can navigate to the admin UI in your browser:
+[http://localhost:8080](http://localhost:8080)
+
+
+![](../nstatic/iris-classification/airflow_main.png)
+
+***Important:** Set the off/on toggle to activate your pipeline (DAG); nothing will happen otherwise!*
+
+You can go to graph view to see all the tasks configured in the liminal.yml file: 
+[http://localhost:8080/admin/airflow/graph?dag_id=my_datascience_pipeline](
+http://localhost:8080/admin/airflow/graph?dag_id=my_datascience_pipeline
+)
+
+#### Now let's see what actually happened to our task:
+![](../nstatic/iris-classification/airflow_view_dag.png)
+
+
+#### Click on “train” and you will get this popup:
+![](../nstatic/iris-classification/airflow_view_log.png)
+
+
+#### Click on the “view log” button to see the log of the current task run:
+![](../nstatic/iris-classification/airflow_task_log.png)
+
+## Liminal YAML walkthrough
+* [Mounted volumes](#mounted-volumes)
+* [Pipeline flow](#pipeline-flow)
+
+### Mounted volumes
+Declaration of the mounted volume in your liminal YAML:
+```YAML
+name: MyDataScienceApp
+owner: Bosco Albert Baracus
+volumes:
+  - volume: gettingstartedvol
+    claim_name: gettingstartedvol-pvc
+    local:
+      path: .
+```
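+
+Note that `claim_name` is the same `gettingstartedvol-pvc` that the serving pod's Kubernetes manifest references later in this guide.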
+
+### Pipeline flow
+Declaration of the pipeline tasks flow in your liminal YAML:
+```YAML
+pipelines:
+  - pipeline: my_datascience_pipeline
+    ...
+    schedule: 0 * 1 * *
+    tasks:
+      - task: train
+        type: python
+        description: train model
+        image: myorg/mydatascienceapp
+        cmd: python -u training.py train
+        ...
+      - task: validate
+        type: python
+        description: validate model and deploy
+        image: myorg/mydatascienceapp
+        cmd: python -u training.py validate
+        ...
+```
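+
+The `schedule` field takes a standard cron expression; `0 * 1 * *` runs the pipeline at minute 0 of every hour on the first day of each month.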
+
+##### Each task will internally mount the volume defined above to a path inside the container, described under the task section in the yml:
+
+```YAML
+pipelines:
+    ...
+    tasks:
+      - task: train
+        ...
+        env:
+          MOUNT_PATH: /mnt/gettingstartedvol
+        mounts:
+          - mount: mymount
+            volume: gettingstartedvol
+            path: /mnt/gettingstartedvol
+```
+###### We specify the `MOUNT_PATH` in which we store the trained model.
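+
+A hypothetical sketch of how a task script might use `MOUNT_PATH` to persist the model (the example ships its own training.py; the function and file names here are illustrative):
+```PYTHON
+import os
+import pickle
+
+# MOUNT_PATH is injected by Liminal from the task's env section
+MOUNT_PATH = os.environ.get("MOUNT_PATH", "/mnt/gettingstartedvol")
+
+def save_model(model, filename="model.pkl"):
+    # the serving pod mounts the same volume, so it can load this file later
+    with open(os.path.join(MOUNT_PATH, filename), "wb") as f:
+        pickle.dump(model, f)
+```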
+
+## Evaluate the iris classification model
+
+Once the Iris Classification model training has completed and the model is deployed (to the mounted volume), you can launch a pod of the pre-built image, which contains a Flask server, by applying the following Kubernetes manifest:
+```BASH
+kubectl apply -f manifests/aws-ml-app-demo.yaml
+```
+
+Alternatively, create a Kubernetes pod from stdin:
+```BASH
+cat <<EOF | kubectl apply -f -
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: aws-ml-app-demo
+spec:
+  volumes:
+    - name: task-pv-storage
+      persistentVolumeClaim:
+        claimName: gettingstartedvol-pvc
+  containers:
+    - name: task-pv-container
+      imagePullPolicy: Never
+      image: myorg/mydatascienceapp
+      lifecycle:
+        postStart:
+          exec:
+            command: ["/bin/bash", "-c", "apt update && apt install curl -y"]
+      ports:
+        - containerPort: 80
+          name: "http-server"
+      volumeMounts:
+        - mountPath: "/mnt/gettingstartedvol"
+          name: task-pv-storage
+EOF
+```
+
+Check that the pod is running:
+```BASH
+kubectl get pods --namespace=default
+```
+
+Check that the service is up:
+```BASH
+kubectl exec -it --namespace=default aws-ml-app-demo -- /bin/bash -c "curl localhost/healthcheck"
+```
+
+Check the prediction:
+```BASH
+kubectl exec -it --namespace=default aws-ml-app-demo -- /bin/bash -c "curl -X POST -d '{\"petal_width\": \"2.1\"}' localhost/predict"
+```
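+
+Alternatively, you can port-forward the pod and call the endpoint from your host (run the forward in a separate terminal; local port 8081 is an arbitrary choice):
+```BASH
+kubectl port-forward --namespace=default pod/aws-ml-app-demo 8081:80
+# in another terminal:
+curl -X POST -d '{"petal_width": "2.1"}' localhost:8081/predict
+```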
+
+## Debugging Kubernetes Deployments
+`kubectl get pods` will help you check your pod status:
+```BASH
+kubectl get pods --namespace=default
+```
+`kubectl logs` will help you check your pod's log:
+```BASH
+kubectl logs --namespace=default aws-ml-app-demo
+```
+Use `kubectl exec` to get a shell into a running container:
+```BASH
+kubectl exec -it --namespace=default aws-ml-app-demo -- bash
+```
+Then you can check the mounted volume with `df -h` and verify the result of the model.
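+For example, from inside the pod (mount path as defined in the liminal YAML):
+```BASH
+df -h /mnt/gettingstartedvol
+ls /mnt/gettingstartedvol
+```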
+
+
+## Here is the entire list of commands, if you want to start from scratch:
+
+```
+git clone https://github.com/apache/incubator-liminal
+cd incubator-liminal/examples/aws-ml-app-demo
+python3 -m venv env
+source env/bin/activate
+rm -rf ~/liminal_home
+pip uninstall apache-liminal
+pip install apache-liminal
+liminal build
+liminal create
+liminal deploy --clean
+liminal start
+```
+
+## Closing up
+
+To make sure the liminal containers are stopped, use:
+```
+liminal stop
+```
+
+To deactivate the Python virtual environment, use:
+```
+deactivate
+```
+
+To terminate the Kubernetes pod:
+```
+kubectl delete pod --namespace=default aws-ml-app-demo
+```
diff --git a/docs/nstatic/airflow_main.png b/docs/nstatic/hello-world/airflow_main.png
similarity index 100%
rename from docs/nstatic/airflow_main.png
rename to docs/nstatic/hello-world/airflow_main.png
diff --git a/docs/nstatic/airflow_task_log.png b/docs/nstatic/hello-world/airflow_task_log.png
similarity index 100%
rename from docs/nstatic/airflow_task_log.png
rename to docs/nstatic/hello-world/airflow_task_log.png
diff --git a/docs/nstatic/airflow_view_dag.png b/docs/nstatic/hello-world/airflow_view_dag.png
similarity index 100%
rename from docs/nstatic/airflow_view_dag.png
rename to docs/nstatic/hello-world/airflow_view_dag.png
diff --git a/docs/nstatic/airflow_view_log.png b/docs/nstatic/hello-world/airflow_view_log.png
similarity index 100%
rename from docs/nstatic/airflow_view_log.png
rename to docs/nstatic/hello-world/airflow_view_log.png
diff --git a/docs/nstatic/iris-classification/airflow_main.png b/docs/nstatic/iris-classification/airflow_main.png
new file mode 100644
index 0000000..1f21875
Binary files /dev/null and b/docs/nstatic/iris-classification/airflow_main.png differ
diff --git a/docs/nstatic/iris-classification/airflow_task_log.png b/docs/nstatic/iris-classification/airflow_task_log.png
new file mode 100644
index 0000000..d2d7ab2
Binary files /dev/null and b/docs/nstatic/iris-classification/airflow_task_log.png differ
diff --git a/docs/nstatic/iris-classification/airflow_view_dag.png b/docs/nstatic/iris-classification/airflow_view_dag.png
new file mode 100644
index 0000000..3037b77
Binary files /dev/null and b/docs/nstatic/iris-classification/airflow_view_dag.png differ
diff --git a/docs/nstatic/iris-classification/airflow_view_log.png b/docs/nstatic/iris-classification/airflow_view_log.png
new file mode 100644
index 0000000..f00eb5a
Binary files /dev/null and b/docs/nstatic/iris-classification/airflow_view_log.png differ
diff --git a/examples/aws-ml-app-demo/manifests/aws-ml-app-demo.yaml b/examples/aws-ml-app-demo/manifests/aws-ml-app-demo.yaml
new file mode 100644
index 0000000..1bfc21c
--- /dev/null
+++ b/examples/aws-ml-app-demo/manifests/aws-ml-app-demo.yaml
@@ -0,0 +1,23 @@
+apiVersion: v1
+kind: Pod
+metadata:
+  name: aws-ml-app-demo
+spec:
+  volumes:
+    - name: task-pv-storage
+      persistentVolumeClaim:
+        claimName: gettingstartedvol-pvc
+  containers:
+    - name: task-pv-container
+      imagePullPolicy: Never
+      image: myorg/mydatascienceapp
+      lifecycle:
+        postStart:
+          exec:
+            command: ["/bin/bash", "-c", "apt update && apt install curl -y"]
+      ports:
+        - containerPort: 80
+          name: "http-server"
+      volumeMounts:
+        - mountPath: "/mnt/gettingstartedvol"
+          name: task-pv-storage
\ No newline at end of file