Posted to commits@griffin.apache.org by gu...@apache.org on 2018/09/12 11:26:43 UTC
incubator-griffin-site git commit: rename quickstart
Repository: incubator-griffin-site
Updated Branches:
refs/heads/master b0f9c38ee -> b424dc9bd
rename quickstart
Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/commit/b424dc9b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/tree/b424dc9b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/diff/b424dc9b
Branch: refs/heads/master
Commit: b424dc9bd926051af0974488d7d9a6ca17eca9e2
Parents: b0f9c38
Author: William Guo <gu...@apache.org>
Authored: Wed Sep 12 19:26:35 2018 +0800
Committer: William Guo <gu...@apache.org>
Committed: Wed Sep 12 19:26:35 2018 +0800
----------------------------------------------------------------------
quick-start.md | 132 ----------------------------------------------------
quickstart.md | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 132 insertions(+), 132 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/b424dc9b/quick-start.md
----------------------------------------------------------------------
diff --git a/quick-start.md b/quick-start.md
deleted file mode 100644
index 9922b49..0000000
--- a/quick-start.md
+++ /dev/null
@@ -1,132 +0,0 @@
----
-layout: doc
-title: "Quick Start"
-permalink: /docs/quickstart.html
----
-
-## Environment Preparation
-Prepare the environment for Apache Griffin.
-You can use our pre-built docker images as the environment.
-Follow the [docker guide](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md#environment-preparation) to start up the docker images, and login to the griffin container.
-```
-docker exec -it <griffin docker container id> bash
-cd ~/measure
-```
-
-## Data Preparation
-Prepare the test data in Hive.
-In the docker image, we've prepared two Hive tables named `demo_src` and `demo_tgt`, and the test data is generated hourly.
-The schema is like this:
-```
-id bigint
-age int
-desc string
-dt string
-hour string
-```
-In which `dt` and `hour` are the partition columns, with string values like `20180912` and `06`.
-
-## Configuration Files
-The environment config file: env.json
-```
-{
- "spark": {
- "log.level": "WARN"
- },
- "sinks": [
- {
- "type": "console"
- },
- {
- "type": "hdfs",
- "config": {
- "path": "hdfs:///griffin/persist"
- }
- },
- {
- "type": "elasticsearch",
- "config": {
- "method": "post",
- "api": "http://es:9200/griffin/accuracy"
- }
- }
- ]
-}
-```
-The DQ config file: dq.json
-```
-{
- "name": "batch_accu",
- "process.type": "batch",
- "data.sources": [
- {
- "name": "src",
- "baseline": true,
- "connectors": [
- {
- "type": "hive",
- "version": "1.2",
- "config": {
- "database": "default",
- "table.name": "demo_src"
- }
- }
- ]
- }, {
- "name": "tgt",
- "connectors": [
- {
-
- "type": "hive",
- "version": "1.2",
- "config": {
- "database": "default",
- "table.name": "demo_tgt"
- }
- }
- ]
- }
- ],
- "evaluate.rule": {
- "rules": [
- {
- "dsl.type": "griffin-dsl",
- "dq.type": "accuracy",
- "out.dataframe.name": "accu",
- "rule": "src.id = tgt.id AND src.age = tgt.age AND src.desc = tgt.desc",
- "details": {
- "source": "src",
- "target": "tgt",
- "miss": "miss_count",
- "total": "total_count",
- "matched": "matched_count"
- },
- "out": [
- {
- "type": "metric",
- "name": "accu"
- },
- {
- "type": "record",
- "name": "missRecords"
- }
- ]
- }
- ]
- },
- "sinks": ["CONSOLE", "HDFS"]
-}
-```
-
-## Submit Measure Job
-Submit the measure job to Spark, with config file paths as parameters.
-```
-spark-submit --class org.apache.griffin.measure.Application --master yarn --deploy-mode client --queue default \
---driver-memory 1g --executor-memory 1g --num-executors 2 \
-<path>/griffin-measure.jar \
-<path>/env.json <path>/batch-accu-config.json
-```
-Then you can get the calculation log in console, after the job finishes, you can get the result metrics printed. The metrics will also be saved in hdfs: `hdfs:///griffin/persist/<job name>/<timestamp>/_METRICS`.
-
-## More Details
-For more details about griffin measures, you can visit our documents in [github](https://github.com/apache/incubator-griffin/tree/master/griffin-doc).
http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/b424dc9b/quickstart.md
----------------------------------------------------------------------
diff --git a/quickstart.md b/quickstart.md
new file mode 100644
index 0000000..9922b49
--- /dev/null
+++ b/quickstart.md
@@ -0,0 +1,132 @@
+---
+layout: doc
+title: "Quick Start"
+permalink: /docs/quickstart.html
+---
+
+## Environment Preparation
+Prepare the environment for Apache Griffin.
+You can use our pre-built Docker images as the environment.
+Follow the [docker guide](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md#environment-preparation) to start the Docker containers, then log in to the Griffin container.
+```
+docker exec -it <griffin docker container id> bash
+cd ~/measure
+```
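+If you don't know the container id, you can look it up first; the filter below assumes the container's name contains "griffin":
+```
+docker ps --filter "name=griffin" --format "{{.ID}}  {{.Names}}"
+```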
+
+## Data Preparation
+Prepare the test data in Hive.
+In the Docker image, we have prepared two Hive tables, `demo_src` and `demo_tgt`, for which test data is generated hourly.
+Both tables share the following schema:
+```
+id bigint
+age int
+desc string
+dt string
+hour string
+```
+Here, `dt` and `hour` are the partition columns, with string values such as `20180912` and `06`.
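+To spot-check that the generated data has landed, you can query a single partition from inside the container (assuming the `hive` CLI is available; adjust the partition values to a recent hour — note `desc` must be backtick-quoted as it is a reserved word):
+```
+hive -e 'SELECT id, age, `desc` FROM demo_src WHERE dt = "20180912" AND hour = "06" LIMIT 5;'
+```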
+
+## Configuration Files
+The environment config file, `env.json`:
+```
+{
+ "spark": {
+ "log.level": "WARN"
+ },
+ "sinks": [
+ {
+ "type": "console"
+ },
+ {
+ "type": "hdfs",
+ "config": {
+ "path": "hdfs:///griffin/persist"
+ }
+ },
+ {
+ "type": "elasticsearch",
+ "config": {
+ "method": "post",
+ "api": "http://es:9200/griffin/accuracy"
+ }
+ }
+ ]
+}
+```
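+The `sinks` listed here define where metrics can be written; the DQ config selects among them by name. A quick way to check that the Elasticsearch sink endpoint is reachable from the container (assuming `curl` is installed):
+```
+curl -s http://es:9200
+```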
+The data quality (DQ) config file, `dq.json`:
+```
+{
+ "name": "batch_accu",
+ "process.type": "batch",
+ "data.sources": [
+ {
+ "name": "src",
+ "baseline": true,
+ "connectors": [
+ {
+ "type": "hive",
+ "version": "1.2",
+ "config": {
+ "database": "default",
+ "table.name": "demo_src"
+ }
+ }
+ ]
+ }, {
+ "name": "tgt",
+ "connectors": [
+ {
+ "type": "hive",
+ "version": "1.2",
+ "config": {
+ "database": "default",
+ "table.name": "demo_tgt"
+ }
+ }
+ ]
+ }
+ ],
+ "evaluate.rule": {
+ "rules": [
+ {
+ "dsl.type": "griffin-dsl",
+ "dq.type": "accuracy",
+ "out.dataframe.name": "accu",
+ "rule": "src.id = tgt.id AND src.age = tgt.age AND src.desc = tgt.desc",
+ "details": {
+ "source": "src",
+ "target": "tgt",
+ "miss": "miss_count",
+ "total": "total_count",
+ "matched": "matched_count"
+ },
+ "out": [
+ {
+ "type": "metric",
+ "name": "accu"
+ },
+ {
+ "type": "record",
+ "name": "missRecords"
+ }
+ ]
+ }
+ ]
+ },
+ "sinks": ["CONSOLE", "HDFS"]
+}
+```
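+The accuracy rule above counts source records that have no matching target record; the reported accuracy is derived from `matched_count / total_count`. A rough, illustrative HiveQL equivalent of the miss count for one partition (this is not how Griffin executes the rule internally):
+```
+hive -e 'SELECT COUNT(*) AS miss_count
+FROM demo_src s LEFT JOIN demo_tgt t
+  ON s.id = t.id AND s.age = t.age AND s.`desc` = t.`desc`
+WHERE t.id IS NULL AND s.dt = "20180912" AND s.hour = "06";'
+```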
+
+## Submit Measure Job
+Submit the measure job to Spark, passing the two config file paths as arguments.
+```
+spark-submit --class org.apache.griffin.measure.Application --master yarn --deploy-mode client --queue default \
+--driver-memory 1g --executor-memory 1g --num-executors 2 \
+<path>/griffin-measure.jar \
+<path>/env.json <path>/dq.json
+```
+The calculation log is printed to the console while the job runs; after it finishes, the result metrics are printed as well. The metrics are also saved in HDFS at `hdfs:///griffin/persist/<job name>/<timestamp>/_METRICS`.
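+Once the job completes, you can read the persisted metrics straight from HDFS; the timestamp directory differs per run, so a wildcard helps (the job name here is `batch_accu`, from `dq.json`):
+```
+hdfs dfs -cat "/griffin/persist/batch_accu/*/_METRICS"
+```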
+
+## More Details
+For more details about Apache Griffin measures, see the documentation on [GitHub](https://github.com/apache/incubator-griffin/tree/master/griffin-doc).