Posted to commits@griffin.apache.org by gu...@apache.org on 2018/09/14 02:22:06 UTC

incubator-griffin git commit: Updated documentation for outputs

Repository: incubator-griffin
Updated Branches:
  refs/heads/master c5e31756e -> b5360ec70


Updated documentation for outputs

Documentation is out of sync with code after 7b749ad72a78eb4244bcf47a80a52f2a6e3f222f.
Updating measuring examples and format description accordingly.

Author: Nikolay Sokolov <ch...@gmail.com>

Closes #414 from chemikadze/out-documentation.


Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin/commit/b5360ec7
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin/tree/b5360ec7
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin/diff/b5360ec7

Branch: refs/heads/master
Commit: b5360ec70587454502e3c713f97df244d5bd37c0
Parents: c5e3175
Author: Nikolay Sokolov <ch...@gmail.com>
Authored: Fri Sep 14 10:21:57 2018 +0800
Committer: William Guo <gu...@apache.org>
Committed: Fri Sep 14 10:21:57 2018 +0800

----------------------------------------------------------------------
 griffin-doc/measure/measure-batch-sample.md     | 36 ++++++++++-------
 .../measure/measure-configuration-guide.md      | 41 +++++++++++++-------
 griffin-doc/measure/measure-streaming-sample.md | 37 +++++++++++-------
 3 files changed, 72 insertions(+), 42 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/b5360ec7/griffin-doc/measure/measure-batch-sample.md
----------------------------------------------------------------------
diff --git a/griffin-doc/measure/measure-batch-sample.md b/griffin-doc/measure/measure-batch-sample.md
index af5d43a..c504afa 100644
--- a/griffin-doc/measure/measure-batch-sample.md
+++ b/griffin-doc/measure/measure-batch-sample.md
@@ -68,12 +68,16 @@ Measures consists of batch measure and streaming measure. This document is for t
           "total": "total_count",
           "matched": "matched_count"
         },
-        "metric": {
-          "name": "accu"
-        },
-        "record": {
-          "name": "missRecords"
-        }
+        "out": [
+          {
+            "type": "metric",
+            "name": "accu"
+          },
+          {
+            "type": "record",
+            "name": "missRecords"
+          }        
+        ]        
       }
     ]
   }
@@ -119,19 +123,25 @@ The miss records of source will be persisted as record.
         "dq.type": "profiling",
         "name": "prof",
         "rule": "select max(age) as `max_age`, min(age) as `min_age` from source",
-        "metric": {
-          "name": "prof"
-        }
+        "out": [
+          {
+            "type": "metric",
+            "name": "prof"
+          }        
+        ]        
       },
       {
         "dsl.type": "griffin-dsl",
         "dq.type": "profiling",
         "name": "name_grp",
         "rule": "select name, count(*) as cnt from source group by name",
-        "metric": {
-          "name": "name_grp",
-          "collect.type": "array"
-        }
+        "out": [
+          {
+            "type": "metric",
+            "name": "name_grp",
+            "flatten": "array"
+          }        
+        ]
       }
     ]
   }

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/b5360ec7/griffin-doc/measure/measure-configuration-guide.md
----------------------------------------------------------------------
diff --git a/griffin-doc/measure/measure-configuration-guide.md b/griffin-doc/measure/measure-configuration-guide.md
index 53f173c..52e8517 100644
--- a/griffin-doc/measure/measure-configuration-guide.md
+++ b/griffin-doc/measure/measure-configuration-guide.md
@@ -158,13 +158,16 @@ Above lists environment parameters.
           "miss": "miss_count",
           "total": "total_count",
           "matched": "matched_count"
-        },
-        "metric": {
-          "name": "accu"
-        },
-        "record": {
-          "name": "missRecords"
-        }
+        },        
+        "out": [
+          {
+            "type": "metric",
+            "name": "accu"
+          },
+          {
+            "type": "record"
+          }        
+        ]
       }
     ]
   }
@@ -201,7 +204,7 @@ Above lists DQ job configure parameters.
 
 ### <a name="rule"></a>Rule
 - **dsl.type**: Rule dsl type, "spark-sql", "df-opr" and "griffin-dsl".
-- **dq.type**: DQ type of this rule, only for "griffin-dsl" type, supporting "accuracy" and "profiling".
+- **dq.type**: DQ type of this rule, only for "griffin-dsl" type. Supported types: "accuracy", "profiling", "timeliness", "uniqueness", "completeness".
 - **name** (step information): Result table name of this rule, optional for "griffin-dsl" type.
 - **rule**: The rule string.
 - **details**: Details of this rule, optional.
@@ -235,10 +238,18 @@ Above lists DQ job configure parameters.
     * source: name of data source to measure timeliness.
     * latency: the latency column name in metric, optional.
     * threshold: optional, if set as a time string like "1h", the items with latency more than 1 hour will be record.
-- **metric**: Configuration of metric export.
-  + name: name of metric.
-  + collect.type: collect metric as the type set, including "default", "entries", "array", "map", optional.
-- **record**: Configuration of record export.
-  + name: name of record.
-  + data.source.cache: optional, if set as data source name, the cache of this data source will be updated by the records, always used in streaming accuracy case.
-  + origin.DF: avaiable only if "data.source.cache" is set, the origin data frame name of records.
\ No newline at end of file
+- **out**: List of output sinks for the job.
+  + Metric output.
+    * type: "metric"
+    * name: Metric name; its semantics depend on the "flatten" field value.
+    * flatten: Aggregation method applied to the data frame result before it is sent to the sink:
+      - default: use "array" if the data frame returned multiple records, otherwise use "entries"
+      - entries: sends the first row of the data frame as the metric result, like `{"agg_col": "value"}`
+      - array: wraps all rows into an array under the output name, like `{"my_out_name": [{"agg_col": "value"}]}`
+      - map: wraps the first row of the data frame into a map under the output name, like `{"my_out_name": {"agg_col": "value"}}`
+  + Record output. Currently handled only by the HDFS sink.
+    * type: "record"
+    * name: File name within sink output folder to dump files to.   
+  + Data source cache update for streaming jobs.
+    * type: "dsc-update"
+    * name: Data source name to update cache.   
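
The four "flatten" modes described in the hunk above can be illustrated with a small Python sketch. This is not Griffin's implementation: `flatten_metric` and its arguments are hypothetical names, result rows are modeled as a list of dicts, and the behavior for an empty result is an assumption that the documentation does not cover.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def flatten_metric(name: str, rows: List[Row], mode: str = "default") -> Dict[str, Any]:
    """Shape result rows the way the documented "flatten" modes describe.

    Hypothetical helper for illustration only; not part of Griffin.
    """
    if mode == "default":
        # Documented rule: "array" for multiple records, otherwise "entries".
        mode = "array" if len(rows) > 1 else "entries"

    # Assumption: an empty result is treated as an empty first row.
    first: Row = dict(rows[0]) if rows else {}

    if mode == "entries":
        # First row only, without the output name as a wrapper.
        return first
    if mode == "array":
        # Every row, wrapped in a list under the output name.
        return {name: [dict(row) for row in rows]}
    if mode == "map":
        # First row only, wrapped under the output name.
        return {name: first}
    raise ValueError(f"unknown flatten mode: {mode!r}")
```

For a grouped query such as the `name_grp` rule in the batch sample, "array" would keep every group row under the output name, while "entries" or "map" would keep only the first row.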

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/b5360ec7/griffin-doc/measure/measure-streaming-sample.md
----------------------------------------------------------------------
diff --git a/griffin-doc/measure/measure-streaming-sample.md b/griffin-doc/measure/measure-streaming-sample.md
index 5c80576..49093bf 100644
--- a/griffin-doc/measure/measure-streaming-sample.md
+++ b/griffin-doc/measure/measure-streaming-sample.md
@@ -128,13 +128,16 @@ Measures consists of batch measure and streaming measure. This document is for t
           "total": "total_count",
           "matched": "matched_count"
         },
-        "metric": {
-          "name": "accu"
-        },
-        "record": {
-          "name": "missRecords",
-          "data.source.cache": "source"
-        }
+        "out": [
+          {
+            "type": "metric",
+            "name": "accu"
+          },
+          {
+            "type": "record",
+            "name": "missRecords"
+          }        
+        ]
       }
     ]
   }
@@ -228,19 +231,25 @@ The miss records of source will be persisted as record.
         "dq.type": "profiling",
         "name": "prof",
         "rule": "select count(name) as `cnt`, max(age) as `max`, min(age) as `min` from source",
-        "metric": {
-          "name": "prof"
-        }
+        "out": [
+          {
+            "type": "metric",
+            "name": "prof"
+          }        
+        ]        
       },
       {
         "dsl.type": "griffin-dsl",
         "dq.type": "profiling",
         "name": "grp",
         "rule": "select name, count(*) as `cnt` from source group by name",
-        "metric": {
-          "name": "name_group",
-          "collect.type": "array"
-        }
+        "out": [
+          {
+            "type": "metric",
+            "name": "name_group",
+            "flatten": "array"
+          }        
+        ]        
       }
     ]
   }
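
The samples in this commit all follow the same pattern: the legacy "metric" and "record" objects on a rule become entries in an "out" list, and "collect.type" is renamed to "flatten". A minimal Python sketch of that rewrite follows. The function name `migrate_rule` is hypothetical, and mapping "data.source.cache" to a "dsc-update" entry is an assumption inferred from the field descriptions; the diffs above do not show that conversion explicitly.

```python
import copy
from typing import Any, Dict


def migrate_rule(rule: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a rule with legacy "metric"/"record" keys folded into "out".

    Hypothetical helper for illustration only; not part of Griffin.
    """
    migrated = {k: copy.deepcopy(v) for k, v in rule.items()
                if k not in ("metric", "record")}
    out = list(migrated.get("out", []))

    metric = rule.get("metric")
    if metric is not None:
        entry = {"type": "metric"}
        if "name" in metric:
            entry["name"] = metric["name"]
        if "collect.type" in metric:
            # Renamed field, as shown in the sample diffs.
            entry["flatten"] = metric["collect.type"]
        out.append(entry)

    record = rule.get("record")
    if record is not None:
        entry = {"type": "record"}
        if "name" in record:
            entry["name"] = record["name"]
        out.append(entry)
        cache = record.get("data.source.cache")
        if cache:
            # Assumption: the old cache option corresponds to a "dsc-update" output.
            out.append({"type": "dsc-update", "name": cache})

    if out:
        migrated["out"] = out
    return migrated
```

Applied to the old `name_grp` rule from the batch sample, this produces the same "out" list that the updated sample contains.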