Posted to commits@griffin.apache.org by gu...@apache.org on 2018/09/14 02:22:06 UTC
incubator-griffin git commit: Updated documentation for outputs
Repository: incubator-griffin
Updated Branches:
refs/heads/master c5e31756e -> b5360ec70
Updated documentation for outputs
Documentation is out of sync with code after 7b749ad72a78eb4244bcf47a80a52f2a6e3f222f.
Updating measuring examples and format description accordingly.
Author: Nikolay Sokolov <ch...@gmail.com>
Closes #414 from chemikadze/out-documentation.
Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin/commit/b5360ec7
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin/tree/b5360ec7
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin/diff/b5360ec7
Branch: refs/heads/master
Commit: b5360ec70587454502e3c713f97df244d5bd37c0
Parents: c5e3175
Author: Nikolay Sokolov <ch...@gmail.com>
Authored: Fri Sep 14 10:21:57 2018 +0800
Committer: William Guo <gu...@apache.org>
Committed: Fri Sep 14 10:21:57 2018 +0800
----------------------------------------------------------------------
griffin-doc/measure/measure-batch-sample.md | 36 ++++++++++-------
.../measure/measure-configuration-guide.md | 41 +++++++++++++-------
griffin-doc/measure/measure-streaming-sample.md | 37 +++++++++++-------
3 files changed, 72 insertions(+), 42 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/b5360ec7/griffin-doc/measure/measure-batch-sample.md
----------------------------------------------------------------------
diff --git a/griffin-doc/measure/measure-batch-sample.md b/griffin-doc/measure/measure-batch-sample.md
index af5d43a..c504afa 100644
--- a/griffin-doc/measure/measure-batch-sample.md
+++ b/griffin-doc/measure/measure-batch-sample.md
@@ -68,12 +68,16 @@ Measures consists of batch measure and streaming measure. This document is for t
"total": "total_count",
"matched": "matched_count"
},
- "metric": {
- "name": "accu"
- },
- "record": {
- "name": "missRecords"
- }
+ "out": [
+ {
+ "type": "metric",
+ "name": "accu"
+ },
+ {
+ "type": "record",
+ "name": "missRecords"
+ }
+ ]
}
]
}
@@ -119,19 +123,25 @@ The miss records of source will be persisted as record.
"dq.type": "profiling",
"name": "prof",
"rule": "select max(age) as `max_age`, min(age) as `min_age` from source",
- "metric": {
- "name": "prof"
- }
+ "out": [
+ {
+ "type": "metric",
+ "name": "prof"
+ }
+ ]
},
{
"dsl.type": "griffin-dsl",
"dq.type": "profiling",
"name": "name_grp",
"rule": "select name, count(*) as cnt from source group by name",
- "metric": {
- "name": "name_grp",
- "collect.type": "array"
- }
+ "out": [
+ {
+ "type": "metric",
+ "name": "name_grp",
+ "flatten": "array"
+ }
+ ]
}
]
}
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/b5360ec7/griffin-doc/measure/measure-configuration-guide.md
----------------------------------------------------------------------
diff --git a/griffin-doc/measure/measure-configuration-guide.md b/griffin-doc/measure/measure-configuration-guide.md
index 53f173c..52e8517 100644
--- a/griffin-doc/measure/measure-configuration-guide.md
+++ b/griffin-doc/measure/measure-configuration-guide.md
@@ -158,13 +158,16 @@ Above lists environment parameters.
"miss": "miss_count",
"total": "total_count",
"matched": "matched_count"
- },
- "metric": {
- "name": "accu"
- },
- "record": {
- "name": "missRecords"
- }
+ },
+ "out": [
+ {
+ "type": "metric",
+ "name": "accu"
+ },
+ {
+ "type": "record"
+ }
+ ]
}
]
}
@@ -201,7 +204,7 @@ Above lists DQ job configure parameters.
### <a name="rule"></a>Rule
- **dsl.type**: Rule dsl type, "spark-sql", "df-opr" and "griffin-dsl".
-- **dq.type**: DQ type of this rule, only for "griffin-dsl" type, supporting "accuracy" and "profiling".
+- **dq.type**: DQ type of this rule, only for "griffin-dsl" type. Supported types: "accuracy", "profiling", "timeliness", "uniqueness", "completeness".
- **name** (step information): Result table name of this rule, optional for "griffin-dsl" type.
- **rule**: The rule string.
- **details**: Details of this rule, optional.
@@ -235,10 +238,18 @@ Above lists DQ job configure parameters.
* source: name of data source to measure timeliness.
* latency: the latency column name in metric, optional.
* threshold: optional, if set as a time string like "1h", the items with latency more than 1 hour will be recorded.
-- **metric**: Configuration of metric export.
- + name: name of metric.
- + collect.type: collect metric as the type set, including "default", "entries", "array", "map", optional.
-- **record**: Configuration of record export.
- + name: name of record.
- + data.source.cache: optional, if set as data source name, the cache of this data source will be updated by the records, always used in streaming accuracy case.
- + origin.DF: avaiable only if "data.source.cache" is set, the origin data frame name of records.
\ No newline at end of file
+- **out**: List of output sinks for the job.
+ + Metric output.
+ * type: "metric"
+ * name: Metric name, semantics depend on "flatten" field value.
+ * flatten: Aggregation method used before sending data frame result into the sink:
+ - default: use "array" if data frame returned multiple records, otherwise use "entries"
+ - entries: sends first row of data frame as metric results, like `{"agg_col": "value"}`
+ - array: wraps all metrics into a map, like `{"my_out_name": [{"agg_col": "value"}]}`
+ - map: wraps first row of data frame into a map, like `{"my_out_name": {"agg_col": "value"}}`
+ + Record output. Currently handled only by HDFS sink.
+ * type: "record"
+ * name: File name within sink output folder to dump files to.
+ + Data source cache update for streaming jobs.
+ * type: "dsc-update"
+ * name: Data source name to update cache.
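The four "flatten" modes listed in the hunk above can be illustrated outside of Griffin. The following is a minimal Python sketch, not Griffin's implementation: `flatten_metric` is a hypothetical helper, a data frame result is modelled as a plain list of dicts, and the handling of an empty result is an assumption that the documentation does not cover.

```python
def flatten_metric(name, rows, flatten="default"):
    """Shape a list of row dicts according to the documented "flatten" modes.

    name    -- the "name" field of the metric output (hypothetical parameter)
    rows    -- stand-in for a data frame result: a list of {column: value} dicts
    flatten -- one of "default", "entries", "array", "map"
    """
    if flatten == "default":
        # Documented rule: "array" for multiple records, otherwise "entries".
        flatten = "array" if len(rows) > 1 else "entries"

    # Assumption: an empty result yields an empty row; the docs do not say.
    first_row = dict(rows[0]) if rows else {}

    if flatten == "entries":
        # First row becomes the metric result itself.
        return first_row
    if flatten == "array":
        # All rows are wrapped under the output name.
        return {name: [dict(row) for row in rows]}
    if flatten == "map":
        # Only the first row is wrapped under the output name.
        return {name: first_row}
    raise ValueError("unknown flatten mode: " + repr(flatten))


# Single-row profiling result, as in the "prof" rule of the samples.
prof_rows = [{"max_age": 50, "min_age": 10}]
# Multi-row grouped result, as in the "name_grp" rule of the samples.
group_rows = [{"name": "a", "cnt": 2}, {"name": "b", "cnt": 1}]

print(flatten_metric("prof", prof_rows))
print(flatten_metric("name_grp", group_rows, "array"))
```

Under this reading, a grouped rule such as `name_grp` needs "array" (or "default") to keep every group, because "entries" and "map" retain only the first row.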
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/b5360ec7/griffin-doc/measure/measure-streaming-sample.md
----------------------------------------------------------------------
diff --git a/griffin-doc/measure/measure-streaming-sample.md b/griffin-doc/measure/measure-streaming-sample.md
index 5c80576..49093bf 100644
--- a/griffin-doc/measure/measure-streaming-sample.md
+++ b/griffin-doc/measure/measure-streaming-sample.md
@@ -128,13 +128,16 @@ Measures consists of batch measure and streaming measure. This document is for t
"total": "total_count",
"matched": "matched_count"
},
- "metric": {
- "name": "accu"
- },
- "record": {
- "name": "missRecords",
- "data.source.cache": "source"
- }
+ "out": [
+ {
+ "type": "metric",
+ "name": "accu"
+ },
+ {
+ "type": "record",
+ "name": "missRecords"
+ }
+ ]
}
]
}
@@ -228,19 +231,25 @@ The miss records of source will be persisted as record.
"dq.type": "profiling",
"name": "prof",
"rule": "select count(name) as `cnt`, max(age) as `max`, min(age) as `min` from source",
- "metric": {
- "name": "prof"
- }
+ "out": [
+ {
+ "type": "metric",
+ "name": "prof"
+ }
+ ]
},
{
"dsl.type": "griffin-dsl",
"dq.type": "profiling",
"name": "grp",
"rule": "select name, count(*) as `cnt` from source group by name",
- "metric": {
- "name": "name_group",
- "collect.type": "array"
- }
+ "out": [
+ {
+ "type": "metric",
+ "name": "name_group",
+ "flatten": "array"
+ }
+ ]
}
]
}
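
The same rewrite is applied to every rule in the three files: the legacy "metric" and "record" objects become entries of an "out" list, and "collect.type" is renamed to "flatten". For configurations outside this repository, that mapping could be sketched in Python as below. `migrate_rule` is a hypothetical helper and not part of Griffin. It mirrors the streaming sample in dropping "data.source.cache"; whether such a setting should instead become a "dsc-update" output is not shown in this commit, so the sketch does not attempt it.

```python
import copy
import json


def migrate_rule(rule):
    """Return a copy of a rule dict with legacy "metric"/"record" keys
    rewritten as entries of an "out" list, following this commit's samples."""
    new_rule = copy.deepcopy(rule)
    out = list(new_rule.get("out", []))

    metric = new_rule.pop("metric", None)
    if metric is not None:
        entry = {"type": "metric"}
        if "name" in metric:
            entry["name"] = metric["name"]
        if "collect.type" in metric:
            # "collect.type" is renamed to "flatten"; the values are unchanged.
            entry["flatten"] = metric["collect.type"]
        out.append(entry)

    record = new_rule.pop("record", None)
    if record is not None:
        entry = {"type": "record"}
        if "name" in record:
            entry["name"] = record["name"]
        # "data.source.cache" and "origin.DF" are dropped, as in the
        # streaming sample of this commit.
        out.append(entry)

    if out:
        new_rule["out"] = out
    return new_rule


legacy_rule = {
    "dsl.type": "griffin-dsl",
    "dq.type": "profiling",
    "name": "name_grp",
    "rule": "select name, count(*) as cnt from source group by name",
    "metric": {"name": "name_grp", "collect.type": "array"},
}

print(json.dumps(migrate_rule(legacy_rule), indent=2))
```

The helper returns a new dict and leaves its input untouched, so it can be run over a parsed configuration file and the result compared with the original before anything is written back.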