Posted to commits@griffin.apache.org by gu...@apache.org on 2018/04/08 04:52:28 UTC
incubator-griffin git commit: [GRIFFIN-138] update readme.md,
highlighted docker guide
Repository: incubator-griffin
Updated Branches:
refs/heads/master 95e45dca4 -> 4e0f25d2c
[GRIFFIN-138] update readme.md, highlighted docker guide
update readme.md, describe docker guide, debug guide and deploy guide in order for specific users
Author: Lionel Liu <bh...@163.com>
Closes #248 from bhlx3lyx7/tmst.
Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin/commit/4e0f25d2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin/tree/4e0f25d2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin/diff/4e0f25d2
Branch: refs/heads/master
Commit: 4e0f25d2c9fd64c56a128e3ddde7c5c7addd916c
Parents: 95e45dc
Author: Lionel Liu <bh...@163.com>
Authored: Sun Apr 8 12:52:21 2018 +0800
Committer: Lionel Liu <bh...@163.com>
Committed: Sun Apr 8 12:52:21 2018 +0800
----------------------------------------------------------------------
README.md | 174 ++++----------------------------
griffin-doc/deploy/deploy-guide.md | 160 +++++++++++++++++++++++++++++
2 files changed, 179 insertions(+), 155 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/4e0f25d2/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 5bc0e1c..37987d0 100644
--- a/README.md
+++ b/README.md
@@ -27,176 +27,40 @@ Apache Griffin is a model driven data quality solution for modern data systems.
## Getting Started
+### First Try of Griffin
-You can try Griffin in docker following the [docker guide](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md).
-
-To run Griffin at local, you can follow instructions below.
-
-### Prerequisites
-You need to install following items
-- jdk (1.8 or later versions).
-- mysql.
-- Postgresql.
-- npm (version 6.0.0+).
-- [Hadoop](http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz) (2.6.0 or later), you can get some help [here](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html).
-- [Spark](http://spark.apache.org/downloads.html) (version 1.6.x, griffin does not support 2.0.x at current), if you want to install Pseudo Distributed/Single Node Cluster, you can get some help [here](http://why-not-learn-something.blogspot.com/2015/06/spark-installation-pseudo.html).
-- [Hive](http://apache.claz.org/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz) (version 1.2.1 or later), you can get some help [here](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive).
- You need to make sure that your spark cluster could access your HiveContext.
-- [Livy](http://archive.cloudera.com/beta/livy/livy-server-0.3.0.zip), you can get some help [here](http://livy.io/quickstart.html).
- Griffin need to schedule spark jobs by server, we use livy to submit our jobs.
- For some issues of Livy for HiveContext, we need to download 3 files, and put them into HDFS.
- ```
- datanucleus-api-jdo-3.2.6.jar
- datanucleus-core-3.2.10.jar
- datanucleus-rdbms-3.2.9.jar
- ```
-- ElasticSearch.
- ElasticSearch works as a metrics collector, Griffin produces metrics to it, and our default UI get metrics from it, you can use your own way as well.
-
-### Configuration
-
-Create database 'quartz' in mysql
-```
-mysql -u username -e "create database quartz" -p
-```
-Init quartz tables in mysql by service/src/main/resources/Init_quartz.sql
-```
-mysql -u username -p quartz < service/src/main/resources/Init_quartz.sql
-```
-
-
-You should also modify some configurations of Griffin for your environment.
-
-- <b>service/src/main/resources/application.properties</b>
-
- ```
- # jpa
- spring.datasource.url = jdbc:postgresql://<your IP>:5432/quartz?autoReconnect=true&useSSL=false
- spring.datasource.username = <user name>
- spring.datasource.password = <password>
- spring.jpa.generate-ddl=true
- spring.datasource.driverClassName = org.postgresql.Driver
- spring.jpa.show-sql = true
-
- # hive metastore
- hive.metastore.uris = thrift://<your IP>:9083
- hive.metastore.dbname = <hive database name> # default is "default"
-
- # external properties directory location, ignore it if not required
- external.config.location =
-
- # login strategy, default is "default"
- login.strategy = <default or ldap>
-
- # ldap properties, ignore them if ldap is not enabled
- ldap.url = ldap://hostname:port
- ldap.email = @example.com
- ldap.searchBase = DC=org,DC=example
- ldap.searchPattern = (sAMAccountName={0})
-
- # hdfs, ignore it if you do not need predicate job
- fs.defaultFS = hdfs://<hdfs-default-name>
-
- # elasticsearch
- elasticsearch.host = <your IP>
- elasticsearch.port = <your elasticsearch rest port>
- # authentication properties, uncomment if basic authentication is enabled
- # elasticsearch.user = user
- # elasticsearch.password = password
- ```
-
-- <b>measure/src/main/resources/env.json</b>
- ```
- "persist": [
- ...
- {
- "type": "http",
- "config": {
- "method": "post",
- "api": "http://<your ES IP>:<ES rest port>/griffin/accuracy"
- }
- }
- ]
- ```
- Put the modified env.json file into HDFS.
-
-- <b>service/src/main/resources/sparkJob.properties</b>
- ```
- sparkJob.file = hdfs://<griffin measure path>/griffin-measure.jar
- sparkJob.args_1 = hdfs://<griffin env path>/env.json
-
- sparkJob.jars = hdfs://<datanucleus path>/spark-avro_2.11-2.0.1.jar\
- hdfs://<datanucleus path>/datanucleus-api-jdo-3.2.6.jar\
- hdfs://<datanucleus path>/datanucleus-core-3.2.10.jar\
- hdfs://<datanucleus path>/datanucleus-rdbms-3.2.9.jar
-
- spark.yarn.dist.files = hdfs:///<spark conf path>/hive-site.xml
-
- livy.uri = http://<your IP>:8998/batches
- spark.uri = http://<your IP>:8088
- ```
- - \<griffin measure path> is the location you should put the jar file of measure module.
- - \<griffin env path> is the location you should put the env.json file.
- - \<datanucleus path> is the location you should put the 3 jar files of livy, and the spark avro jar file if you need.
- - \<spark conf path> is the location of spark conf directory.
-
-### Build and Run
-
-Build the whole project and deploy. (NPM should be installed)
-
- ```
- mvn clean install
- ```
-
-Put jar file of measure module into \<griffin measure path> in HDFS
-
-```
-cp measure/target/measure-<version>-incubating-SNAPSHOT.jar measure/target/griffin-measure.jar
-hdfs dfs -put measure/target/griffin-measure.jar <griffin measure path>/
- ```
-
-After all environment services startup, we can start our server.
-
- ```
- java -jar service/target/service.jar
- ```
-
-After a few seconds, we can visit our default UI of Griffin (by default the port of spring boot is 8080).
-
- ```
- http://<your IP>:8080
- ```
-
-You can use UI following the steps [here](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/ui/user-guide.md).
-
-**Note**: The front-end UI is still under development, you can only access some basic features currently.
-
-
-### Build and Debug
+You can try Griffin in docker following the [docker guide](griffin-doc/docker/griffin-docker-guide.md).
+
+### Environment for Dev
If you want to develop Griffin, please follow [this document](griffin-doc/dev/dev-env-build.md), to skip complex environment building work.
+### Deployment at Local
-## Community
+If you want to deploy Griffin in your local environment, please follow [this document](griffin-doc/deploy/deploy-guide.md).
-You can contact us via email: <a href="mailto:dev@griffin.incubator.apache.org">dev@griffin.incubator.apache.org</a>
+## Community
-You can also subscribe this mail by sending a email to [here](mailto:dev-subscribe@griffin.incubator.apache.org).
+You can visit the [Griffin home page](http://griffin.apache.org).
-You can access our issues jira page [here](https://issues.apache.org/jira/browse/GRIFFIN)
+You can contact us via email:
+- dev-list: <a href="mailto:dev@griffin.incubator.apache.org">dev@griffin.incubator.apache.org</a>
+- user-list: <a href="mailto:user@griffin.incubator.apache.org">user@griffin.incubator.apache.org</a>
+You can also subscribe to these mailing lists by sending an email to [subscribe dev-list](mailto:dev-subscribe@griffin.incubator.apache.org) or [subscribe user-list](mailto:user-subscribe@griffin.incubator.apache.org).
+You can access our issues on the [JIRA page](https://issues.apache.org/jira/browse/GRIFFIN).
## Contributing
-See [Contributing Guide](./CONTRIBUTING.md) for details on how to contribute code, documentation, etc.
+See [How to Contribute](http://griffin.apache.org/2017/03/04/community) for details on how to contribute code, documentation, etc.
## References
- [Home Page](http://griffin.incubator.apache.org/)
- [Wiki](https://cwiki.apache.org/confluence/display/GRIFFIN/Apache+Griffin)
- Documents:
- - [Measure](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/measure)
- - [Service](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service)
- - [UI](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/ui)
- - [Docker usage](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/docker)
- - [Postman API](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service/postman)
\ No newline at end of file
+ - [Measure](griffin-doc/measure)
+ - [Service](griffin-doc/service)
+ - [UI](griffin-doc/ui)
+ - [Docker usage](griffin-doc/docker)
+ - [Postman API](griffin-doc/service/postman)
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/4e0f25d2/griffin-doc/deploy/deploy-guide.md
----------------------------------------------------------------------
diff --git a/griffin-doc/deploy/deploy-guide.md b/griffin-doc/deploy/deploy-guide.md
new file mode 100644
index 0000000..0693c25
--- /dev/null
+++ b/griffin-doc/deploy/deploy-guide.md
@@ -0,0 +1,160 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Apache Griffin Deployment Guide
+As a Griffin user, you can deploy it with its dependencies in your own environment by following the instructions below.
+
+### Prerequisites
+You need to install the following items:
+- JDK (1.8 or later).
+- MySQL or PostgreSQL.
+- npm (version 6.0.0+).
+- [Hadoop](http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz) (2.6.0 or later), you can get some help [here](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html).
+- [Spark](http://spark.apache.org/downloads.html) (version 1.6.x; Griffin does not support 2.0.x at the moment). If you want to install a Pseudo Distributed/Single Node Cluster, you can get some help [here](http://why-not-learn-something.blogspot.com/2015/06/spark-installation-pseudo.html).
+- [Hive](http://apache.claz.org/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz) (version 1.2.1 or later), you can get some help [here](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive).
+ You need to make sure that your Spark cluster can access your HiveContext.
+- [Livy](http://archive.cloudera.com/beta/livy/livy-server-0.3.0.zip), you can get some help [here](http://livy.io/quickstart.html).
+ Griffin needs to schedule Spark jobs from the server; we use Livy to submit our jobs.
+ Due to some issues of Livy with HiveContext, you need to download 3 files, or get them from the Spark lib directory `$SPARK_HOME/lib/`, and put them into HDFS.
+ ```
+ datanucleus-api-jdo-3.2.6.jar
+ datanucleus-core-3.2.10.jar
+ datanucleus-rdbms-3.2.9.jar
+ ```
+- ElasticSearch.
+ ElasticSearch works as a metrics collector: Griffin produces metrics to it, and our default UI gets metrics from it; you can use your own way as well.
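+
+ As a minimal sketch of the Livy jar step above (the source directory `$SPARK_HOME/lib` and the HDFS target directory `/griffin/jars` are assumptions for illustration; adjust them to your environment):
+ ```
+ # create the target directory in HDFS and upload the 3 datanucleus jars
+ hdfs dfs -mkdir -p /griffin/jars
+ hdfs dfs -put $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar /griffin/jars/
+ hdfs dfs -put $SPARK_HOME/lib/datanucleus-core-3.2.10.jar /griffin/jars/
+ hdfs dfs -put $SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar /griffin/jars/
+ ```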
+
+### Configuration
+
+Create the database 'quartz' in MySQL:
+```
+mysql -u username -e "create database quartz" -p
+```
+Initialize the quartz tables in MySQL using service/src/main/resources/Init_quartz.sql:
+```
+mysql -u username -p quartz < service/src/main/resources/Init_quartz.sql
+```
+
+
+You should also modify some configurations of Griffin for your environment.
+
+- <b>service/src/main/resources/application.properties</b>
+
+ ```
+ # jpa
+ spring.datasource.url = jdbc:postgresql://<your IP>:5432/quartz?autoReconnect=true&useSSL=false
+ spring.datasource.username = <user name>
+ spring.datasource.password = <password>
+ spring.jpa.generate-ddl=true
+ spring.datasource.driverClassName = org.postgresql.Driver
+ spring.jpa.show-sql = true
+
+ # hive metastore
+ hive.metastore.uris = thrift://<your IP>:9083
+ hive.metastore.dbname = <hive database name> # default is "default"
+
+ # external properties directory location, ignore it if not required
+ external.config.location =
+
+ # login strategy, default is "default"
+ login.strategy = <default or ldap>
+
+ # ldap properties, ignore them if ldap is not enabled
+ ldap.url = ldap://hostname:port
+ ldap.email = @example.com
+ ldap.searchBase = DC=org,DC=example
+ ldap.searchPattern = (sAMAccountName={0})
+
+ # hdfs, ignore it if you do not need predicate job
+ fs.defaultFS = hdfs://<hdfs-default-name>
+
+ # elasticsearch
+ elasticsearch.host = <your IP>
+ elasticsearch.port = <your elasticsearch rest port>
+ # authentication properties, uncomment if basic authentication is enabled
+ # elasticsearch.user = user
+ # elasticsearch.password = password
+ ```
+
+- <b>measure/src/main/resources/env.json</b>
+ ```
+ "persist": [
+ ...
+ {
+ "type": "http",
+ "config": {
+ "method": "post",
+ "api": "http://<your ES IP>:<ES rest port>/griffin/accuracy"
+ }
+ }
+ ]
+ ```
+ Put the modified env.json file into HDFS.
+
+- <b>service/src/main/resources/sparkJob.properties</b>
+ ```
+ sparkJob.file = hdfs://<griffin measure path>/griffin-measure.jar
+ sparkJob.args_1 = hdfs://<griffin env path>/env.json
+
+ sparkJob.jars = hdfs://<datanucleus path>/spark-avro_2.11-2.0.1.jar\
+ hdfs://<datanucleus path>/datanucleus-api-jdo-3.2.6.jar\
+ hdfs://<datanucleus path>/datanucleus-core-3.2.10.jar\
+ hdfs://<datanucleus path>/datanucleus-rdbms-3.2.9.jar
+
+ spark.yarn.dist.files = hdfs:///<spark conf path>/hive-site.xml
+
+ livy.uri = http://<your IP>:8998/batches
+ spark.uri = http://<your IP>:8088
+ ```
+ - \<griffin measure path> is the HDFS location where you should put the jar file of the measure module.
+ - \<griffin env path> is the HDFS location where you should put the env.json file.
+ - \<datanucleus path> is the HDFS location where you should put the 3 jar files needed by Livy, and the Spark Avro jar file if you need to support Avro data.
+ - \<spark conf path> is the location of the Spark conf directory.
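+
+ For example, with the placeholders filled in (the HDFS paths and host below are assumptions for illustration, not defaults):
+ ```
+ sparkJob.file = hdfs:///griffin/measure/griffin-measure.jar
+ sparkJob.args_1 = hdfs:///griffin/env/env.json
+ livy.uri = http://10.9.8.7:8998/batches
+ ```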
+
+### Build and Run
+
+Build the whole project and deploy (NPM must be installed first):
+
+ ```
+ mvn clean install
+ ```
+
+Put the jar file of the measure module into \<griffin measure path> in HDFS:
+
+```
+cp measure/target/measure-<version>-incubating-SNAPSHOT.jar measure/target/griffin-measure.jar
+hdfs dfs -put measure/target/griffin-measure.jar <griffin measure path>/
+```
+
+After all environment services have started up, we can start our server.
+
+ ```
+ java -jar service/target/service.jar
+ ```
+
+After a few seconds, we can visit the default UI of Griffin (by default, the Spring Boot port is 8080).
+
+ ```
+ http://<your IP>:8080
+ ```
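+
+ A quick way to check that the service is up, assuming `curl` is available (replace the placeholder host with your own):
+ ```
+ # prints the HTTP status code; 200 means the service answered
+ curl -s -o /dev/null -w "%{http_code}\n" http://<your IP>:8080
+ ```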
+
+You can use the UI by following the steps [here](../ui/user-guide.md).
+
+**Note**: The front-end UI is still under development; you can only access some basic features currently.
+