Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2020/11/18 05:11:12 UTC

[GitHub] [incubator-gobblin] hanghangliu opened a new pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

hanghangliu opened a new pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154


   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
       - https://issues.apache.org/jira/browse/GOBBLIN-1317
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if applicable):
   Add documentation for running Gobblin on Docker end to end with the latest version.
   
   Add Docker recipes, including the example Wikipedia job and ingestion from/to Kafka and HDFS, along with guidance for using these recipes.
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] codecov-io edited a comment on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-729444051


   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=h1) Report
   > Merging [#3154](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=desc) (62b8c54) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/0187e03a6fb213edf7c941e4a43bfed62534559c?el=desc) (0187e03) will **decrease** coverage by `36.85%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3154       +/-   ##
   ============================================
   - Coverage     45.99%   9.13%   -36.86%     
   + Complexity     9600    1723     -7877     
   ============================================
     Files          1993    2006       +13     
     Lines         76013   76854      +841     
     Branches       8464    8547       +83     
   ============================================
   - Hits          34959    7020    -27939     
   - Misses        37786   69149    +31363     
   + Partials       3268     685     -2583     
   ```
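The percentages in the coverage diff above can be reproduced from the raw counts. The sketch below assumes coverage is computed as hits divided by the sum of hits, misses, and partials, which agrees with the Lines totals in this report; the function name `coverage_pct` is hypothetical and used only for illustration.

```shell
# Coverage as a percentage: hits / (hits + misses + partials).
# coverage_pct is an illustrative helper, not part of Codecov or Gobblin.
coverage_pct() {
  awk -v h="$1" -v m="$2" -v p="$3" \
    'BEGIN { printf "%.2f\n", 100 * h / (h + m + p) }'
}

coverage_pct 34959 37786 3268   # counts reported for master (0187e03)
coverage_pct 7020 69149 685     # counts reported for this pull request (62b8c54)
```

With the counts from the report, the two calls print 45.99 and 9.13, matching the Coverage row of the diff.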
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...java/org/apache/gobblin/stream/ControlMessage.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc3RyZWFtL0NvbnRyb2xNZXNzYWdlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/dataset/DatasetResolver.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YXNldC9EYXRhc2V0UmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...va/org/apache/gobblin/converter/EmptyIterable.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9FbXB0eUl0ZXJhYmxlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...org/apache/gobblin/ack/BasicAckableForTesting.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vYWNrL0Jhc2ljQWNrYWJsZUZvclRlc3RpbmcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...n/java/org/apache/gobblin/salesforce/SfConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2ZDb25maWcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/gobblin/yarn/HelixMessageSubTypes.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vSGVsaXhNZXNzYWdlU3ViVHlwZXMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/cluster/SingleHelixTask.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvU2luZ2xlSGVsaXhUYXNrLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | [.../apache/gobblin/records/ControlMessageHandler.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcmVjb3Jkcy9Db250cm9sTWVzc2FnZUhhbmRsZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | ... and [1071 more](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=footer). Last update [0187e03...62b8c54](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [incubator-gobblin] codecov-io edited a comment on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-729444051


   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=h1) Report
   > Merging [#3154](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=desc) (833f866) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/0187e03a6fb213edf7c941e4a43bfed62534559c?el=desc) (0187e03) will **decrease** coverage by `36.77%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3154       +/-   ##
   ============================================
   - Coverage     45.99%   9.21%   -36.78%     
   + Complexity     9600    1724     -7876     
   ============================================
     Files          1993    1998        +5     
     Lines         76013   76179      +166     
     Branches       8464    8478       +14     
   ============================================
   - Hits          34959    7017    -27942     
   - Misses        37786   68479    +30693     
   + Partials       3268     683     -2585     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...java/org/apache/gobblin/stream/ControlMessage.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc3RyZWFtL0NvbnRyb2xNZXNzYWdlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/dataset/DatasetResolver.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YXNldC9EYXRhc2V0UmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...va/org/apache/gobblin/converter/EmptyIterable.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9FbXB0eUl0ZXJhYmxlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...org/apache/gobblin/ack/BasicAckableForTesting.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vYWNrL0Jhc2ljQWNrYWJsZUZvclRlc3RpbmcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...n/java/org/apache/gobblin/salesforce/SfConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2ZDb25maWcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/gobblin/yarn/HelixMessageSubTypes.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vSGVsaXhNZXNzYWdlU3ViVHlwZXMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/cluster/SingleHelixTask.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvU2luZ2xlSGVsaXhUYXNrLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | [.../apache/gobblin/records/ControlMessageHandler.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcmVjb3Jkcy9Db250cm9sTWVzc2FnZUhhbmRsZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | ... and [1059 more](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=footer). Last update [0187e03...833f866](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [incubator-gobblin] hanghangliu commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
hanghangliu commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r528839636



##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:alpine-gaas-latest`
 
-* Preserving the output of a Docker container requires using a [data volume](https://docs.docker.com/engine/tutorials/dockervolumes/). To do this, run the below command:
+`docker run gobblin/gobblin-standalone:alpine-gaas-latest`

Review comment:
       The wiki pull sample job works fine.







[GitHub] [incubator-gobblin] hanghangliu commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
hanghangliu commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r540757512



##########
File path: gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml
##########
@@ -0,0 +1,127 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+version: '3'
+services:
+  gobblin-standalone:
+    image: apache/gobblin:latest
+    volumes:
+      - "${LOCAL_JOB_DIR}:/tmp/gobblin-standalone/jobs"
+  zookeeper:
+    image: wurstmeister/zookeeper
+    ports:
+      - "2181:2181"
+  kafka:
+    image: wurstmeister/kafka
+    ports:
+      - "9092:9092"
+    environment:
+      KAFKA_ADVERTISED_HOST_NAME: "kafka"
+      KAFKA_ADVERTISED_PORT: "9092"
+      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
+      KAFKA_CREATE_TOPICS: "test:1:1"
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+
+  namenode:
+    image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
+    container_name: namenode
+    volumes:
+      - hadoop_namenode:/hadoop/dfs/name
+    environment:
+      - CLUSTER_NAME=test_cluster
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "9870:9870"
+
+  resourcemanager:
+    image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
+    container_name: resourcemanager
+    restart: on-failure
+    depends_on:

Review comment:
       Resolve it as we discussed







[GitHub] [incubator-gobblin] codecov-io edited a comment on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-729444051


   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=h1) Report
   > Merging [#3154](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=desc) (55c7361) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/0187e03a6fb213edf7c941e4a43bfed62534559c?el=desc) (0187e03) will **decrease** coverage by `36.88%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3154       +/-   ##
   ============================================
   - Coverage     45.99%   9.10%   -36.89%     
   + Complexity     9600    1722     -7878     
   ============================================
     Files          1993    2014       +21     
     Lines         76013   77095     +1082     
     Branches       8464    8559       +95     
   ============================================
   - Hits          34959    7018    -27941     
   - Misses        37786   69392    +31606     
   + Partials       3268     685     -2583     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...java/org/apache/gobblin/stream/ControlMessage.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc3RyZWFtL0NvbnRyb2xNZXNzYWdlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/dataset/DatasetResolver.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YXNldC9EYXRhc2V0UmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...va/org/apache/gobblin/converter/EmptyIterable.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9FbXB0eUl0ZXJhYmxlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...org/apache/gobblin/ack/BasicAckableForTesting.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vYWNrL0Jhc2ljQWNrYWJsZUZvclRlc3RpbmcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...n/java/org/apache/gobblin/salesforce/SfConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2ZDb25maWcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/gobblin/yarn/HelixMessageSubTypes.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vSGVsaXhNZXNzYWdlU3ViVHlwZXMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/cluster/SingleHelixTask.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvU2luZ2xlSGVsaXhUYXNrLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | [.../apache/gobblin/records/ControlMessageHandler.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcmVjb3Jkcy9Db250cm9sTWVzc2FnZUhhbmRsZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | ... and [1079 more](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=footer). Last update [0187e03...55c7361](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [incubator-gobblin] Will-Lo commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
Will-Lo commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r533848100



##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:latest`

Review comment:
       We have started using Apache's repository for our images. Could we change this line to pull from https://hub.docker.com/r/apache/gobblin/tags?page=1&ordering=last_updated ?
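A minimal sketch of what the requested change might look like on the command line. The image name `apache/gobblin:latest` and the mount path `/tmp/gobblin-standalone/jobs` are taken from the docker-compose.yml in this pull request. The helper name `gobblin_commands` and the fallback directory `/tmp/gobblin-jobs` are hypothetical, and whether the image needs extra arguments to start in standalone mode is not confirmed in this thread. The script only prints the commands so they can be checked before anything is run.

```shell
# Image and mount path as they appear in the PR's docker-compose.yml.
GOBBLIN_IMAGE="apache/gobblin:latest"
JOBS_MOUNT="/tmp/gobblin-standalone/jobs"

# Print (do not execute) the pull and run commands for a host job directory.
# gobblin_commands is an illustrative helper, not part of Gobblin.
gobblin_commands() {
  job_dir="$1"
  echo "docker pull ${GOBBLIN_IMAGE}"
  echo "docker run -v ${job_dir}:${JOBS_MOUNT} ${GOBBLIN_IMAGE}"
}

# LOCAL_JOB_DIR is the working directory described in the guide;
# the fallback path here is only a placeholder.
gobblin_commands "${LOCAL_JOB_DIR:-/tmp/gobblin-jobs}"
```

Printing first keeps the example free of side effects; once the output looks right, the two commands can be run by hand.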







[GitHub] [incubator-gobblin] hanghangliu closed pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
hanghangliu closed pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154


   





[GitHub] [incubator-gobblin] hanghangliu commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
hanghangliu commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r535878064



##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.

Review comment:
       Added a GaaS section with instructions for starting the service. I will add a detailed end-to-end flow example for GaaS on Docker in the future.







[GitHub] [incubator-gobblin] codecov-io edited a comment on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-729444051


   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=h1) Report
   > Merging [#3154](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=desc) (ebde54f) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/0187e03a6fb213edf7c941e4a43bfed62534559c?el=desc) (0187e03) will **decrease** coverage by `4.26%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3154      +/-   ##
   ============================================
   - Coverage     45.99%   41.72%   -4.27%     
   + Complexity     9600     8753     -847     
   ============================================
     Files          1993     1993              
     Lines         76013    76013              
     Branches       8464     8464              
   ============================================
   - Hits          34959    31715    -3244     
   - Misses        37786    41294    +3508     
   + Partials       3268     3004     -264     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [.../org/apache/gobblin/util/filters/HiddenFilter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsdGVycy9IaWRkZW5GaWx0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | [...g/apache/gobblin/cluster/HelixMessageSubTypes.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhNZXNzYWdlU3ViVHlwZXMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...gobblin/runtime/mapreduce/GobblinOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0dvYmJsaW5PdXRwdXRGb3JtYXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [.../gobblin/compaction/suite/CompactionSuiteBase.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vc3VpdGUvQ29tcGFjdGlvblN1aXRlQmFzZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...obblin/compaction/source/CompactionFailedTask.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vc291cmNlL0NvbXBhY3Rpb25GYWlsZWRUYXNrLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...n/cluster/event/ClusterManagerShutdownRequest.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvZXZlbnQvQ2x1c3Rlck1hbmFnZXJTaHV0ZG93blJlcXVlc3QuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...blin/compaction/mapreduce/RecordKeyMapperBase.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL1JlY29yZEtleU1hcHBlckJhc2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...in/compaction/action/CompactionCompleteAction.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vYWN0aW9uL0NvbXBhY3Rpb25Db21wbGV0ZUFjdGlvbi5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...n/compaction/mapreduce/orc/OrcKeyDedupReducer.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNLZXlEZWR1cFJlZHVjZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...n/compaction/suite/CompactionSuiteBaseFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vc3VpdGUvQ29tcGFjdGlvblN1aXRlQmFzZUZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | ... and [149 more](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree-more) | |
   
   








[GitHub] [incubator-gobblin] Will-Lo commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
Will-Lo commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r536364461



##########
File path: gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml
##########
@@ -0,0 +1,127 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+version: '3'
+services:
+  gobblin-standalone:
+    image: apache/gobblin:latest
+    volumes:
+      - "${LOCAL_JOB_DIR}:/tmp/gobblin-standalone/jobs"
+  zookeeper:
+    image: wurstmeister/zookeeper
+    ports:
+      - "2181:2181"
+  kafka:
+    image: wurstmeister/kafka
+    ports:
+      - "9092:9092"
+    environment:
+      KAFKA_ADVERTISED_HOST_NAME: "kafka"
+      KAFKA_ADVERTISED_PORT: "9092"
+      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
+      KAFKA_CREATE_TOPICS: "test:1:1"
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+
+  namenode:
+    image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
+    container_name: namenode
+    volumes:
+      - hadoop_namenode:/hadoop/dfs/name
+    environment:
+      - CLUSTER_NAME=test_cluster
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "9870:9870"
+
+  resourcemanager:
+    image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
+    container_name: resourcemanager
+    restart: on-failure
+    depends_on:

Review comment:
       Is `depends_on` needed here? You already have `restart: on-failure`, which should be sufficient. `depends_on` is deprecated from version 3 onwards, so we shouldn't introduce it: https://docs.docker.com/compose/compose-file/#depends_on
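
       A minimal sketch of what this suggestion could look like for the `resourcemanager` service. It assumes the container exits with a non-zero status while HDFS is not yet reachable (not verified in this thread), so that `restart: on-failure` keeps retrying until the namenode and datanodes are up.

       ```yaml
       # Sketch only: the resourcemanager entry from this diff with the
       # depends_on list dropped; startup ordering then relies on restarts.
       version: '3'
       services:
         resourcemanager:
           image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
           container_name: resourcemanager
           restart: on-failure
           env_file:
             - ./hadoop.env
           ports:
             - "8089:8088"
       ```

       Worth noting when weighing this: per the version 3 Compose file reference, it is the `condition` form of `depends_on` that was removed, and the option is ignored by `docker stack deploy`. The plain list form is still accepted by `docker-compose up`, where it only controls start order and does not wait for a service to be ready.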

##########
File path: gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml
##########
@@ -0,0 +1,127 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+version: '3'
+services:
+  gobblin-standalone:
+    image: apache/gobblin:latest
+    volumes:
+      - "${LOCAL_JOB_DIR}:/tmp/gobblin-standalone/jobs"
+  zookeeper:
+    image: wurstmeister/zookeeper
+    ports:
+      - "2181:2181"
+  kafka:

Review comment:
       We should add `restart: on-failure` here, since Kafka depends on ZooKeeper and might terminate if ZooKeeper takes a while to start up.
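
       A minimal sketch of the suggested change, using the `kafka` entry from this diff with only the `restart` key added. It assumes the broker process exits with a failure status when it cannot reach ZooKeeper, which is what makes the restart policy take effect.

       ```yaml
       # Sketch only: kafka service from this recipe plus a restart policy.
       version: '3'
       services:
         kafka:
           image: wurstmeister/kafka
           # Retry the broker if it exits before ZooKeeper is ready.
           restart: on-failure
           ports:
             - "9092:9092"
           environment:
             KAFKA_ADVERTISED_HOST_NAME: "kafka"
             KAFKA_ADVERTISED_PORT: "9092"
             KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
             KAFKA_CREATE_TOPICS: "test:1:1"
           volumes:
             - /var/run/docker.sock:/var/run/docker.sock
       ```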

##########
File path: gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml
##########
@@ -0,0 +1,127 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+version: '3'
+services:
+  gobblin-standalone:
+    image: apache/gobblin:latest
+    volumes:
+      - "${LOCAL_JOB_DIR}:/tmp/gobblin-standalone/jobs"
+  zookeeper:
+    image: wurstmeister/zookeeper
+    ports:
+      - "2181:2181"
+  kafka:
+    image: wurstmeister/kafka
+    ports:
+      - "9092:9092"
+    environment:
+      KAFKA_ADVERTISED_HOST_NAME: "kafka"
+      KAFKA_ADVERTISED_PORT: "9092"
+      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
+      KAFKA_CREATE_TOPICS: "test:1:1"
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+
+  namenode:
+    image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
+    container_name: namenode
+    volumes:
+      - hadoop_namenode:/hadoop/dfs/name
+    environment:
+      - CLUSTER_NAME=test_cluster
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "9870:9870"
+
+  resourcemanager:
+    image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
+    container_name: resourcemanager
+    restart: on-failure
+    depends_on:
+      - namenode
+      - datanode1
+      - datanode2
+      - datanode3
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8089:8088"
+
+  historyserver:
+    image: bde2020/hadoop-historyserver:1.1.0-hadoop2.7.1-java8
+    container_name: historyserver
+    depends_on:
+      - namenode
+      - datanode1
+      - datanode2
+    volumes:
+      - hadoop_historyserver:/hadoop/yarn/timeline
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8188:8188"
+
+  nodemanager1:
+    image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.7.1-java8
+    container_name: nodemanager1
+    depends_on:
+      - namenode
+      - datanode1
+      - datanode2
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8042:8042"
+
+  datanode1:
+    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
+    container_name: datanode1
+    depends_on:

Review comment:
       Same comment as above: `depends_on` is deprecated in version 3, which this file uses.
   
   

##########
File path: gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml
##########
@@ -0,0 +1,127 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+version: '3'
+services:
+  gobblin-standalone:
+    image: apache/gobblin:latest
+    volumes:
+      - "${LOCAL_JOB_DIR}:/tmp/gobblin-standalone/jobs"
+  zookeeper:
+    image: wurstmeister/zookeeper
+    ports:
+      - "2181:2181"
+  kafka:
+    image: wurstmeister/kafka
+    ports:
+      - "9092:9092"
+    environment:
+      KAFKA_ADVERTISED_HOST_NAME: "kafka"
+      KAFKA_ADVERTISED_PORT: "9092"
+      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
+      KAFKA_CREATE_TOPICS: "test:1:1"
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+
+  namenode:
+    image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
+    container_name: namenode
+    volumes:
+      - hadoop_namenode:/hadoop/dfs/name
+    environment:
+      - CLUSTER_NAME=test_cluster
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "9870:9870"
+
+  resourcemanager:
+    image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
+    container_name: resourcemanager
+    restart: on-failure
+    depends_on:
+      - namenode
+      - datanode1
+      - datanode2
+      - datanode3
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8089:8088"
+
+  historyserver:
+    image: bde2020/hadoop-historyserver:1.1.0-hadoop2.7.1-java8
+    container_name: historyserver
+    depends_on:
+      - namenode
+      - datanode1
+      - datanode2
+    volumes:
+      - hadoop_historyserver:/hadoop/yarn/timeline
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8188:8188"
+
+  nodemanager1:
+    image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.7.1-java8
+    container_name: nodemanager1
+    depends_on:
+      - namenode
+      - datanode1
+      - datanode2
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8042:8042"
+
+  datanode1:
+    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
+    container_name: datanode1
+    depends_on:
+      - namenode
+    volumes:
+      - hadoop_datanode1:/hadoop/dfs/data
+    env_file:
+      - ./hadoop.env
+
+  datanode2:
+    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
+    container_name: datanode2
+    depends_on:
+      - namenode
+    volumes:
+      - hadoop_datanode2:/hadoop/dfs/data
+    env_file:
+      - ./hadoop.env
+
+  datanode3:
+    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
+    container_name: datanode3
+    depends_on:

Review comment:
       Same comment as above: `depends_on` is deprecated in version 3, which this file uses.

##########
File path: gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml
##########
@@ -0,0 +1,127 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+version: '3'
+services:
+  gobblin-standalone:
+    image: apache/gobblin:latest
+    volumes:
+      - "${LOCAL_JOB_DIR}:/tmp/gobblin-standalone/jobs"
+  zookeeper:
+    image: wurstmeister/zookeeper
+    ports:
+      - "2181:2181"
+  kafka:
+    image: wurstmeister/kafka
+    ports:
+      - "9092:9092"
+    environment:
+      KAFKA_ADVERTISED_HOST_NAME: "kafka"
+      KAFKA_ADVERTISED_PORT: "9092"
+      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
+      KAFKA_CREATE_TOPICS: "test:1:1"
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+
+  namenode:
+    image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
+    container_name: namenode
+    volumes:
+      - hadoop_namenode:/hadoop/dfs/name
+    environment:
+      - CLUSTER_NAME=test_cluster
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "9870:9870"
+
+  resourcemanager:
+    image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
+    container_name: resourcemanager
+    restart: on-failure
+    depends_on:
+      - namenode
+      - datanode1
+      - datanode2
+      - datanode3
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8089:8088"
+
+  historyserver:
+    image: bde2020/hadoop-historyserver:1.1.0-hadoop2.7.1-java8
+    container_name: historyserver
+    depends_on:
+      - namenode
+      - datanode1
+      - datanode2
+    volumes:
+      - hadoop_historyserver:/hadoop/yarn/timeline
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8188:8188"
+
+  nodemanager1:
+    image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.7.1-java8
+    container_name: nodemanager1
+    depends_on:
+      - namenode
+      - datanode1
+      - datanode2
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8042:8042"
+
+  datanode1:
+    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
+    container_name: datanode1
+    depends_on:
+      - namenode
+    volumes:
+      - hadoop_datanode1:/hadoop/dfs/data
+    env_file:
+      - ./hadoop.env
+
+  datanode2:
+    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
+    container_name: datanode2
+    depends_on:

Review comment:
       Same comment as above: `depends_on` is deprecated in version 3, which this file uses.
   
   







[GitHub] [incubator-gobblin] hanghangliu commented on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
hanghangliu commented on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-730977377


   > Can you rebase against the newest upstream so that Tamas's change is excluded from this PR?
   
   Done. Passed all checks.





[GitHub] [incubator-gobblin] Will-Lo commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
Will-Lo commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r528402591



##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:alpine-gaas-latest`

Review comment:
       We should use the default `latest` tag here, i.e. `docker pull gobblin/gobblin-standalone:latest`.
   We might also migrate to the Apache Docker Hub repository sometime soon, but we can edit this doc later to reflect that.

##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:alpine-gaas-latest`
 
-* Preserving the output of a Docker container requires using a [data volume](https://docs.docker.com/engine/tutorials/dockervolumes/). To do this, run the below command:
+`docker run gobblin/gobblin-standalone:alpine-gaas-latest`
 
-```
-docker run -v /home/gobblin/work-dir:/home/gobblin/work-dir gobblin-wikipedia
-```
+After the container spins up, put the [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull) in ${LOCAL_JOB_DIR}. You will see the Gobblin daemon pick up the job, and the result output is in ${LOCAL_JOB_DIR}/job-output/.
 
-The output of the Gobblin-Wikipedia job should now be written to `/home/gobblin/work-dir/job-output`. The `-v` command in Docker uses a feature of Docker called [data volumes](https://docs.docker.com/engine/tutorials/dockervolumes/). The `-v` option mounts a host directory into a container and is of the form `[host-directory]:[container-directory]`. Now any modifications to the host directory can be seen inside the container-directory, and any modifications to the container-directory can be seen inside the host-directory. This is a standard way to ensure data persists even after a Docker container finishes. It's important to note that the `[host-directory]` in the `-v` option can be changed to any directory (on OSX it must be under the `/Users/` directory), but the `[container-directory]` must remain `/home/gobblin/work-dir` (at least for now).
+This example job is correspondent to the [getting started guide](https://gobblin.readthedocs.io/en/latest/Getting-Started/). With the docker image, you can focus on the Gobblin functionalities, avoiding the hassle of building a distribution.
 
-## Gobblin-Standalone Repository
+### Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion 
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
+* To ingest from/to Kafka and HDFS by Gobblin, you need to start services for Zookeeper, Kafka and HDFS along with Gobblin. We use docker [compose](https://docs.docker.com/compose/) with images contributed to docker hub. Firstly, you need to create a [docker-compose.yml](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml) file(copy the content into your .yml file).

Review comment:
       I don't think we should ask users to copy the content into a local docker-compose file if they just want to run some examples or try it out. We can link directly to the docker-compose recipe here.

##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:alpine-gaas-latest`
 
-* Preserving the output of a Docker container requires using a [data volume](https://docs.docker.com/engine/tutorials/dockervolumes/). To do this, run the below command:
+`docker run gobblin/gobblin-standalone:alpine-gaas-latest`

Review comment:
       This wouldn't pick up any jobs unless we expose `LOCAL_JOB_DIR` as a docker volume mounted at `/tmp/gobblin-standalone/jobs`, where Gobblin standalone picks up its jobs. That's usually not a problem in docker-compose, where we can define volumes, but if we want to run the docker image directly it would look something like this:
   
   `docker run -v $LOCAL_JOB_DIR:/tmp/gobblin-standalone/jobs gobblin/gobblin-standalone:latest`
   
   Also, when I tried running the wikipedia example, it gave me this error:
   ```
   Thread Thread[JobScheduler-0,5,main] threw an uncaught exception: java.lang.VerifyError: Stack map does not match the one at exception handler 77
   ```
   
   Did we confirm that the example is still working?
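
   For reference, the volume mount described in this comment could be sketched as below. This is only an illustration: the container path `/tmp/gobblin-standalone/jobs` and the `gobblin/gobblin-standalone:latest` image come from the command above, while `build_run_cmd`, `JOBS_MOUNT` and `DEFAULT_IMAGE` are hypothetical names. The function prints the command instead of running it, so it can be inspected first.

```shell
#!/bin/sh
# Sketch: print (rather than run) the docker command suggested in this review.
# JOBS_MOUNT is the in-container job directory named by the reviewer;
# build_run_cmd and DEFAULT_IMAGE are hypothetical names used for illustration.
JOBS_MOUNT=/tmp/gobblin-standalone/jobs
DEFAULT_IMAGE=gobblin/gobblin-standalone:latest

build_run_cmd() {
  job_dir="$1"                    # host directory holding .job / .pull files
  image="${2:-$DEFAULT_IMAGE}"    # optional image override
  if [ -z "$job_dir" ]; then
    echo "usage: build_run_cmd <host_job_dir> [image]" >&2
    return 1
  fi
  # -v host:container mounts the host job directory into the container
  printf 'docker run -v %s:%s %s\n' "$job_dir" "$JOBS_MOUNT" "$image"
}
```

   Calling `build_run_cmd "$LOCAL_JOB_DIR"` prints the command to run. A host path containing spaces would need extra quoting, which this sketch does not handle.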

##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:alpine-gaas-latest`
 
-* Preserving the output of a Docker container requires using a [data volume](https://docs.docker.com/engine/tutorials/dockervolumes/). To do this, run the below command:
+`docker run gobblin/gobblin-standalone:alpine-gaas-latest`
 
-```
-docker run -v /home/gobblin/work-dir:/home/gobblin/work-dir gobblin-wikipedia
-```
+After the container spins up, put the [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull) in ${LOCAL_JOB_DIR}. You will see the Gobblin daemon pick up the job, and the result output is in ${LOCAL_JOB_DIR}/job-output/.
 
-The output of the Gobblin-Wikipedia job should now be written to `/home/gobblin/work-dir/job-output`. The `-v` command in Docker uses a feature of Docker called [data volumes](https://docs.docker.com/engine/tutorials/dockervolumes/). The `-v` option mounts a host directory into a container and is of the form `[host-directory]:[container-directory]`. Now any modifications to the host directory can be seen inside the container-directory, and any modifications to the container-directory can be seen inside the host-directory. This is a standard way to ensure data persists even after a Docker container finishes. It's important to note that the `[host-directory]` in the `-v` option can be changed to any directory (on OSX it must be under the `/Users/` directory), but the `[container-directory]` must remain `/home/gobblin/work-dir` (at least for now).
+This example job is correspondent to the [getting started guide](https://gobblin.readthedocs.io/en/latest/Getting-Started/). With the docker image, you can focus on the Gobblin functionalities, avoiding the hassle of building a distribution.
 
-## Gobblin-Standalone Repository
+### Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion 
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
+* To ingest from/to Kafka and HDFS by Gobblin, you need to start services for Zookeeper, Kafka and HDFS along with Gobblin. We use docker [compose](https://docs.docker.com/compose/) with images contributed to docker hub. Firstly, you need to create a [docker-compose.yml](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml) file(copy the content into your .yml file).

Review comment:
       The link `https://github.com/apache/incubator-gobblin/gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml` is broken. Did you mean `https://github.com/apache/incubator-gobblin/gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml`?

##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:alpine-gaas-latest`
 
-* Preserving the output of a Docker container requires using a [data volume](https://docs.docker.com/engine/tutorials/dockervolumes/). To do this, run the below command:
+`docker run gobblin/gobblin-standalone:alpine-gaas-latest`
 
-```
-docker run -v /home/gobblin/work-dir:/home/gobblin/work-dir gobblin-wikipedia
-```
+After the container spins up, put the [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull) in ${LOCAL_JOB_DIR}. You will see the Gobblin daemon pick up the job, and the result output is in ${LOCAL_JOB_DIR}/job-output/.
 
-The output of the Gobblin-Wikipedia job should now be written to `/home/gobblin/work-dir/job-output`. The `-v` command in Docker uses a feature of Docker called [data volumes](https://docs.docker.com/engine/tutorials/dockervolumes/). The `-v` option mounts a host directory into a container and is of the form `[host-directory]:[container-directory]`. Now any modifications to the host directory can be seen inside the container-directory, and any modifications to the container-directory can be seen inside the host-directory. This is a standard way to ensure data persists even after a Docker container finishes. It's important to note that the `[host-directory]` in the `-v` option can be changed to any directory (on OSX it must be under the `/Users/` directory), but the `[container-directory]` must remain `/home/gobblin/work-dir` (at least for now).
+This example job is correspondent to the [getting started guide](https://gobblin.readthedocs.io/en/latest/Getting-Started/). With the docker image, you can focus on the Gobblin functionalities, avoiding the hassle of building a distribution.
 
-## Gobblin-Standalone Repository
+### Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion 
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
+* To ingest from/to Kafka and HDFS by Gobblin, you need to start services for Zookeeper, Kafka and HDFS along with Gobblin. We use docker [compose](https://docs.docker.com/compose/) with images contributed to docker hub. Firstly, you need to create a [docker-compose.yml](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml) file(copy the content into your .yml file).
+
+* Second, in the same folder of the yml file, create a [hadoop.env](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/hadoop.env) file to specify all HDFS related config(copy the content into your .env file).
+
+* Open a terminal in the same folder, pull and run these docker services:
+
+    `docker-compose -f ./docker-compose.yml pull`

Review comment:
       The docker commands aren't working for me. I run into the following errors:
   ```
   Unsupported config option for services.kafka: 'datanode1'
   Unsupported config option for services.gobblin-standalone: 'zookeeper'
   ```
   
   Do you know what it could be?




----------------------------------------------------------------



[GitHub] [incubator-gobblin] codecov-io edited a comment on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-729444051


   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=h1) Report
   > Merging [#3154](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=desc) (f188c25) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/0187e03a6fb213edf7c941e4a43bfed62534559c?el=desc) (0187e03) will **decrease** coverage by `36.77%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3154       +/-   ##
   ============================================
   - Coverage     45.99%   9.21%   -36.78%     
   + Complexity     9600    1724     -7876     
   ============================================
     Files          1993    1998        +5     
     Lines         76013   76179      +166     
     Branches       8464    8478       +14     
   ============================================
   - Hits          34959    7022    -27937     
   - Misses        37786   68474    +30688     
   + Partials       3268     683     -2585     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...java/org/apache/gobblin/stream/ControlMessage.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc3RyZWFtL0NvbnRyb2xNZXNzYWdlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/dataset/DatasetResolver.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YXNldC9EYXRhc2V0UmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...va/org/apache/gobblin/converter/EmptyIterable.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9FbXB0eUl0ZXJhYmxlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...org/apache/gobblin/ack/BasicAckableForTesting.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vYWNrL0Jhc2ljQWNrYWJsZUZvclRlc3RpbmcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...n/java/org/apache/gobblin/salesforce/SfConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2ZDb25maWcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/gobblin/yarn/HelixMessageSubTypes.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vSGVsaXhNZXNzYWdlU3ViVHlwZXMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/cluster/SingleHelixTask.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvU2luZ2xlSGVsaXhUYXNrLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | [.../apache/gobblin/records/ControlMessageHandler.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcmVjb3Jkcy9Db250cm9sTWVzc2FnZUhhbmRsZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | ... and [1058 more](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=footer). Last update [0187e03...833f866](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------



[GitHub] [incubator-gobblin] codecov-io edited a comment on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-729444051


   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=h1) Report
   > Merging [#3154](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=desc) (8501b73) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/0187e03a6fb213edf7c941e4a43bfed62534559c?el=desc) (0187e03) will **decrease** coverage by `36.88%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3154       +/-   ##
   ============================================
   - Coverage     45.99%   9.10%   -36.89%     
   + Complexity     9600    1724     -7876     
   ============================================
     Files          1993    2014       +21     
     Lines         76013   77095     +1082     
     Branches       8464    8559       +95     
   ============================================
   - Hits          34959    7021    -27938     
   - Misses        37786   69390    +31604     
   + Partials       3268     684     -2584     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...java/org/apache/gobblin/stream/ControlMessage.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc3RyZWFtL0NvbnRyb2xNZXNzYWdlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/dataset/DatasetResolver.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YXNldC9EYXRhc2V0UmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...va/org/apache/gobblin/converter/EmptyIterable.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9FbXB0eUl0ZXJhYmxlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...org/apache/gobblin/ack/BasicAckableForTesting.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vYWNrL0Jhc2ljQWNrYWJsZUZvclRlc3RpbmcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...n/java/org/apache/gobblin/salesforce/SfConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2ZDb25maWcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/gobblin/yarn/HelixMessageSubTypes.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vSGVsaXhNZXNzYWdlU3ViVHlwZXMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/cluster/SingleHelixTask.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvU2luZ2xlSGVsaXhUYXNrLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | [.../apache/gobblin/records/ControlMessageHandler.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcmVjb3Jkcy9Db250cm9sTWVzc2FnZUhhbmRsZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | ... and [1079 more](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=footer). Last update [0187e03...55c7361](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------



[GitHub] [incubator-gobblin] hanghangliu commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
hanghangliu commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r540757279



##########
File path: gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml
##########
@@ -0,0 +1,127 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+version: '3'
+services:
+  gobblin-standalone:
+    image: apache/gobblin:latest
+    volumes:
+      - "${LOCAL_JOB_DIR}:/tmp/gobblin-standalone/jobs"
+  zookeeper:
+    image: wurstmeister/zookeeper
+    ports:
+      - "2181:2181"
+  kafka:

Review comment:
       Added. Thanks for your suggestions.




----------------------------------------------------------------



[GitHub] [incubator-gobblin] hanghangliu commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
hanghangliu commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r539763402



##########
File path: gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml
##########
@@ -0,0 +1,127 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+version: '3'
+services:
+  gobblin-standalone:
+    image: apache/gobblin:latest
+    volumes:
+      - "${LOCAL_JOB_DIR}:/tmp/gobblin-standalone/jobs"
+  zookeeper:
+    image: wurstmeister/zookeeper
+    ports:
+      - "2181:2181"
+  kafka:
+    image: wurstmeister/kafka
+    ports:
+      - "9092:9092"
+    environment:
+      KAFKA_ADVERTISED_HOST_NAME: "kafka"
+      KAFKA_ADVERTISED_PORT: "9092"
+      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
+      KAFKA_CREATE_TOPICS: "test:1:1"
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+
+  namenode:
+    image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
+    container_name: namenode
+    volumes:
+      - hadoop_namenode:/hadoop/dfs/name
+    environment:
+      - CLUSTER_NAME=test_cluster
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "9870:9870"
+
+  resourcemanager:
+    image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
+    container_name: resourcemanager
+    restart: on-failure
+    depends_on:

Review comment:
       Version 3 of the Compose file format no longer supports the `condition` form of `depends_on`.
   
   `depends_on` is still needed here since we want to start services in dependency order.
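   
   As a minimal sketch of the short form (not taken from the PR: the quoted diff is cut off at `depends_on:`, so the `namenode` dependency below is an assumption for illustration), a version 3 file simply lists the services that should start first:
   
   ```yaml
   version: '3'
   services:
     namenode:
       image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
     resourcemanager:
       image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
       restart: on-failure
       # Short (list) form: Compose starts namenode before resourcemanager.
       # Assumed dependency; the real list is not visible in the quoted diff.
       depends_on:
         - namenode
   ```
   
   This form only orders container startup; it does not wait for `namenode` to be ready, which is presumably why `restart: on-failure` is kept alongside it.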
   




----------------------------------------------------------------



[GitHub] [incubator-gobblin] autumnust commented on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
autumnust commented on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-730553898


   Can you rebase against the newest upstream, so that Tamas's change is excluded from this PR?


----------------------------------------------------------------



[GitHub] [incubator-gobblin] codecov-io edited a comment on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-729444051


   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=h1) Report
   > Merging [#3154](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=desc) (3e7707c) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/0187e03a6fb213edf7c941e4a43bfed62534559c?el=desc) (0187e03) will **decrease** coverage by `36.88%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3154       +/-   ##
   ============================================
   - Coverage     45.99%   9.10%   -36.89%     
   + Complexity     9600    1724     -7876     
   ============================================
     Files          1993    2014       +21     
     Lines         76013   77095     +1082     
     Branches       8464    8559       +95     
   ============================================
   - Hits          34959    7022    -27937     
   - Misses        37786   69390    +31604     
   + Partials       3268     683     -2585     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...java/org/apache/gobblin/stream/ControlMessage.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc3RyZWFtL0NvbnRyb2xNZXNzYWdlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/dataset/DatasetResolver.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YXNldC9EYXRhc2V0UmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...va/org/apache/gobblin/converter/EmptyIterable.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9FbXB0eUl0ZXJhYmxlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...org/apache/gobblin/ack/BasicAckableForTesting.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vYWNrL0Jhc2ljQWNrYWJsZUZvclRlc3RpbmcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...n/java/org/apache/gobblin/salesforce/SfConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2ZDb25maWcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/gobblin/yarn/HelixMessageSubTypes.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vSGVsaXhNZXNzYWdlU3ViVHlwZXMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/cluster/SingleHelixTask.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvU2luZ2xlSGVsaXhUYXNrLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | [.../apache/gobblin/records/ControlMessageHandler.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcmVjb3Jkcy9Db250cm9sTWVzc2FnZUhhbmRsZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | ... and [1079 more](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=footer). Last update [0187e03...3e7707c](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------



[GitHub] [incubator-gobblin] hanghangliu commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
hanghangliu commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r528837077



##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:alpine-gaas-latest`
 
-* Preserving the output of a Docker container requires using a [data volume](https://docs.docker.com/engine/tutorials/dockervolumes/). To do this, run the below command:
+`docker run gobblin/gobblin-standalone:alpine-gaas-latest`
 
-```
-docker run -v /home/gobblin/work-dir:/home/gobblin/work-dir gobblin-wikipedia
-```
+After the container spins up, put the [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull) in ${LOCAL_JOB_DIR}. You will see the Gobblin daemon pick up the job, and the result output is in ${LOCAL_JOB_DIR}/job-output/.
 
-The output of the Gobblin-Wikipedia job should now be written to `/home/gobblin/work-dir/job-output`. The `-v` command in Docker uses a feature of Docker called [data volumes](https://docs.docker.com/engine/tutorials/dockervolumes/). The `-v` option mounts a host directory into a container and is of the form `[host-directory]:[container-directory]`. Now any modifications to the host directory can be seen inside the container-directory, and any modifications to the container-directory can be seen inside the host-directory. This is a standard way to ensure data persists even after a Docker container finishes. It's important to note that the `[host-directory]` in the `-v` option can be changed to any directory (on OSX it must be under the `/Users/` directory), but the `[container-directory]` must remain `/home/gobblin/work-dir` (at least for now).
+This example job is correspondent to the [getting started guide](https://gobblin.readthedocs.io/en/latest/Getting-Started/). With the docker image, you can focus on the Gobblin functionalities, avoiding the hassle of building a distribution.
 
-## Gobblin-Standalone Repository
+### Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion 
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
+* To ingest from/to Kafka and HDFS by Gobblin, you need to start services for Zookeeper, Kafka and HDFS along with Gobblin. We use docker [compose](https://docs.docker.com/compose/) with images contributed to docker hub. Firstly, you need to create a [docker-compose.yml](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml) file(copy the content into your .yml file).

Review comment:
       This is expected, since the files haven't been pushed to master yet. Please use the copies from my personal fork:
   https://github.com/hanghangliu/incubator-gobblin/blob/gobblin-1317-add-docker-recipes-documentations/gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml
   
   https://github.com/hanghangliu/incubator-gobblin/blob/gobblin-1317-add-docker-recipes-documentations/gobblin-docker/gobblin-recipes/kafka-hdfs/hadoop.env




----------------------------------------------------------------



[GitHub] [incubator-gobblin] codecov-io edited a comment on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-729444051


   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=h1) Report
   > Merging [#3154](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=desc) (3e7707c) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/0187e03a6fb213edf7c941e4a43bfed62534559c?el=desc) (0187e03) will **increase** coverage by `0.19%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3154      +/-   ##
   ============================================
   + Coverage     45.99%   46.18%   +0.19%     
   - Complexity     9600     9753     +153     
   ============================================
     Files          1993     2014      +21     
     Lines         76013    77095    +1082     
     Branches       8464     8559      +95     
   ============================================
   + Hits          34959    35604     +645     
   - Misses        37786    38186     +400     
   - Partials       3268     3305      +37     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/gobblin/runtime/api/JobCatalogListener.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkNhdGFsb2dMaXN0ZW5lci5qYXZh) | `76.92% <0.00%> (-23.08%)` | `0.00% <0.00%> (ø%)` | |
   | [...n/runtime/job\_catalog/JobCatalogListenersList.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvam9iX2NhdGFsb2cvSm9iQ2F0YWxvZ0xpc3RlbmVyc0xpc3QuamF2YQ==) | `63.63% <0.00%> (-10.05%)` | `10.00% <0.00%> (ø%)` | |
   | [...pache/gobblin/cluster/JobConfigurationManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSm9iQ29uZmlndXJhdGlvbk1hbmFnZXIuamF2YQ==) | `81.39% <0.00%> (-6.11%)` | `10.00% <0.00%> (ø%)` | |
   | [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `32.23% <0.00%> (-5.79%)` | `12.00% <0.00%> (-2.00%)` | |
   | [.../apache/gobblin/runtime/api/MutableJobCatalog.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL011dGFibGVKb2JDYXRhbG9nLmphdmE=) | `81.25% <0.00%> (-5.42%)` | `0.00% <0.00%> (ø%)` | |
   | [...ache/gobblin/cluster/GobblinHelixJobScheduler.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkhlbGl4Sm9iU2NoZWR1bGVyLmphdmE=) | `34.48% <0.00%> (-4.74%)` | `6.00% <0.00%> (ø%)` | |
   | [...a/org/apache/gobblin/cluster/HelixJobsMapping.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhKb2JzTWFwcGluZy5qYXZh) | `85.24% <0.00%> (-4.34%)` | `17.00% <0.00%> (+4.00%)` | :arrow_down: |
   | [...source/extractor/hadoop/HadoopFileInputSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvaGFkb29wL0hhZG9vcEZpbGVJbnB1dFNvdXJjZS5qYXZh) | `59.32% <0.00%> (-4.32%)` | `6.00% <0.00%> (ø%)` | |
   | [...che/gobblin/cluster/FsJobConfigurationManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvRnNKb2JDb25maWd1cmF0aW9uTWFuYWdlci5qYXZh) | `71.11% <0.00%> (-3.31%)` | `7.00% <0.00%> (ø%)` | |
   | [.../runtime/job\_catalog/NonObservingFSJobCatalog.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvam9iX2NhdGFsb2cvTm9uT2JzZXJ2aW5nRlNKb2JDYXRhbG9nLmphdmE=) | `55.81% <0.00%> (-1.69%)` | `6.00% <0.00%> (+1.00%)` | :arrow_down: |
   | ... and [50 more](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=footer). Last update [0187e03...3e7707c](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------



[GitHub] [incubator-gobblin] codecov-io commented on pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#issuecomment-729444051


   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=h1) Report
   > Merging [#3154](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=desc) (387e65d) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/7352cad8ae3a1d9be10d3b6fb78383ccbada9b19?el=desc) (7352cad) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #3154   +/-   ##
   =========================================
     Coverage     41.69%   41.70%           
   - Complexity     8743     8746    +3     
   =========================================
     Files          1993     1993           
     Lines         75990    75990           
     Branches       8462     8462           
   =========================================
   + Hits          31685    31690    +5     
   + Misses        41297    41289    -8     
   - Partials       3008     3011    +3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [.../org/apache/gobblin/async/AsyncDataDispatcher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2FzeW5jL0FzeW5jRGF0YURpc3BhdGNoZXIuamF2YQ==) | `79.66% <0.00%> (-8.48%)` | `13.00% <0.00%> (-1.00%)` | |
   | [...a/management/copy/publisher/CopyDataPublisher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvcHVibGlzaGVyL0NvcHlEYXRhUHVibGlzaGVyLmphdmE=) | `74.00% <0.00%> (-1.34%)` | `31.00% <0.00%> (-1.00%)` | |
   | [...pache/gobblin/runtime/GobblinMultiTaskAttempt.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvR29iYmxpbk11bHRpVGFza0F0dGVtcHQuamF2YQ==) | `42.04% <0.00%> (-0.41%)` | `14.00% <0.00%> (-1.00%)` | |
   | [...e/gobblin/util/filesystem/FileSystemDecorator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtRGVjb3JhdG9yLmphdmE=) | `12.41% <0.00%> (+1.30%)` | `12.00% <0.00%> (+1.00%)` | |
   | [...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=) | `72.22% <0.00%> (+2.22%)` | `13.00% <0.00%> (ø%)` | |
   | [...e/gobblin/util/filesystem/ThrottledFileSystem.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9UaHJvdHRsZWRGaWxlU3lzdGVtLmphdmE=) | `47.54% <0.00%> (+4.91%)` | `11.00% <0.00%> (+1.00%)` | |
   | [...a/org/apache/gobblin/util/limiter/NoopLimiter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvbGltaXRlci9Ob29wTGltaXRlci5qYXZh) | `60.00% <0.00%> (+20.00%)` | `3.00% <0.00%> (+1.00%)` | |
   | [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/3154/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `92.85% <0.00%> (+35.71%)` | `4.00% <0.00%> (+2.00%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=footer). Last update [7352cad...387e65d](https://codecov.io/gh/apache/incubator-gobblin/pull/3154?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [incubator-gobblin] asfgit closed pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154


   





[GitHub] [incubator-gobblin] hanghangliu commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
hanghangliu commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r528837742



##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:alpine-gaas-latest`
 
-* Preserving the output of a Docker container requires using a [data volume](https://docs.docker.com/engine/tutorials/dockervolumes/). To do this, run the below command:
+`docker run gobblin/gobblin-standalone:alpine-gaas-latest`
 
-```
-docker run -v /home/gobblin/work-dir:/home/gobblin/work-dir gobblin-wikipedia
-```
+After the container spins up, put the [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull) in ${LOCAL_JOB_DIR}. You will see the Gobblin daemon pick up the job, and the result output is in ${LOCAL_JOB_DIR}/job-output/.
 
-The output of the Gobblin-Wikipedia job should now be written to `/home/gobblin/work-dir/job-output`. The `-v` command in Docker uses a feature of Docker called [data volumes](https://docs.docker.com/engine/tutorials/dockervolumes/). The `-v` option mounts a host directory into a container and is of the form `[host-directory]:[container-directory]`. Now any modifications to the host directory can be seen inside the container-directory, and any modifications to the container-directory can be seen inside the host-directory. This is a standard way to ensure data persists even after a Docker container finishes. It's important to note that the `[host-directory]` in the `-v` option can be changed to any directory (on OSX it must be under the `/Users/` directory), but the `[container-directory]` must remain `/home/gobblin/work-dir` (at least for now).
+This example job is correspondent to the [getting started guide](https://gobblin.readthedocs.io/en/latest/Getting-Started/). With the docker image, you can focus on the Gobblin functionalities, avoiding the hassle of building a distribution.
 
-## Gobblin-Standalone Repository
+### Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion 
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
+* To ingest from/to Kafka and HDFS by Gobblin, you need to start services for Zookeeper, Kafka and HDFS along with Gobblin. We use docker [compose](https://docs.docker.com/compose/) with images contributed to docker hub. Firstly, you need to create a [docker-compose.yml](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml) file(copy the content into your .yml file).
+
+* Second, in the same folder of the yml file, create a [hadoop.env](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/hadoop.env) file to specify all HDFS related config(copy the content into your .env file).
+
+* Open a terminal in the same folder, pull and run these docker services:
+
+    `docker-compose -f ./docker-compose.yml pull`

Review comment:
       Caused by some weird indent issue. Resolved. 







[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r533911926



##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.

Review comment:
       Good intro. Since we mention gobblin-service above and in the roadmap, maybe we could touch on it briefly here as an alternative way of submitting jobs.

##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:latest`
 
-* Preserving the output of a Docker container requires using a [data volume](https://docs.docker.com/engine/tutorials/dockervolumes/). To do this, run the below command:
+`docker run -v $LOCAL_JOB_DIR:/tmp/gobblin-standalone/jobs gobblin/gobblin-standalone:latest`
 
-```
-docker run -v /home/gobblin/work-dir:/home/gobblin/work-dir gobblin-wikipedia
-```
+After the container spins up, put the [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull) in ${LOCAL_JOB_DIR}. You will see the Gobblin daemon pick up the job, and the result output is in ${LOCAL_JOB_DIR}/job-output/.
 
-The output of the Gobblin-Wikipedia job should now be written to `/home/gobblin/work-dir/job-output`. The `-v` command in Docker uses a feature of Docker called [data volumes](https://docs.docker.com/engine/tutorials/dockervolumes/). The `-v` option mounts a host directory into a container and is of the form `[host-directory]:[container-directory]`. Now any modifications to the host directory can be seen inside the container-directory, and any modifications to the container-directory can be seen inside the host-directory. This is a standard way to ensure data persists even after a Docker container finishes. It's important to note that the `[host-directory]` in the `-v` option can be changed to any directory (on OSX it must be under the `/Users/` directory), but the `[container-directory]` must remain `/home/gobblin/work-dir` (at least for now).
+This example job is correspondent to the [getting started guide](https://gobblin.readthedocs.io/en/latest/Getting-Started/). With the docker image, you can focus on the Gobblin functionalities, avoiding the hassle of building a distribution.
 
-## Gobblin-Standalone Repository
+### Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion 
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to
  worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
+* To ingest from/to Kafka and HDFS by Gobblin, you need to start services for Zookeeper, Kafka and HDFS along with Gobblin. We use docker [compose](https://docs.docker.com/compose/) with images contributed to docker hub. Firstly, you need to create a [docker-compose.yml](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml) file.
+
+* Second, in the same folder of the yml file, create a [hadoop.env](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/hadoop.env) file to specify all HDFS related config(copy the content into your .env file).
+
+* Open a terminal in the same folder, pull and run these docker services:
+
+    `docker-compose -f ./docker-compose.yml pull`
+
+    `docker-compose -f ./docker-compose.yml up`
+    
+    Here we expose Zookeeper at port 2128, Kafka at 9092 with an auto created Kafka topic “test”. All hadoop related configs are stated in the .env file.
 
-Running the `gobblin-standalone` image requires taking the following steps:
+* You should see all services running. Now we can push some massages into the Kafka topic. Open a terminal from [docker desktop](https://docs.docker.com/desktop/dashboard/) dashboard or [docker exec](https://docs.docker.com/engine/reference/commandline/exec/) to interact with Kafka. Inside the Kafka container terminal:

Review comment:
       Typo: "massages" should be "messages".

Review comment:
       Or maybe "events", since this is in the context of Kafka?







[GitHub] [incubator-gobblin] hanghangliu commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
hanghangliu commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r528835914



##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -18,68 +18,68 @@ The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
+
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+### Set working directory
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+Before running docker containers, set a working directory for Gobblin jobs:
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+### Run the docker image with simple wikipedia jobs
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+`docker pull gobblin/gobblin-standalone:alpine-gaas-latest`
 
-* Preserving the output of a Docker container requires using a [data volume](https://docs.docker.com/engine/tutorials/dockervolumes/). To do this, run the below command:
+`docker run gobblin/gobblin-standalone:alpine-gaas-latest`

Review comment:
       Agreed on the Docker volume setup; changed as you proposed.
   
   The uncaught exception was introduced by a recent PR. It happens not only with `docker run` but also when running directly from a local build. I have raised it in Slack; it looks like it was caused by a dependency issue with Jackson.
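
For readers following the thread, the volume setup agreed on here can be sketched roughly as below. This is only a sketch: the image name `apache/gobblin:latest` and the container path `/tmp/gobblin-standalone/jobs` are taken from the later revision of the doc quoted further down this thread, while the helper name `gobblin_standalone_cmd` and the default host directory `$HOME/gobblin-jobs` are hypothetical. The helper only prints the command (a dry run) so it can be inspected before anything is started.

```shell
# Print the docker command that mounts a host job directory into the container.
# $1: host directory that holds .job / .pull files and receives job output.
gobblin_standalone_cmd() {
  job_dir="$1"
  echo "docker run -v ${job_dir}:/tmp/gobblin-standalone/jobs apache/gobblin:latest"
}

# Hypothetical default; any directory Docker is allowed to share would do.
LOCAL_JOB_DIR="${LOCAL_JOB_DIR:-$HOME/gobblin-jobs}"

# Dry run: show the command instead of starting the container.
gobblin_standalone_cmd "$LOCAL_JOB_DIR"
```

To actually start the container, run the printed command after creating the host directory (for example with `mkdir -p "$LOCAL_JOB_DIR"`).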







[GitHub] [incubator-gobblin] Will-Lo commented on a change in pull request #3154: [GOBBLIN-1317] Update Docker integration guide. Add Kafka, HDFS recipes for docker

Posted by GitBox <gi...@apache.org>.
Will-Lo commented on a change in pull request #3154:
URL: https://github.com/apache/incubator-gobblin/pull/3154#discussion_r535494160



##########
File path: gobblin-docs/user-guide/Docker-Integration.md
##########
@@ -12,74 +12,104 @@ For more information on Docker, including how to install it, check out the docum
 
 # Docker Repositories
 
-Gobblin currently has four different repositories, and all are on Docker Hub [here](https://hub.docker.com/u/gobblin/).
+Gobblin currently has four different repositories, and all are on Docker Hub [here](https://hub.docker.com/u/gobblin/). We are also starting to use [Apache's repository](https://hub.docker.com/r/apache/gobblin/tags?page=1&ordering=last_updated) for our images. 
 
 The `gobblin/gobblin-wikipedia` repository contains images that run the Gobblin Wikipedia job found in the [getting started guide](../Getting-Started). These images are useful for users new to Docker or Gobblin, they primarily act as a "Hello World" example for the Gobblin Docker integration.
 
 The `gobblin/gobblin-standalone` repository contains images that run a [Gobblin standalone service](Gobblin-Deployment#standalone-architecture) inside a Docker container. These images provide an easy and simple way to setup a Gobblin standalone service on any Docker compatible machine.
 
+The `gobblin/gobblin-service` repository contains images that run [Gobblin as a service](Building-Gobblin-as-a-Service#running-gobblin-as-a-service-with-docker), which is a service that takes in a user request (a logical flow) and converts it into a series of Gobblin Jobs, and monitors these jobs in a distributed manner.
+
 The `gobblin/gobblin-base` and `gobblin/gobblin-distributions` repositories are for internal use only, and are primarily useful for Gobblin developers.
 
-## Gobblin-Wikipedia Repository
+# Run Gobblin Standalone
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-wikipedia/). These images are mainly meant to act as a "Hello World" example for the Gobblin-Docker integration, and to provide a sanity check to see if the Gobblin-Docker integration is working on a given machine. The image contains the Gobblin configuration files to run the [Gobblin Wikipedia job](../Getting-Started). When a container is launched using the `gobblin-wikipedia` image, Gobblin starts up, runs the Wikipedia example, and then exits.
+The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
 
-Running the `gobblin-wikipedia` image requires taking following steps (lets assume we want to an Ubuntu based image):
+### Set working directory
 
-* Download the images from the `gobblin/gobblin-wikipedia` repository
+Before running docker containers, set a working directory for Gobblin jobs:
 
-```
-docker pull gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+`export LOCAL_JOB_DIR=<local_gobblin_directory>`
 
-* Run the `gobblin/gobblin-wikipedia:ubuntu-gobblin-latest` image in a Docker container
+We will use this directory as the [volume](https://docs.docker.com/storage/volumes/) for Gobblin jobs and outputs. Make sure your Docker has the [access](https://docs.docker.com/docker-for-mac/#file-sharing) to this folder. This is the prerequisite for all following example jobs.
 
-```
-docker run gobblin/gobblin-wikipedia:ubuntu-gobblin-latest
-```
+### Run the docker image with simple wikipedia jobs
 
-The logs are printed to the console, and no errors should pop up. This should provide a nice sanity check to ensure that everything is working as expected. The output of the job will be written to a directory inside the container. When the container exits that data will be lost. In order to preserve the output of the job, continue to the next step.
+Run these commands to start the docker image:
 
-* Preserving the output of a Docker container requires using a [data volume](https://docs.docker.com/engine/tutorials/dockervolumes/). To do this, run the below command:
+`docker pull apache/gobblin:latest`
 
-```
-docker run -v /home/gobblin/work-dir:/home/gobblin/work-dir gobblin-wikipedia
-```
+`docker run -v $LOCAL_JOB_DIR:/tmp/gobblin-standalone/jobs apache/gobblin:latest`
 
-The output of the Gobblin-Wikipedia job should now be written to `/home/gobblin/work-dir/job-output`. The `-v` command in Docker uses a feature of Docker called [data volumes](https://docs.docker.com/engine/tutorials/dockervolumes/). The `-v` option mounts a host directory into a container and is of the form `[host-directory]:[container-directory]`. Now any modifications to the host directory can be seen inside the container-directory, and any modifications to the container-directory can be seen inside the host-directory. This is a standard way to ensure data persists even after a Docker container finishes. It's important to note that the `[host-directory]` in the `-v` option can be changed to any directory (on OSX it must be under the `/Users/` directory), but the `[container-directory]` must remain `/home/gobblin/work-dir` (at least for now).
+After the container spins up, put the [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull) in ${LOCAL_JOB_DIR}. You will see the Gobblin daemon pick up the job, and the result output is in ${LOCAL_JOB_DIR}/job-output/.
 
-## Gobblin-Standalone Repository
+This example job is correspondent to the [getting started guide](https://gobblin.readthedocs.io/en/latest/Getting-Started/). With the docker image, you can focus on the Gobblin functionalities, avoiding the hassle of building a distribution.
 
-The Docker images for this repository can be found on Docker Hub [here](https://hub.docker.com/r/gobblin/gobblin-standalone/). These images run a Gobblin standalone service inside a Docker container. The Gobblin standalone service is a long running process that can run Gobblin jobs defined in a `.job` or `.pull` file. The job / pull files are submitted to the standalone service by placing them in a directory on the local filesystem. The standalone service monitors this directory for any new job / pull files and runs them either immediately or on a scheduled basis (more information on how this works can be found [here](Working-with-Job-Configuration-Files#adding-or-changing-job-configuration-files)). Running the Gobblin standalone service inside a Docker container allows Gobblin to pick up job / pull files from a directory on the host filesystem, run the job, and write the output back the host filesystem. All the heavy lifting is done inside a Docker container, the user just needs to worry about defining and submitting job / pull files. The goal is to provide a easy to setup environment for the Gobblin standalone service.
+### Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion 
+
+* To ingest from/to Kafka and HDFS by Gobblin, you need to start services for Zookeeper, Kafka and HDFS along with Gobblin. We use docker [compose](https://docs.docker.com/compose/) with images contributed to docker hub. Firstly, you need to create a [docker-compose.yml](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/docker-compose.yml) file.
+
+* Second, in the same folder of the yml file, create a [hadoop.env](https://github.com/apache/incubator-gobblin/blob/master/gobblin-docker/gobblin-recipes/kafka-hdfs/hadoop.env) file to specify all HDFS related config(copy the content into your .env file).
+
+* Open a terminal in the same folder, pull and run these docker services:
+
+    `docker-compose -f ./docker-compose.yml pull`
+
+    `docker-compose -f ./docker-compose.yml up`
+    
+    Here we expose Zookeeper at port 2128, Kafka at 9092 with an auto created Kafka topic “test”. All hadoop related configs are stated in the .env file.
+
+* You should see all services running. Now we can push some events into the Kafka topic. Open a terminal from [docker desktop](https://docs.docker.com/desktop/dashboard/) dashboard or [docker exec](https://docs.docker.com/engine/reference/commandline/exec/) to interact with Kafka. Inside the Kafka container terminal:
+
+    `cd /opt/kafka`
+
+    `./bin/kafka-console-producer.sh --broker-list kafka:9092 --topic test`
+
+    You can type messages for the topic “test”, and press ctrl+c to exit.
+
+* Put the [kafka-hdfs.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/kafka-hdfs.pull) in ${LOCAL_JOB_DIR}, so that the Gobblin daemon will pick up this job and write the result to HDFS. You will see the Gobblin daemon pick up the job.
+
+After the job finished, open a terminal in the HDFS namenode container:
+
+`hadoop fs -ls /gobblintest/job-output/test/`
+
+You will see the result file in this HDFS folder. You can use this command to verify the content in the text file:
+
+`hadoop fs -cat /gobblintest/job-output/test/<output_file.txt>`
+
+# Run Gobblin as a Service
+
+The goal of GaaS(Gobblin as a Service) is to enable a self service so that different users can automatically provision and execute various supported Gobblin applications limiting the need for development and operation teams to be involved during the provisioning process. You can take a look at our [design detail](https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+as+a+Service).
+
+We use the same docker image as discussed above. 
+
+### Set working directory
+
+Similar to standalone working directory settings:
+
+`export GAAS_JOB_DIR=<gaas_gobblin_directory>`
 
-Running the `gobblin-standalone` image requires taking the following steps:
+`export LOCAL_DATAPACK_DIR=<local_directory_of_templateUris>`
 
-* Download the images from the `gobblin/gobblin-standalone` repository
+### Start Gobblin as a Service
 
-```
-docker pull gobblin/gobblin-standalone:ubuntu-gobblin-latest
-```
+Run these commands to start the docker image:
 
-* Run the `gobblin/gobblin-standalone:ubuntu-gobblin-latest` image in a Docker container
+`docker pull apache/gobblin:latest`

Review comment:
       This is a bit awkward, but the apache image doesn't actually support GaaS right now. I have a PR that I will open later this week to support multiple execution modes with the image, but in the meantime could we use our own image at `gobblin/gobblin-service` for GaaS specifically? I can update this doc in my PR so that the image reference stays current.
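
As a rough illustration of the split suggested in this comment, the choice of image per execution mode could be sketched as follows. The function name `gobblin_image_for_mode` and the mode labels `standalone` and `gaas` are hypothetical; the thread names only the `gobblin/gobblin-service` repository and gives no tag, so none is assumed here (Docker falls back to `latest` when the tag is omitted, and whether that tag exists in the repository is not confirmed by this thread).

```shell
# Map an execution mode to the image discussed in this review thread.
# $1: "standalone" or "gaas" (hypothetical labels used only in this sketch).
gobblin_image_for_mode() {
  case "$1" in
    standalone) echo "apache/gobblin:latest" ;;
    gaas)       echo "gobblin/gobblin-service" ;;  # tag not stated in the thread
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Dry run: show the pull command for the GaaS image.
echo "docker pull $(gobblin_image_for_mode gaas)"
```

Once the apache image supports multiple execution modes, only the `gaas` branch of the mapping would need to change.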



