Posted to commits@hop.apache.org by gi...@apache.org on 2021/04/03 09:45:18 UTC

[incubator-hop-docs] branch asf-site updated: Documentation updated to ac86379

This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hop-docs.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new b3e0386  Documentation updated to ac86379
b3e0386 is described below

commit b3e0386ac9606f29972a597ea327fa95175d7948
Author: jenkins <bu...@apache.org>
AuthorDate: Sat Apr 3 09:45:15 2021 +0000

    Documentation updated to ac86379
---
 .../ROOT/pages/plugins/projects/projects.adoc      | 167 +++++++++++++++++++++
 .../modules/ROOT/pages/vfs/aws-s3-vfs.adoc         |   6 +-
 .../ROOT/pages/vfs/azure-blob-storage-vfs.adoc     |  16 +-
 .../ROOT/pages/vfs/google-cloud-storage-vfs.adoc   |  10 +-
 .../modules/ROOT/pages/vfs/google-drive-vfs.adoc   |  11 +-
 .../workflow/actions/deleteresultfilenames.adoc    |   6 +-
 6 files changed, 195 insertions(+), 21 deletions(-)

diff --git a/hop-user-manual/modules/ROOT/pages/plugins/projects/projects.adoc b/hop-user-manual/modules/ROOT/pages/plugins/projects/projects.adoc
index 18bc8b8..18430df 100644
--- a/hop-user-manual/modules/ROOT/pages/plugins/projects/projects.adoc
+++ b/hop-user-manual/modules/ROOT/pages/plugins/projects/projects.adoc
@@ -159,3 +159,170 @@ Environment is short for Project Lifecycle Environment.  It describes the phase
 The main toolbar in the Hop GUI offers buttons to add, edit and delete an environment.
 Please note that you can add non-existing configuration files in the environment dialog.  When editing, the Hop GUI will ask you if you want to create the file.
 
+=== Configuration on the command line
+
+The ```hop-conf``` script offers many options to edit environment definitions.
+
+
+==== Creating an environment
+
+[source,bash]
+----
+$ sh hop-conf.sh \
+     --environment-create \
+     --environment hop2 \
+     --environment-project hop2 \
+     --environment-purpose=Development \
+     --environment-config-files=/home/user/projects/hop2-conf.json
+Creating environment 'hop2'
+Environment 'hop2' was created in Hop configuration file <path-to-hop>/config/hop-config.json
+2021/02/01 16:37:02 - General - ERROR: Configuration file '/home/user/projects/hop2-conf.json' does not exist to read variables from.
+Created empty environment configuration file : /home/user/projects/hop2-conf.json
+  hop2
+    Purpose: Development
+    Configuration files:
+    Project name: hop2
+      Config file: /home/user/projects/hop2-conf.json
+
+----
+
+As you can see from the log, an empty file was created to set variables in:
+
+[source,json]
+----
+{ }
+----
+
+==== Setting variables in an environment
+
+This command adds a variable to the environment configuration file:
+
+[source,bash]
+----
+$ sh hop-conf.sh \
+     --config-file /home/user/projects/hop2-conf.json \
+     --config-file-set-variables DB_HOSTNAME=localhost,DB_PASSWORD=abcd
+Configuration file '/home/user/projects/hop2-conf.json' was modified.
+----
+
+If we look at the file ```hop2-conf.json``` we'll see that the variables were added:
+[source,json]
+----
+{
+  "variables" : [ {
+    "name" : "DB_HOSTNAME",
+    "value" : "localhost",
+    "description" : ""
+  }, {
+    "name" : "DB_PASSWORD",
+    "value" : "abcd",
+    "description" : ""
+  } ]
+}
+----
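
As an illustration only (plain Python, not Hop code), this is roughly how a configuration file shaped like the one above maps to a set of variables available to pipelines and workflows:

```python
import json

# Sketch: parse an environment configuration file with the structure
# shown above and build a name -> value map.  The JSON text mirrors
# hop2-conf.json; the mapping logic is an assumption for illustration.
config_text = """
{
  "variables" : [
    { "name" : "DB_HOSTNAME", "value" : "localhost", "description" : "" },
    { "name" : "DB_PASSWORD", "value" : "abcd", "description" : "" }
  ]
}
"""
config = json.loads(config_text)

# Turn the list of {name, value, description} entries into a lookup table.
variables = {v["name"]: v["value"] for v in config["variables"]}
print(variables["DB_HOSTNAME"])  # localhost
```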
+
+Please note that you can add descriptions for the variables as well with the ```--describe-variable``` option.  Please run hop-conf without options to see all the possibilities.
+
+==== Deleting an environment
+
+The following deletes an environment from the Hop configuration file:
+
+[source,bash]
+----
+$ sh hop-conf.sh --environment-delete --environment hop2
+Lifecycle environment 'hop2' was deleted from Hop configuration file <path-to-hop>/config/hop-config.json
+----
+
+== Running pipelines and workflows
+
+You can specify an environment or a project when executing a pipeline or a workflow.
+By doing so you automatically configure metadata and variables without much fuss.
+
+The easiest example is shown by executing the "complex" pipeline from the Apache Beam examples:
+
+[source,bash]
+----
+$ sh hop-run.sh --project samples --file 'beam/pipelines/complex.hpl' --runconfig Direct
+2021/02/01 16:52:15 - HopRun - Enabling project 'samples'
+2021/02/01 16:52:25 - HopRun - Relative path filename specified: config/projects/samples/beam/pipelines/complex.hpl
+2021/02/01 16:52:26 - General - Created Apache Beam pipeline with name 'complex'
+2021/02/01 16:52:27 - General - Handled transform (INPUT) : Customer data
+2021/02/01 16:52:27 - General - Handled transform (INPUT) : State data
+2021/02/01 16:52:27 - General - Handled Group By (STEP) : countPerState, gets data from 1 previous transform(s)
+2021/02/01 16:52:27 - General - Handled transform (STEP) : uppercase state, gets data from 1 previous transform(s), targets=0, infos=0
+2021/02/01 16:52:27 - General - Handled Merge Join (STEP) : Merge join
+2021/02/01 16:52:27 - General - Handled transform (STEP) : Lookup count per state, gets data from 1 previous transform(s), targets=0, infos=1
+2021/02/01 16:52:27 - General - Handled transform (STEP) : name<n, gets data from 1 previous transform(s), targets=2, infos=0
+2021/02/01 16:52:27 - General - Transform Label: N-Z reading from previous transform targeting this one using : name<n - TARGET - Label: N-Z
+2021/02/01 16:52:27 - General - Handled transform (STEP) : Label: N-Z, gets data from 1 previous transform(s), targets=0, infos=0
+2021/02/01 16:52:27 - General - Transform Label: A-M reading from previous transform targeting this one using : name<n - TARGET - Label: A-M
+2021/02/01 16:52:27 - General - Handled transform (STEP) : Label: A-M, gets data from 1 previous transform(s), targets=0, infos=0
+2021/02/01 16:52:27 - General - Handled transform (STEP) : Switch / case, gets data from 2 previous transform(s), targets=4, infos=0
+2021/02/01 16:52:27 - General - Transform CA reading from previous transform targeting this one using : Switch / case - TARGET - CA
+2021/02/01 16:52:27 - General - Handled transform (STEP) : CA, gets data from 1 previous transform(s), targets=0, infos=0
+2021/02/01 16:52:27 - General - Transform NY reading from previous transform targeting this one using : Switch / case - TARGET - NY
+2021/02/01 16:52:27 - General - Handled transform (STEP) : NY, gets data from 1 previous transform(s), targets=0, infos=0
+2021/02/01 16:52:27 - General - Transform FL reading from previous transform targeting this one using : Switch / case - TARGET - FL
+2021/02/01 16:52:27 - General - Handled transform (STEP) : FL, gets data from 1 previous transform(s), targets=0, infos=0
+2021/02/01 16:52:27 - General - Transform Default reading from previous transform targeting this one using : Switch / case - TARGET - Default
+2021/02/01 16:52:27 - General - Handled transform (STEP) : Default, gets data from 1 previous transform(s), targets=0, infos=0
+2021/02/01 16:52:27 - General - Handled transform (STEP) : Collect, gets data from 4 previous transform(s), targets=0, infos=0
+2021/02/01 16:52:27 - General - Handled transform (OUTPUT) : complex, gets data from Collect
+2021/02/01 16:52:27 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Direct'
+2021/02/01 16:52:34 - General - Beam pipeline execution has finished.
+----
+
+To execute an Apache Beam pipeline, a lot of information and metadata is needed.  Let's dive into a few interesting details:
+
+* By referencing the ```samples``` project Hop knows where the project is located (```config/projects/samples```)
+* Since we know the location of the project, we can specify pipelines and workflows with a relative path
+* The project knows where its metadata is stored (```config/projects/samples/metadata```) so it knows where to find the ```Direct``` pipeline run configuration (```config/projects/samples/metadata/pipeline-run-configuration/Direct.json```)
+* This run configuration defines its own pipeline engine specific variables, in this case the output folder : ```DATA_OUTPUT={openvar}PROJECT_HOME{closevar}/beam/output/```
+* The output of the samples is therefore written to ```config/projects/samples/beam/output```
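
The variable resolution described above can be sketched as follows (plain Python, not Hop's actual implementation, and assuming the {openvar}…{closevar} placeholders render as `${`…`}`):

```python
import re

# Hypothetical sketch of ${VAR} substitution as used by the run
# configuration above; an illustration, not Hop's real resolver.
def resolve(text, variables):
    # Replace each ${NAME} with its value; unknown names are left as-is.
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: variables.get(m.group(1), m.group(0)),
                  text)

# PROJECT_HOME comes from the project; the DATA_OUTPUT expression
# comes from the 'Direct' run configuration mentioned above.
variables = {"PROJECT_HOME": "config/projects/samples"}
data_output = resolve("${PROJECT_HOME}/beam/output/", variables)
print(data_output)  # config/projects/samples/beam/output/
```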
+
+To reference an environment, execute with the ```-e``` or ```--environment``` option.  The only difference is that a number of extra environment variables will be set while executing.
+
+== Plugin configuration
+
+There are various options to configure the behavior of the ```Projects``` plugin itself. In the Hop configuration file ```hop-config.json``` we can find the following options:
+
+[source,json]
+----
+{
+    "projectMandatory" : true,
+    "environmentMandatory" : false,
+    "defaultProject" : "default",
+    "defaultEnvironment" : null,
+    "standardParentProject" : "default",
+    "standardProjectsFolder" : "/home/matt/test-stuff/"
+}
+----
+
+
+|===
+|Option |Description |hop-conf option
+
+|projectMandatory
+|This will prevent anyone from using hop-run without specifying a project
+|```--project-mandatory```
+
+|environmentMandatory
+|This will prevent anyone from using hop-run without specifying an environment
+|```--environment-mandatory```
+
+|defaultProject
+|The default project to use when none is specified
+|```--default-project```
+
+|defaultEnvironment
+|The default environment to use when none is specified
+|```--default-environment```
+
+|standardParentProject
+|The standard parent project to propose when creating a new project
+|```--standard-parent-project```
+
+|standardProjectsFolder
+|The folder the GUI will browse to by default when creating new projects
+|```--standard-projects-folder```
+
+|===
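
To make the interaction of these options concrete, here is a minimal sketch (plain Python, not Hop's actual code) of how the defaults and mandatory flags plausibly combine when ```hop-run``` starts:

```python
# Illustration only: fall back to the configured default project and
# enforce projectMandatory, using the option names from the table above.
config = {
    "projectMandatory": True,
    "environmentMandatory": False,
    "defaultProject": "default",
    "defaultEnvironment": None,
}

def effective_project(requested, cfg):
    # Use the explicitly requested project, or fall back to the default.
    project = requested or cfg.get("defaultProject")
    if cfg.get("projectMandatory") and project is None:
        raise ValueError("a project must be specified (projectMandatory=true)")
    return project

print(effective_project(None, config))     # default
print(effective_project("samples", config))  # samples
```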
+
diff --git a/hop-user-manual/modules/ROOT/pages/vfs/aws-s3-vfs.adoc b/hop-user-manual/modules/ROOT/pages/vfs/aws-s3-vfs.adoc
index 81df842..55675c6 100644
--- a/hop-user-manual/modules/ROOT/pages/vfs/aws-s3-vfs.adoc
+++ b/hop-user-manual/modules/ROOT/pages/vfs/aws-s3-vfs.adoc
@@ -22,14 +22,10 @@ under the License.
 
 == Scheme
 
-The scheme you can use to access your files in Amazon Web Services S3 is
+The scheme you can use to access your files in Amazon Web Services S3 is
 
 `**s3://**`
 
-Example:
-
-* `s3:///hopstorage/file.txt`
-
 == Configuration
 
 The configuration of the Amazon Web Services Simple Cloud Storage can be done through a variety of ways.  Most require you to have an `Access Key` and a `Secret Key`.
diff --git a/hop-user-manual/modules/ROOT/pages/vfs/azure-blob-storage-vfs.adoc b/hop-user-manual/modules/ROOT/pages/vfs/azure-blob-storage-vfs.adoc
index 9fa8f0c..c3cb865 100644
--- a/hop-user-manual/modules/ROOT/pages/vfs/azure-blob-storage-vfs.adoc
+++ b/hop-user-manual/modules/ROOT/pages/vfs/azure-blob-storage-vfs.adoc
@@ -26,10 +26,6 @@ The scheme you can use to access your files in Azure Blob Storage is
 
 `**azure://**`
 
-Example:
-
-* `azure:///hopstorage/file.txt`
-
 == Configuration
 
 To get access to your Azure storage files you need to configure a few things:
@@ -43,7 +39,17 @@ You can find both in the Storage Accounts section of your Azure portal.
 
 All 3 options can be set in either the Hop GUI options dialog (Menu: Tools / Options) or using the following Hop Conf (`hop-conf.sh` or `hop-conf.bat`) command line options:
 
-include::../hop-tools/hop-conf-cloud-azure-blob-storage.adoc[]
+[source,shell]
+----
+      -aza, --azure-account=<account>
+                            The account to use for the Azure VFS
+      -azi, --azure-block-increment=<blockIncrement>
+                            The block increment size for new files on Azure,
+                              multiples of 512 only.
+      -azk, --azure-key=<key>
+                            The key to use for the Azure VFS
+
+----
 
 Once done you will see an `azure` entry in the central `hop-config.json` file:
 
diff --git a/hop-user-manual/modules/ROOT/pages/vfs/google-cloud-storage-vfs.adoc b/hop-user-manual/modules/ROOT/pages/vfs/google-cloud-storage-vfs.adoc
index c7e42f7..13d79f2 100644
--- a/hop-user-manual/modules/ROOT/pages/vfs/google-cloud-storage-vfs.adoc
+++ b/hop-user-manual/modules/ROOT/pages/vfs/google-cloud-storage-vfs.adoc
@@ -26,15 +26,15 @@ The scheme you can use to access your files in Google Cloud Storage is
 
 `**gs://**`
 
-Example:
-
-* `gs:///hopstorage/file.txt`
-
 == Configuration
 
 You need to generate a key file for a service account to make it work.  Go to the Google Cloud console to do this. Once you have a key file for your service account, with permissions to access your GCP storage, point to it with either a system environment variable called `GOOGLE_APPLICATION_CREDENTIALS` (standard Google way of doing this) or in the Options dialog in the 'Google Cloud' tab. You can also use `hop-conf`:
 
-include::../hop-tools/hop-conf-cloud-google-cloud-storage.adoc[]
+[source,shell]
+----
+      -gck, --google-cloud-service-account-key-file=<serviceAccountKeyFile>
+                            Configure the path to a Google Cloud service account JSON key file
+----
 
 Once done you will see a `googleCloud` entry in the central `hop-config.json` file:
 
diff --git a/hop-user-manual/modules/ROOT/pages/vfs/google-drive-vfs.adoc b/hop-user-manual/modules/ROOT/pages/vfs/google-drive-vfs.adoc
index 447e788..be70d3d 100644
--- a/hop-user-manual/modules/ROOT/pages/vfs/google-drive-vfs.adoc
+++ b/hop-user-manual/modules/ROOT/pages/vfs/google-drive-vfs.adoc
@@ -26,13 +26,18 @@ The scheme you can use to access your files in Google Drive is
 
 `**googledrive://**`
 
-
-
 == Configuration
 
 You need to generate a credentials file to make it work.  Follow the Google documentation to see how that is done.  You also need to specify a folder in which security tokens are going to be saved.  You can specify both in the Hop system configuration options.  This can be done in the Hop GUI: go to the "Google Drive" tab in the Options dialog (from the Tools menu).  You can also use the `hop-conf` script with the following options:
 
-include::../hop-tools/hop-conf-cloud-google-drive.adoc[]
+[source,shell]
+----
+      -gdc, --google-drive-credentials-file=<credentialsFile>
+                            Configure the path to a Google Drive credentials JSON
+                              file
+      -gdt, --google-drive-tokens-folder=<tokensFolder>
+                            Configure the path to a Google Drive tokens folder
+----
 
 Once done you will see a `googleDrive` entry in the central `hop-config.json` file:
 
diff --git a/hop-user-manual/modules/ROOT/pages/workflow/actions/deleteresultfilenames.adoc b/hop-user-manual/modules/ROOT/pages/workflow/actions/deleteresultfilenames.adoc
index 1998da3..748ab56 100644
--- a/hop-user-manual/modules/ROOT/pages/workflow/actions/deleteresultfilenames.adoc
+++ b/hop-user-manual/modules/ROOT/pages/workflow/actions/deleteresultfilenames.adoc
@@ -28,8 +28,8 @@ Use this action to delete all the filenames that are in the result files list of
 [width="90%", options="header"]
 |===
 |Option|Description
-|Workflow action name|The name of the workflow action.
-|Limit action to|Enable this feature if you want to limit the deletion to certain filenames in the result file list.
+|Workflow action name|The name of the workflow action. *Note*: This name has to be unique in a single workflow. A workflow action can be placed several times on the canvas, however it will be the same workflow action.
+|Limit action to|Enable this feature if you want to limit the deletion to certain filenames in the result file list.
 |Wildcard|The regular expression to limit the files to delete
-|Exclude wildcard|The regular expression to exclude certain files from being deleted. 
+|Exclude wildcard|The regular expression to exclude certain files from being deleted.
 |===
\ No newline at end of file