Posted to commits@samza.apache.org by xi...@apache.org on 2017/05/23 00:58:41 UTC

[6/6] samza git commit: Improve documentation for Resource Localization

Improve documentation for Resource Localization

This is a follow-up to Fred Ji's original PR : https://github.com/apache/samza/pull/191 .

Author: vjagadish1989 <jv...@linkedin.com>

Reviewers: Prateek Maheshwari <pm...@linkedin.com>

Closes #199 from vjagadish1989/doc-improvements


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/06860479
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/06860479
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/06860479

Branch: refs/heads/0.13.0
Commit: 06860479f35b14336864b2bea99120cf8809a46f
Parents: 8ff402c
Author: vjagadish1989 <jv...@linkedin.com>
Authored: Mon May 22 17:42:32 2017 -0700
Committer: Xinyu Liu <xi...@xiliu-ld.linkedin.biz>
Committed: Mon May 22 17:56:57 2017 -0700

----------------------------------------------------------------------
 .../yarn/yarn-resource-localization.md          | 59 +++++++-------------
 1 file changed, 19 insertions(+), 40 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza/blob/06860479/docs/learn/documentation/versioned/yarn/yarn-resource-localization.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/yarn/yarn-resource-localization.md b/docs/learn/documentation/versioned/yarn/yarn-resource-localization.md
index a55670b..3d1c87a 100644
--- a/docs/learn/documentation/versioned/yarn/yarn-resource-localization.md
+++ b/docs/learn/documentation/versioned/yarn/yarn-resource-localization.md
@@ -18,58 +18,49 @@ title: YARN Resource Localization
    See the License for the specific language governing permissions and
    limitations under the License.
 -->
-
-When Samza jobs run on YARN clusters, sometimes there are needs to preload some files or data (called as resources in this doc) before job starts, such as preparing the job package, fetching job certificate, or etc., Samza supports a general configuration way to localize difference resources.
+When running Samza jobs on YARN clusters, you may need to download some resources before startup (for example, the job binaries or certificate files). This step is called Resource Localization.
 
 ### Resource Localization Process
 
-For the Samza jobs running on YARN, the resource localization leverages the YARN node manager localization service. Here is a good [deep dive](https://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/) from Horton Works on how localization works in YARN. 
-
-Depending on where and how the resource comes from, fetching the resource is associated with a scheme in the path, such as `http`, `https`, `hdfs`, `ftp`, `file`, etc., which maps to a certain FileSystem for handling the localization. 
+For Samza jobs running on YARN, resource localization leverages the YARN node manager's localization service. Here is a [deep dive](https://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/) on how localization works in YARN. 
 
-If there is an implementation of [FileSystem](https://hadoop.apache.org/docs/stable/api/index.html?org/apache/hadoop/fs/FileSystem.html) on YARN supporting a scheme, then that scheme can be used for resource localization. 
+Depending on where and how the resource comes from, fetching the resource is associated with a scheme in the path (such as `http`, `https`, `hdfs`, `ftp`, `file`, etc). The scheme maps to a corresponding `FileSystem` implementation for handling the localization. 
 
-There are some predefined file systems in Hadoop or Samza, which are provided if you run Samza jobs on YARN:
+There are some predefined `FileSystem` implementations in Hadoop and Samza, which are provided if you run Samza jobs on YARN:
 
-* `org.apache.samza.util.hadoop.HttpFileSystem`: used for fetching resources based on http, or https without client side authentication requirement.
+* `org.apache.samza.util.hadoop.HttpFileSystem`: used for fetching resources over http or https without client-side authentication.
 * `org.apache.hadoop.hdfs.DistributedFileSystem`: used for fetching resource from DFS system on Hadoop.
 * `org.apache.hadoop.fs.LocalFileSystem`: used for copying resources from local file system to the job directory.
 * `org.apache.hadoop.fs.ftp.FTPFileSystem`: used for fetching resources based on ftp.
-* ...
 
-If you would like to have your own file system, you need to implement a class which extends from `org.apache.hadoop.fs.FileSystem`. 
+You can provide your own file system implementation by writing a class that extends `org.apache.hadoop.fs.FileSystem`. 
 
-### Job Configuration
-With the configuration properly defined, the resources a job requiring from external or internal locations may be prepared automatically before it runs.
-
-For each resource with the name `<resourceName>` in the Samza job, the following set of job configurations are used when running on a YARN cluster. The first one which definiing resource path is required, but the others are optional and they have default values.
+### Resource Configuration
+You can specify a resource to be localized using the following configuration.
 
+#### Required Configuration
 1. `yarn.resources.<resourceName>.path`
-    * Required
-    * The path for fetching the resource for localization, e.g. http://hostname.com/packages/mySamzaJob
+    * The path for fetching the resource for localization, e.g. http://hostname.com/packages/myResource
+
+#### Optional Configuration
 2. `yarn.resources.<resourceName>.local.name`
-    * Optional 
     * The local name used for the localized resource.
-    * If not set, the default one will be `<resourceName>` from the config key.
+    * If it is not set, the default will be the `<resourceName>` specified in `yarn.resources.<resourceName>.path`
 3. `yarn.resources.<resourceName>.local.type`
-    * Optional 
-    * Localized resource type with valid values from: `ARCHIVE`, `FILE`, `PATTERN`.
+    * The type of the resource. Valid values are `ARCHIVE`, `FILE`, and `PATTERN`.
         * ARCHIVE: the localized resource will be an archived directory;
         * FILE: the localized resource will be a file;
         * PATTERN: the localized resource will be the entries extracted from the archive with the pattern.
-    * If not set, the default value is `FILE`.
+    * If it is not set, the default value is `FILE`.
 4. `yarn.resources.<resourceName>.local.visibility`
-    * Optional
-    * Localized resource visibility for the resource, and it can be a value from `PUBLIC`, `PRIVATE`, `APPLICATION`
+    * Visibility of the resource. Valid values are `PUBLIC`, `PRIVATE`, and `APPLICATION`.
         * PUBLIC: visible to everyone 
         * PRIVATE: visible to just the account which runs the job
         * APPLICATION: visible only to the specific application job which has the resource configuration
-    * If not set, the default value is `APPLICATION`
-
-It is up to you how to name the resource, but `<resourceName>` should be the same in the above configurations to apply to the same resource. 
+    * If it is not set, the default value is `APPLICATION`
 
 ### YARN Configuration
-Make sure the scheme used in the yarn.resources.&lt;resourceName&gt;.path is configured in YARN core-site.xml with a FileSystem implementation. For example, for scheme `http`, you should have the following property in YARN core-site.xml:
+Make sure the scheme used in the `yarn.resources.<resourceName>.path` is configured with a corresponding FileSystem implementation in YARN core-site.xml.
 
 {% highlight xml %}
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
@@ -81,19 +72,7 @@ Make sure the scheme used in the yarn.resources.&lt;resourceName&gt;.path is con
 </configuration>
 {% endhighlight %}
 
-You can override a behavior for a scheme by linking it to another file system. For example, you have a special need for localizing a resource for your job through http request, you may implement your own Http File System by extending [FileSystem](https://hadoop.apache.org/docs/stable/api/index.html?org/apache/hadoop/fs/FileSystem.html), and have the following configuration:
-
-{% highlight xml %}
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-    <property>
-      <name>fs.http.impl</name>
-      <value>com.myCompany.MyHttpFileSystem</value>
-    </property>
-</configuration>
-{% endhighlight %}
-
-If you are using other scheme which is not defined in Hadoop or Samza, for example, `yarn.resources.mySampleResource.path=myScheme://host.com/test`, in your job configuration, you may implement your own [FileSystem](https://hadoop.apache.org/docs/stable/api/index.html?org/apache/hadoop/fs/FileSystem.html) such as com.myCompany.MySchemeFileSystem and link it with your own scheme in yarn core-site.xml configuration.
+If you are using your own scheme (for example, `yarn.resources.myResource.path=myScheme://host.com/test`), you can link your [FileSystem](https://hadoop.apache.org/docs/stable/api/index.html?org/apache/hadoop/fs/FileSystem.html) implementation with it as follows.
 
 {% highlight xml %}
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>