Posted to commits@tez.apache.org by bi...@apache.org on 2014/09/05 05:41:44 UTC

[1/2] git commit: TEZ-1440. Post-release tasks (bikas)

Repository: tez
Updated Branches:
  refs/heads/master afd32dad7 -> eed2d9ab0


TEZ-1440. Post-release tasks (bikas)


Project: http://git-wip-us.apache.org/repos/asf/tez/repo
Commit: http://git-wip-us.apache.org/repos/asf/tez/commit/b4cd9729
Tree: http://git-wip-us.apache.org/repos/asf/tez/tree/b4cd9729
Diff: http://git-wip-us.apache.org/repos/asf/tez/diff/b4cd9729

Branch: refs/heads/master
Commit: b4cd97295d432e682c66fa25ebe90df68d8e74a3
Parents: afd32da
Author: Bikas Saha <bi...@apache.org>
Authored: Thu Sep 4 20:41:21 2014 -0700
Committer: Bikas Saha <bi...@apache.org>
Committed: Thu Sep 4 20:41:21 2014 -0700

----------------------------------------------------------------------
 docs/src/site/markdown/install.md               | 132 ++++++++++++-------
 docs/src/site/markdown/install_0_5_0.md         | 117 ----------------
 docs/src/site/markdown/talks.md                 |   8 +-
 docs/src/site/site.xml                          |   1 +
 .../input/ConcatenatedMergedKeyValueInput.java  |   2 +
 .../input/ConcatenatedMergedKeyValuesInput.java |   2 +
 .../library/input/OrderedGroupedKVInput.java    |   2 +
 .../input/OrderedGroupedMergedKVInput.java      |   2 +
 .../runtime/library/input/UnorderedKVInput.java |   2 +
 .../output/OrderedPartitionedKVOutput.java      |   2 +
 .../library/output/UnorderedKVOutput.java       |   2 +
 .../output/UnorderedPartitionedKVOutput.java    |   2 +
 12 files changed, 102 insertions(+), 172 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/docs/src/site/markdown/install.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/install.md b/docs/src/site/markdown/install.md
index cb9b586..d31eeb9 100644
--- a/docs/src/site/markdown/install.md
+++ b/docs/src/site/markdown/install.md
@@ -17,87 +17,88 @@
 
 <head><title>Install and Deployment Instructions</title></head>
 
-[Install instructions for Tez-0.5.0-SNAPSHOT - master branch](./install_0_5_0.html)
------------------------------------------------------------------------------------
-
-Install/Deploy Instructions for the latest Tez release [(Tez-0.4.1 src)](http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.1-incubating/)
---------------------------------------------------------------------------------------------------------------------------------------------------
+Install/Deploy Instructions for Tez
+---------------------------------------------------------------------------
+Replace x.y.z with the tez release number that you are using. E.g. 0.5.0
 
 1.  Deploy Apache Hadoop using either the 2.2.0 release or a compatible
     2.x version.
     -   One thing to note though when compiling Tez is that you will
         need to change the value of the hadoop.version property in the
-        toplevel pom.xml to match the version of the hadoop branch being
+        top-level pom.xml to match the version of the hadoop branch being
         used.
-2.  Build tez using `mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true`
+2.  Build tez using `mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true`
     -   This assumes that you have already installed JDK6 or later,
         Maven 3 or later and Protocol Buffers (protoc compiler) 2.5 or
         later
     -   If you prefer to run the unit tests, remove skipTests from the
         command above.
-    -   If you would like to create a tarball of the release, use `mvn
-        clean package -Dtar -DskipTests=true -Dmaven.javadoc.skip=true`
     -   If you use Eclipse IDE, you can import the projects using
         "Import/Maven/Existing Maven Projects". Eclipse does not
         automatically generate Java sources or include the generated
         sources into the projects. Please build using maven as described
         above and then use Project Properties to include
-        "target/generated-sources/java" as a source directory into the
+        "target/generatedsources/java" as a source directory into the
         "Java Build Path" for these projects: tez-api, tez-mapreduce,
         tez-runtime-internals and tez-runtime-library. This needs to be done
         just once after importing the project.
-3.  Copy the tez jars and their dependencies into HDFS.
-    -   The tez jars and dependencies will be found in
-        tez-dist/target/tez-0.4.1-incubating/tez-0.4.1-incubating if you run
-        the intial command mentioned in step 2.
+3.  Copy the relevant tez tarball into HDFS, and configure tez-site.xml
+    -   A tez tarball containing tez and hadoop libraries will be found
+        at tez-dist/target/tez-x.y.z-SNAPSHOT.tar.gz
     -   Assuming that the tez jars are put in /apps/ on HDFS, the
-        command would be `hadoop dfs -put
-        tez-dist/target/tez-0.4.1-incubating/tez-0.4.1-incubating /apps/`
-    -   Please do not upload the tarball to HDFS, upload only the jars.
-4.  Configure tez-site.xml to set tez.lib.uris to point to the paths in
-    HDFS containing the jars. Please note that the paths are not
-    searched recursively so for *basedir* and *basedir*/lib/, you will
-    need to configure the 2 paths as a comma-separated list. * Assuming
-    you followed step 3, the value would be:
-    "${fs.default.name}/apps/tez-0.4.1-incubating,${fs.default.name}/apps/tez-0.4.1-incubating/lib/"
-5.  Modify mapred-site.xml to change _mapreduce.framework.name_ property
-    from its default value of *yarn* to *yarn-tez*
-6.  Set HADOOP_CLASSPATH to have the following paths in it:
-    -   TEZ_CONF_DIR - location of tez-site.xml
-    -   TEZ_JARS and TEZ_JARS/libs - location of the tez jars and
-        dependencies.
-    -   The command to set up the classpath should be something like:
-        `export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*`
-        Please note the "*" which is an important requirement when
+        command would be
+        ```
+            hadoop dfs -mkdir /apps/tez-x.y.z-SNAPSHOT
+            hadoop dfs -copyFromLocal tez-dist/target/tez-x.y.z-SNAPSHOT-archive.tar.gz /apps/tez-x.y.z-SNAPSHOT/
+        ```
+    -   tez-site.xml configuration.
+        -   Set tez.lib.uris to point to the tar.gz uploaded to HDFS.
+            Assuming the steps mentioned so far were followed,
+            ```
+            set tez.lib.uris to "${fs.defaultFS}/apps/tez-x.y.z-SNAPSHOT/tez-x.y.z-SNAPSHOT.tar.gz"
+            ```
+        -   Ensure tez.use.cluster.hadoop-libs is not set in tez-site.xml,
+            or if it is set, the value should be false
+4.  Optional: If running existing MapReduce jobs on Tez, modify
+    mapred-site.xml to change the "mapreduce.framework.name" property from
+    its default value of "yarn" to "yarn-tez".
+5.  Configure the client node to include the tez-libraries in the hadoop
+    classpath
+    -   Extract the tez minimal tarball created in step 2 to a local directory
+        (assuming TEZ_JARS is where the files will be decompressed for
+        the next steps)
+        ```
+        tar -xvzf tez-dist/target/tez-x.y.z-minimal.tar.gz -C $TEZ_JARS
+        ```
+    -   set TEZ_CONF_DIR to the location of tez-site.xml
+    -   Add $TEZ_CONF_DIR, ${TEZ_JARS}/* and ${TEZ_JARS}/lib/* to the application classpath.
+        For example, doing it via the standard Hadoop tool chain would use the following command
+        to set up the application classpath:
+        ```
+        export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
+        ```
+    -   Please note the "*" which is an important requirement when
         setting up classpaths for directories containing jar files.
-7.  Submit a MR job as you normally would using something like:
+6.  There is a basic example of using an MRR job in the tez-examples.jar.
+    Refer to OrderedWordCount.java in the source code. To run this
+    example:
 
     ```
-    $HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar sleep -mt 1 -rt 1 -m 1 -r 1
-    ```
-
-    This will use the TEZ DAG ApplicationMaster to run the MR job. This
-    can be verified by looking at the AM’s logs from the YARN
-    ResourceManager UI.
-8.  There is a basic example of using an MRR job in the
-    tez-mapreduce-examples.jar. Refer to OrderedWordCount.java in the
-    source code. To run this example:
-
-    ``` 
-    $HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount <input> <output>
+    $HADOOP_PREFIX/bin/hadoop jar tez-examples.jar orderedwordcount <input> <output>
     ```
 
     This will use the TEZ DAG ApplicationMaster to run the ordered word
     count job. This job is similar to the word count example except that
     it also orders all words based on the frequency of occurrence.
 
-    There are multiple variations to run orderedwordcount. You can use
-    it to run multiple DAGs serially on different inputs/outputs. These
-    DAGs could be run separately as different applications or serially
-    within a single TEZ session.
+    Tez DAGs could be run separately as different applications or
+    serially within a single TEZ session. There is a different variation
+    of orderedwordcount in tez-tests that supports the use of Sessions
+    and handling multiple input-output pairs. You can use it to run
+    multiple DAGs serially on different inputs/outputs.
 
     ```
-    $HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount <input1> <output1> <input2> <output2> <input3> <output3> ...
+    $HADOOP_PREFIX/bin/hadoop jar tez-tests.jar testorderedwordcount <input1> <output1> <input2> <output2> <input3> <output3> ...
     ```
 
     The above will run multiple DAGs for each input-output pair.
@@ -105,5 +106,34 @@ Install/Deploy Instructions for the latest Tez release [(Tez-0.4.1 src)](http://
     To use TEZ sessions, set -DUSE_TEZ_SESSION=true
 
     ```
-    $HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount -DUSE_TEZ_SESSION=true <input1> <output1> <input2> <output2>
+    $HADOOP_PREFIX/bin/hadoop jar tez-tests.jar testorderedwordcount -DUSE_TEZ_SESSION=true <input1> <output1> <input2> <output2>
+    ```
+7.  Submit a MR job as you normally would using something like:
+
     ```
+    $HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar sleep -mt 1 -rt 1 -m 1 -r 1
+    ```
+
+    This will use the TEZ DAG ApplicationMaster to run the MR job. This
+    can be verified by looking at the AM’s logs from the YARN ResourceManager UI.
+    This needs mapred-site.xml to have "mapreduce.framework.name" set to "yarn-tez".
+
+Hadoop Installation dependent Install/Deploy Instructions
+---------------------------------------------------------
+The above install instructions use Tez with pre-packaged Hadoop libraries included in the package; this is the
+recommended method for installation. If Tez needs to use the existing cluster Hadoop libraries instead,
+follow this alternate mechanism to set up Tez to use Hadoop libraries from the cluster.
+Step 3 above changes as follows. Subsequent steps would also use tez-dist/target/tez-x.y.z-minimal.tar.gz instead of tez-dist/target/tez-x.y.z.tar.gz.
+- A tez build without Hadoop dependencies will be available at tez-dist/target/tez-x.y.z-minimal.tar.gz
+- Assuming that the tez jars are put in /apps/ on HDFS, the command would be
+"hadoop fs -mkdir /apps/tez-x.y.z"
+"hadoop fs -copyFromLocal tez-dist/target/tez-x.y.z-minimal.tar.gz /apps/tez-x.y.z"
+- tez-site.xml configuration
+- Set tez.lib.uris to point to the paths in HDFS containing the tez jars. Assuming the steps mentioned so far were followed,
+set tez.lib.uris to "${fs.defaultFS}/apps/tez-x.y.z/tez-x.y.z-minimal.tar.gz"
+- set tez.use.cluster.hadoop-libs to true
+
+
+[Install instructions for older versions of Tez (pre 0.5.0)](./install_pre_0_5_0.html)
+-----------------------------------------------------------------------------------
+

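Note on step 3 of install.md above: the two tez-site.xml properties it names (tez.lib.uris and tez.use.cluster.hadoop-libs) can be written out as a small configuration file. The following is a minimal sketch and not part of the commit. The helper name render_tez_site is hypothetical, the file layout assumes the usual Hadoop `<configuration>`/`<property>` XML format, and the tarball paths are the placeholder paths from the instructions.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal tez-site.xml to stdout.
#   $1: value for tez.lib.uris (location of the tez tarball on HDFS)
#   $2: value for tez.use.cluster.hadoop-libs ("true" or "false")
render_tez_site() {
  # printf reuses the '%s\n' format for every argument, one line each.
  printf '%s\n' \
    '<?xml version="1.0"?>' \
    '<configuration>' \
    '  <property>' \
    '    <name>tez.lib.uris</name>' \
    "    <value>$1</value>" \
    '  </property>' \
    '  <property>' \
    '    <name>tez.use.cluster.hadoop-libs</name>' \
    "    <value>$2</value>" \
    '  </property>' \
    '</configuration>'
}
```

For the default setup in step 3, a call such as `render_tez_site '${fs.defaultFS}/apps/tez-x.y.z-SNAPSHOT/tez-x.y.z-SNAPSHOT.tar.gz' false` would produce the file contents. For the cluster-libraries variant, pass the minimal tarball path and `true`. The single quotes stop the shell from trying to expand `${fs.defaultFS}`, so the reference is written to the file unchanged.
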
http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/docs/src/site/markdown/install_0_5_0.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/install_0_5_0.md b/docs/src/site/markdown/install_0_5_0.md
deleted file mode 100644
index 2cf0e3d..0000000
--- a/docs/src/site/markdown/install_0_5_0.md
+++ /dev/null
@@ -1,117 +0,0 @@
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one or more
-   contributor license agreements.  See the NOTICE file distributed with
-   this work for additional information regarding copyright ownership.
-   The ASF licenses this file to You under the Apache License, Version 2.0
-   (the "License"); you may not use this file except in compliance with
-   the License.  You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
--->
-
-<head><title>Install and Deployment Instructions</title></head>
-
-Install/Deploy Instructions for Tez-current (0.5.0-SNAPSHOT, branch master)
----------------------------------------------------------------------------
-
-1.  Deploy Apache Hadoop using either the 2.2.0 release or a compatible
-    2.x version.
-    -   One thing to note though when compiling Tez is that you will
-        need to change the value of the hadoop.version property in the
-        top-level pom.xml to match the version of the hadoop branch being
-        used.
-2.  Build tez using `mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true`
-    -   This assumes that you have already installed JDK6 or later,
-        Maven 3 or later and Protocol Buffers (protoc compiler) 2.5 or
-        later
-    -   If you prefer to run the unit tests, remove skipTests from the
-        command above.
-    -   If you use Eclipse IDE, you can import the peojects using
-        "Import/Maven/Existing Maven Projects". Eclipse does not
-        automatically generate Java sources or include the generated
-        sources into the projects. Please build using maven as described
-        above and then use Project Properties to include
-        "target/generatedsources/java" as a source directory into the
-        "Java Build Path" for these projects: tez-api, tez-mapreduce,
-        tez-runtime-internals and tez-runtime-library. This needs to be done
-        just once after importing the project.
-3.  Copy the relevant tez tarball into HDFS, and configure tezsite.xml
-    -   A tez tarball containing tez and hadoop libraries will be found
-        at tez-dist/target/tez-0.5.0-SNAPSHOT.tar.gz
-    -   Assuming that the tez jars are put in /apps/ on HDFS, the
-        command would be
-        ```
-            hadoop dfs -mkdir /apps/tez-0.5.0-SNAPSHOT
-            hadoop dfs -copyFromLocal tez-dist/target/tez-0.5.0-SNAPSHOT-archive.tar.gz /apps/tez-0.5.0-SNAPSHOT/
-        ```
-    -   tez-site.xml configuration.
-        -   Set tez.lib.uris to point to the tar.gz uploaded to HDFS.
-            Assuming the steps mentioned so far were followed,
-            ```
-            set tez.lib.uris to "${fs.default.name}/apps/tez-0.5.0-SNAPSHOT/tez-0.5.0-SNAPSHOT.tar.gz"
-            ```
-        -   Ensure tez.use.cluster.hadoop-libs is not set in tez-site.xml,
-            or if it is set, the value should be false
-4.  Optional: If running existing MapReduce jobs on Tez. Modify
-    mapred-site.xml to change "mapreduce.framework.name" property from
-    its default value of "yarn" to "yarn-tez"
-5.  Configure the client node to include the tez-libraries in the hadoop
-    classpath
-    -   Extract the tez tarball created in step 2 to a local directory
-        (assuming TEZ_JARS is where the files will be decompressed for
-        the next steps)
-        ```
-        tar -xvzf tez-dist/target/tez-0.5.0-SNAPSHOT.tar.gz -C $TEZ_JARS
-        ```
-    -   set TEZ_CONF_DIR to the location of tez-site.xml
-    -   The command to set up the classpath should be something like:
-        ```
-        export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
-        ```
-    -   Please note the "*" which is an important requirement when
-        setting up classpaths for directories containing jar files.
-6.  Submit a MR job as you normally would using something like:
-
-    ```
-    $HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar sleep -mt 1 -rt 1 -m 1 -r 1
-    ```
-
-    This will use the TEZ DAG ApplicationMaster to run the MR job. This
-    can be verified by looking at the AM’s logs from the YARN
-    ResourceManager UI.
-
-7.  There is a basic example of using an MRR job in the tez-examples.jar.
-    Refer to OrderedWordCount.java in the source code. To run this
-    example:
-
-    ```
-    $HADOOP_PREFIX/bin/hadoop jar tez-examples.jar orderedwordcount <input> <output>
-    ```
-
-    This will use the TEZ DAG ApplicationMaster to run the ordered word
-    count job. This job is similar to the word count example except that
-    it also orders all words based on the frequency of occurrence.
-
-    Tez DAGs could be run separately as different applications or
-    serially within a single TEZ session. There is a different variation
-    of orderedwordcount in tez-tests that supports the use of Sessions
-    and handling multiple input-output pairs. You can use it to run
-    multiple DAGs serially on different inputs/outputs.
-
-    ```
-    $HADOOP_PREFIX/bin/hadoop jar tez-tests.jar testorderedwordcount <input1> <output1> <input2> <output2> <input3> <output3> ...
-    ```
-
-    The above will run multiple DAGs for each input-output pair.
-
-    To use TEZ sessions, set -DUSE_TEZ_SESSION=true
-
-    ```
-    $HADOOP_PREFIX/bin/hadoop jar tez-tests.jar testorderedwordcount -DUSE_TEZ_SESSION=true <input1> <output1> <input2> <output2>
-    ```

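Note on the HADOOP_CLASSPATH line in step 5, which is the same in the removed page above and in the updated install.md: the trailing "*" entries have to reach the JVM literally, where `dir/*` is read as "all jar files in dir". The sketch below builds that value from the two directories. The helper name tez_classpath is hypothetical and not part of Tez or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: print the classpath value described in step 5.
#   $1: TEZ_CONF_DIR (directory containing tez-site.xml)
#   $2: TEZ_JARS (directory where the tez tarball was extracted)
tez_classpath() {
  # The '*' characters are part of the printf format string, so they are
  # emitted literally and never glob-expanded by the shell.
  printf '%s:%s/*:%s/lib/*\n' "$1" "$2" "$2"
}
```

A possible use is `export HADOOP_CLASSPATH="$(tez_classpath "$TEZ_CONF_DIR" "$TEZ_JARS")"`. The double quotes around the command substitution keep the wildcards unexpanded, matching the plain `export` form shown in the instructions.
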
http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/docs/src/site/markdown/talks.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/talks.md b/docs/src/site/markdown/talks.md
index 8971088..a29d553 100644
--- a/docs/src/site/markdown/talks.md
+++ b/docs/src/site/markdown/talks.md
@@ -19,10 +19,10 @@
 
 Talks
 -----
--   Apache Tez : Accelerating Hadoop Query Processing by Arun Murthy and
-    Bikas Saha at [Hadoop Summit 2013, San Jose, CA, USA](http://hadoopsummit.org/san-jose/)
-    -   [Slides](http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212)
-    -   [Video](http://www.youtube.com/watch?v=9ZLLzlsz7h8)
+-   Apache Tez : Accelerating Hadoop Query Processing by Bikas Saha and
+    Hitesh Shah at [Hadoop Summit 2014, San Jose, CA, USA](http://hadoopsummit.org/san-jose/)
+    -   [Slides](http://www.slideshare.net/Hadoop_Summit/w-1205phall1saha)
+    -   [Video](http://www.youtube.com/watch?v=yf_hBiZy3nk)
 
 User Meetup Recordings
 ----------------------

http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/docs/src/site/site.xml
----------------------------------------------------------------------
diff --git a/docs/src/site/site.xml b/docs/src/site/site.xml
index 1a915db..a3d629c 100644
--- a/docs/src/site/site.xml
+++ b/docs/src/site/site.xml
@@ -112,6 +112,7 @@
 
     <menu name="Releases">
       <item name="0.4.1-incubating" href="http://archive.apache.org/dist/incubator/tez/tez-0.4.1-incubating/"/>
+      <item name="0.5.0" href="index_0_5_0.html"/>
     </menu>
 
     <menu name="Contribute">

http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/ConcatenatedMergedKeyValueInput.java
----------------------------------------------------------------------
diff --git a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/ConcatenatedMergedKeyValueInput.java b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/ConcatenatedMergedKeyValueInput.java
index e875240..39e0fff 100644
--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/ConcatenatedMergedKeyValueInput.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/ConcatenatedMergedKeyValueInput.java
@@ -21,6 +21,7 @@ package org.apache.tez.runtime.library.input;
 import java.io.IOException;
 import java.util.List;
 
+import org.apache.hadoop.classification.InterfaceAudience.Public;
 import org.apache.tez.dag.api.GroupInputEdge;
 import org.apache.tez.dag.api.TezUncheckedException;
 import org.apache.tez.runtime.api.Input;
@@ -34,6 +35,7 @@ import org.apache.tez.runtime.library.api.KeyValueReader;
  * (e.g. from a {@link GroupInputEdge} and provide a unified view of the 
  * input. It concatenates all the inputs to provide a unified view
  */
+@Public
 public class ConcatenatedMergedKeyValueInput extends MergedLogicalInput {
 
   public ConcatenatedMergedKeyValueInput(MergedInputContext context,

http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/ConcatenatedMergedKeyValuesInput.java
----------------------------------------------------------------------
diff --git a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/ConcatenatedMergedKeyValuesInput.java b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/ConcatenatedMergedKeyValuesInput.java
index 7a57240..0cc3244 100644
--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/ConcatenatedMergedKeyValuesInput.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/ConcatenatedMergedKeyValuesInput.java
@@ -21,6 +21,7 @@ package org.apache.tez.runtime.library.input;
 import java.io.IOException;
 import java.util.List;
 
+import org.apache.hadoop.classification.InterfaceAudience.Public;
 import org.apache.tez.dag.api.GroupInputEdge;
 import org.apache.tez.dag.api.TezUncheckedException;
 import org.apache.tez.runtime.api.Input;
@@ -35,6 +36,7 @@ import org.apache.tez.runtime.library.api.KeyValuesReader;
  * input. It concatenates all the inputs to provide a unified view
  */
 
+@Public
 public class ConcatenatedMergedKeyValuesInput extends MergedLogicalInput {
 
   public ConcatenatedMergedKeyValuesInput(MergedInputContext context,

http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedKVInput.java
----------------------------------------------------------------------
diff --git a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedKVInput.java b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedKVInput.java
index 7e7f2c4..fe85a99 100644
--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedKVInput.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedKVInput.java
@@ -30,6 +30,7 @@ import java.util.concurrent.atomic.AtomicBoolean;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceAudience.Public;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.io.RawComparator;
 import org.apache.tez.common.TezUtils;
@@ -66,6 +67,7 @@ import com.google.common.base.Preconditions;
  * completion. Attempting to get a reader on a non-complete input will block.
  *
  */
+@Public
 public class OrderedGroupedKVInput extends AbstractLogicalInput {
 
   static final Log LOG = LogFactory.getLog(OrderedGroupedKVInput.class);

http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedMergedKVInput.java
----------------------------------------------------------------------
diff --git a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedMergedKVInput.java b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedMergedKVInput.java
index 45c68aa..08cf662 100644
--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedMergedKVInput.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedMergedKVInput.java
@@ -29,6 +29,7 @@ import java.util.Set;
 
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.classification.InterfaceAudience.Public;
 import org.apache.hadoop.io.RawComparator;
 import org.apache.tez.runtime.api.Input;
 import org.apache.tez.runtime.api.MergedLogicalInput;
@@ -43,6 +44,7 @@ import org.apache.tez.runtime.library.api.KeyValuesReader;
  * Combiners and Secondary Sort are not implemented, so there is no guarantee on
  * the order of values.
  */
+@Public
 public class OrderedGroupedMergedKVInput extends MergedLogicalInput {
 
   private static final Log LOG = LogFactory.getLog(OrderedGroupedMergedKVInput.class);

http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/UnorderedKVInput.java
----------------------------------------------------------------------
diff --git a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/UnorderedKVInput.java b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/UnorderedKVInput.java
index 87caf4c..75fa64a 100644
--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/UnorderedKVInput.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/UnorderedKVInput.java
@@ -27,6 +27,7 @@ import java.util.concurrent.atomic.AtomicBoolean;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceAudience.Public;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.io.compress.CompressionCodec;
 import org.apache.hadoop.io.compress.DefaultCodec;
@@ -57,6 +58,7 @@ import com.google.common.base.Preconditions;
  * unified view to that data. There are no ordering constraints applied by
  * this input.
  */
+@Public
 public class UnorderedKVInput extends AbstractLogicalInput {
 
   private static final Log LOG = LogFactory.getLog(UnorderedKVInput.class);

http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/OrderedPartitionedKVOutput.java
----------------------------------------------------------------------
diff --git a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/OrderedPartitionedKVOutput.java b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/OrderedPartitionedKVOutput.java
index 15121da..40e99ad 100644
--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/OrderedPartitionedKVOutput.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/OrderedPartitionedKVOutput.java
@@ -29,6 +29,7 @@ import java.util.concurrent.atomic.AtomicBoolean;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceAudience.Public;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.yarn.api.ApplicationConstants;
@@ -65,6 +66,7 @@ import com.google.protobuf.ByteString;
  * key/value pairs written to it. It also partitions the output based on a
  * {@link Partitioner}
  */
+@Public
 public class OrderedPartitionedKVOutput extends AbstractLogicalOutput {
 
   private static final Log LOG = LogFactory.getLog(OrderedPartitionedKVOutput.class);

http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/UnorderedKVOutput.java
----------------------------------------------------------------------
diff --git a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/UnorderedKVOutput.java b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/UnorderedKVOutput.java
index ea02743..76dab45 100644
--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/UnorderedKVOutput.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/UnorderedKVOutput.java
@@ -29,6 +29,7 @@ import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceAudience.Private;
+import org.apache.hadoop.classification.InterfaceAudience.Public;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.yarn.api.ApplicationConstants;
 import org.apache.tez.common.TezUtils;
@@ -58,6 +59,7 @@ import com.google.protobuf.ByteString;
  * value data without applying any ordering or grouping constraints. This can be
  * used to write raw key value data as is.
  */
+@Public
 public class UnorderedKVOutput extends AbstractLogicalOutput {
 
   private static final Log LOG = LogFactory.getLog(UnorderedKVOutput.class);

http://git-wip-us.apache.org/repos/asf/tez/blob/b4cd9729/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/UnorderedPartitionedKVOutput.java
----------------------------------------------------------------------
diff --git a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/UnorderedPartitionedKVOutput.java b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/UnorderedPartitionedKVOutput.java
index 8d4ed3a..9b61df0 100644
--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/UnorderedPartitionedKVOutput.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/UnorderedPartitionedKVOutput.java
@@ -29,6 +29,7 @@ import com.google.common.base.Preconditions;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceAudience.Public;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.tez.common.TezUtils;
 import org.apache.tez.common.TezRuntimeFrameworkConfigs;
@@ -47,6 +48,7 @@ import org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWrite
  * write Key-Value pairs. The key-value pairs are written to the correct partition based on the
  * configured Partitioner.
  */
+@Public
 public class UnorderedPartitionedKVOutput extends AbstractLogicalOutput {
 
   private static final Log LOG = LogFactory.getLog(UnorderedPartitionedKVOutput.class);

[2/2] git commit: TEZ-1440. Post-release tasks (bikas)

Posted by bi...@apache.org.

TEZ-1440. Post-release tasks (bikas)


Project: http://git-wip-us.apache.org/repos/asf/tez/repo
Commit: http://git-wip-us.apache.org/repos/asf/tez/commit/eed2d9ab
Tree: http://git-wip-us.apache.org/repos/asf/tez/tree/eed2d9ab
Diff: http://git-wip-us.apache.org/repos/asf/tez/diff/eed2d9ab

Branch: refs/heads/master
Commit: eed2d9ab0c90d82b2c3cf1302b33a0c1acd9fa03
Parents: b4cd972
Author: Bikas Saha <bi...@apache.org>
Authored: Thu Sep 4 20:41:38 2014 -0700
Committer: Bikas Saha <bi...@apache.org>
Committed: Thu Sep 4 20:41:38 2014 -0700

----------------------------------------------------------------------
 docs/src/site/markdown/index_0_5_0.md       |  29 ++++++
 docs/src/site/markdown/install_pre_0_5_0.md | 109 +++++++++++++++++++++++
 2 files changed, 138 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tez/blob/eed2d9ab/docs/src/site/markdown/index_0_5_0.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/index_0_5_0.md b/docs/src/site/markdown/index_0_5_0.md
new file mode 100644
index 0000000..5295ca3
--- /dev/null
+++ b/docs/src/site/markdown/index_0_5_0.md
@@ -0,0 +1,29 @@
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<head><title>Apache Tez 0.5.0</title></head>
+
+Apache Tez 0.5.0
+----------------
+
+- [Release Artifacts](http://www.apache.org/dyn/closer.cgi/tez)
+- [Release Notes](releases/0.5.0/release-notes.txt)
+- Documentation
+    - [API Javadocs](releases/0.5.0/tez-api-javadocs/index.html) : Documentation for the Tez APIs
+    - [Runtime Library Javadocs](releases/0.5.0/tez-runtime-library-javadocs/index.html) : Documentation for built-in implementations of useful Inputs, Outputs, Processors, etc. written based on the Tez APIs
+    - [Tez MapReduce Javadocs](releases/0.5.0/tez-mapreduce-javadocs/index.html) : Documentation for built-in implementations of MapReduce-compatible Inputs, Outputs, Processors, etc. written based on the Tez APIs
+

http://git-wip-us.apache.org/repos/asf/tez/blob/eed2d9ab/docs/src/site/markdown/install_pre_0_5_0.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/install_pre_0_5_0.md b/docs/src/site/markdown/install_pre_0_5_0.md
new file mode 100644
index 0000000..494ff54
--- /dev/null
+++ b/docs/src/site/markdown/install_pre_0_5_0.md
@@ -0,0 +1,109 @@
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<head><title>Install and Deployment Instructions</title></head>
+
+[Install instructions for Tez (post 0.5.0)](./install.html)
+-----------------------------------------------------------------------------------
+
+Install/Deploy Instructions for Tez releases before 0.5.0, e.g. [Tez-0.4.1](http://archive.apache.org/dist/incubator/tez/tez-0.4.1-incubating/)
+--------------------------------------------------------------------------------------------------------------------------------------------------
+
+1.  Deploy Apache Hadoop using either the 2.2.0 release or a compatible
+    2.x version.
+    -   When compiling Tez, you will need to change the value of
+        the hadoop.version property in the top-level pom.xml so that
+        it matches the version of the Hadoop branch being
+        used.
+2.  Build tez using `mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true`
+    -   This assumes that you have already installed JDK6 or later,
+        Maven 3 or later, and Protocol Buffers (protoc compiler) 2.5 or
+        later.
+    -   If you prefer to run the unit tests, remove skipTests from the
+        command above.
+    -   If you would like to create a tarball of the release, use `mvn
+        clean package -Dtar -DskipTests=true -Dmaven.javadoc.skip=true`
+    -   If you use Eclipse IDE, you can import the projects using
+        "Import/Maven/Existing Maven Projects". Eclipse does not
+        automatically generate Java sources or include the generated
+        sources into the projects. Please build using maven as described
+        above and then use Project Properties to include
+        "target/generated-sources/java" as a source directory into the
+        "Java Build Path" for these projects: tez-api, tez-mapreduce,
+        tez-runtime-internals and tez-runtime-library. This needs to be done
+        just once after importing the project.
+3.  Copy the tez jars and their dependencies into HDFS.
+    -   The tez jars and dependencies will be found in
+        tez-dist/target/tez-0.4.1-incubating/tez-0.4.1-incubating if you run
+        the initial command mentioned in step 2.
+    -   Assuming that the tez jars are put in /apps/ on HDFS, the
+        command would be `hadoop dfs -put
+        tez-dist/target/tez-0.4.1-incubating/tez-0.4.1-incubating /apps/`
+    -   Do not upload the tarball to HDFS; upload only the jars.
+4.  Configure tez-site.xml to set tez.lib.uris to point to the paths in
+    HDFS containing the jars. Note that the paths are not searched
+    recursively, so for *basedir* and *basedir*/lib/ you will need to
+    configure both paths as a comma-separated list.
+    -   Assuming you followed step 3, the value would be:
+        "${fs.default.name}/apps/tez-0.4.1-incubating,${fs.default.name}/apps/tez-0.4.1-incubating/lib/"
+5.  Modify mapred-site.xml to change _mapreduce.framework.name_ property
+    from its default value of *yarn* to *yarn-tez*
+6.  Set HADOOP_CLASSPATH to have the following paths in it:
+    -   TEZ_CONF_DIR - location of tez-site.xml
+    -   TEZ_JARS and TEZ_JARS/lib - location of the tez jars and
+        dependencies.
+    -   The command to set up the classpath should be something like:
+        `export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*`
+        Note the "*", which is required when setting up
+        classpaths for directories containing jar files.
+7.  Submit an MR job as you normally would using something like:
+
+    ```
+    $HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar sleep -mt 1 -rt 1 -m 1 -r 1
+    ```
+
+    This will use the TEZ DAG ApplicationMaster to run the MR job. This
+    can be verified by looking at the AM’s logs from the YARN
+    ResourceManager UI.
+8.  There is a basic example of using an MRR job in the
+    tez-mapreduce-examples.jar. Refer to OrderedWordCount.java in the
+    source code. To run this example:
+
+    ``` 
+    $HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount <input> <output>
+    ```
+
+    This will use the TEZ DAG ApplicationMaster to run the ordered word
+    count job. This job is similar to the word count example except that
+    it also orders all words based on the frequency of occurrence.
+
+    There are multiple variations to run orderedwordcount. You can use
+    it to run multiple DAGs serially on different inputs/outputs. These
+    DAGs could be run separately as different applications or serially
+    within a single TEZ session.
+
+    ```
+    $HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount <input1> <output1> <input2> <output2> <input3> <output3> ...
+    ```
+
+    The above will run one DAG for each input-output pair.
+
+    To use TEZ sessions, set -DUSE_TEZ_SESSION=true:
+
+    ```
+    $HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount -DUSE_TEZ_SESSION=true <input1> <output1> <input2> <output2>
+    ```
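
As a rough sketch of step 6 in install_pre_0_5_0.md above, the classpath value can be assembled by a small shell helper. The function name `tez_classpath` and the example directories are hypothetical and chosen only for illustration; the `${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*` layout is the only part taken from the instructions.

```shell
# Hypothetical helper (not part of Tez): builds the classpath value from step 6.
# $1 = TEZ_CONF_DIR (directory holding tez-site.xml)
# $2 = TEZ_JARS (directory holding the tez jars, with dependencies under lib/)
tez_classpath() {
  # The format string is single-quoted so the shell does not expand the "*"
  # wildcards; they must reach the JVM literally as classpath wildcards.
  printf '%s:%s/*:%s/lib/*\n' "$1" "$2" "$2"
}

# Example paths are assumptions; substitute your own locations.
TEZ_CONF_DIR=/etc/tez/conf
TEZ_JARS=/opt/tez
HADOOP_CLASSPATH="$(tez_classpath "$TEZ_CONF_DIR" "$TEZ_JARS")"
export HADOOP_CLASSPATH
echo "$HADOOP_CLASSPATH"
```

Keeping the wildcards quoted matters here: if the shell expanded them, the classpath would hold a space-separated list of jar files rather than the colon-separated entries the JVM expects.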