Posted to commits@apex.apache.org by th...@apache.org on 2016/03/02 02:40:30 UTC

[4/8] incubator-apex-core git commit: Migrating docs

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/application_packages.md
----------------------------------------------------------------------
diff --git a/application_packages.md b/application_packages.md
deleted file mode 100644
index 521779a..0000000
--- a/application_packages.md
+++ /dev/null
@@ -1,669 +0,0 @@
-Apache Apex Application Packages
-================================
-
-An Apache Apex Application Package is a zip file that contains all the
-necessary files to launch an application in Apache Apex. It is the
-standard way for assembling and sharing an Apache Apex application.
-
-# Requirements
-
-You will need to have the following installed:
-
-1. Apache Maven 3.0 or later (for assembling the App Package)
-2. Apache Apex 3.0.0 or later (for launching the App Package in your cluster)
-
-# Creating Your First Apex App Package
-
-You can create an Apex Application Package using your Linux command
-line, or using your favorite IDE.
-
-## Using Command Line
-
-First, change to the directory where you put your projects, and create
-an Apex application project using Maven by running the following
-command.  Replace "com.example", "mydtapp" and "1.0-SNAPSHOT" with the
-appropriate values (make sure this is all on one line):
-
-    $ mvn archetype:generate \
-     -DarchetypeGroupId=org.apache.apex \
-     -DarchetypeArtifactId=apex-app-archetype -DarchetypeVersion=3.2.0-incubating \
-     -DgroupId=com.example -Dpackage=com.example.mydtapp -DartifactId=mydtapp \
-     -Dversion=1.0-SNAPSHOT
-
-This creates a Maven project named "mydtapp". Open it with your favorite
-IDE (e.g. NetBeans, Eclipse, IntelliJ IDEA). In the project, there is a
-sample DAG that generates a number of tuples with a random number and
-prints out "hello world" and the random number in the tuples.  The code
-that builds the DAG is in
-src/main/java/com/example/mydtapp/Application.java, and the code that
-runs the unit test for the DAG is in
-src/test/java/com/example/mydtapp/ApplicationTest.java. Try it out by
-running the following command:
-
-    $ cd mydtapp; mvn package
-
-This builds the App Package and runs the unit test of the DAG.  You should
-be getting test output similar to this:
-
-```
- -------------------------------------------------------
-  TESTS
- -------------------------------------------------------
-
- Running com.example.mydtapp.ApplicationTest
- hello world: 0.8015370953286478
- hello world: 0.9785359225545481
- hello world: 0.6322611586644047
- hello world: 0.8460953663451775
- hello world: 0.5719372906929072
- hello world: 0.6361174312337172
- hello world: 0.14873007534816318
- hello world: 0.8866986277418261
- hello world: 0.6346526809866057
- hello world: 0.48587295703904465
- hello world: 0.6436832429676687
-
- ...
-
- Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.863
- sec
-
- Results :
-
- Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
-```
-
-The "mvn package" command creates the App Package file in the target
-directory as target/mydtapp-1.0-SNAPSHOT.apa. You will be able to use
-that App Package file to launch this sample application in your actual
-Apex installation.
-
-## Using IDE
-
-Alternatively, you can do the above steps all within your IDE.  For
-example, in NetBeans, select File -\> New Project.  Then choose “Maven”
-and “Project from Archetype” in the dialog box, as shown.
-
-![](images/AppPackage/ApplicationPackages.html-image00.png)
-
-Then fill in the Group ID, Artifact ID, Version and Repository entries as shown below.
-
-![](images/AppPackage/ApplicationPackages.html-image02.png)
-
-- Group ID: org.apache.apex
-- Artifact ID: apex-app-archetype
-- Version: 3.2.0-incubating (or any later version)
-
-Press Next and fill out the rest of the required information. For
-example:
-
-![](images/AppPackage/ApplicationPackages.html-image01.png)
-
-Click Finish, and now you have created your own Apache Apex App Package
-project, with a default unit test.  You can run the unit test, make code
-changes or make dependency changes within your IDE.  The procedure for
-other IDEs, like Eclipse or IntelliJ, is similar.
-
-# Writing Your Own App Package
-
-
-Please refer to [Creating Apps](create.md) for the basics of how to write an Apache Apex application.  In your AppPackage project, you can add custom operators (refer to the [Operator Development Guide](https://www.datatorrent.com/docs/guides/OperatorDeveloperGuide.html)), project dependencies, default and required configuration properties, pre-set configurations and other metadata.
-
-## Adding (and removing) project dependencies
-
-Under the project, you can add project dependencies in pom.xml, or do it
-through your IDE.  Here’s the section that describes the dependencies in
-the default pom.xml:
-```
-  <dependencies>
-    <!-- add your dependencies here -->
-    <dependency>
-      <groupId>org.apache.apex</groupId>
-      <artifactId>malhar-library</artifactId>
-      <version>${apex.version}</version>
-      <!--
-           If you know your application does not need the transitive dependencies that are pulled in by malhar-library,
-           uncomment the following to reduce the size of your app package.
-      -->
-      <!--
-      <exclusions>
-        <exclusion>
-          <groupId>*</groupId>
-          <artifactId>*</artifactId>
-        </exclusion>
-      </exclusions>
-      -->
-    </dependency>
-    <dependency>
-      <groupId>org.apache.apex</groupId>
-      <artifactId>apex-engine</artifactId>
-      <version>${apex.version}</version>
-      <scope>provided</scope>
-    </dependency>
-    <dependency>
-      <groupId>junit</groupId>
-      <artifactId>junit</artifactId>
-      <version>4.10</version>
-      <scope>test</scope>
-    </dependency>
-  </dependencies>
-```
-
-As shown above, the default dependencies include
-malhar-library in compile scope, apex-engine in provided scope, and junit
-in test scope.  Do not remove these three dependencies since they are
-necessary for any Apex application.  You can, however, exclude
-transitive dependencies from malhar-library to reduce the size of your
-App Package, provided that none of the operators in malhar-library that
-need the transitive dependencies will be used in your application.
-
-In the sample application, it is safe to remove the transitive
-dependencies from malhar-library, by uncommenting the "exclusions"
-section.  It will reduce the size of the sample App Package from 8MB to
-700KB.
-
-Note that if you exclude \*, some versions of Maven may produce
-warnings similar to the following:
-
-```
-
- [WARNING] 'dependencies.dependency.exclusions.exclusion.groupId' for
- org.apache.apex:malhar-library:jar with value '*' does not match a
- valid id pattern.
-
- [WARNING]
- [WARNING] It is highly recommended to fix these problems because they
- threaten the stability of your build.
- [WARNING]
- [WARNING] For this reason, future Maven versions might no longer support
- building such malformed projects.
- [WARNING]
-
-```
-This is a bug in early versions of Maven 3.  The dependency exclusion is
-still valid and it is safe to ignore these warnings.
-
-## Application Configuration
-
-A configuration file can be used to configure an application.  Different
-kinds of configuration parameters can be specified. They are application
-attributes, operator attributes and properties, port attributes, stream
-properties and application specific properties. They are all specified
-as name value pairs, in XML format, like the following.
-
-```
-<?xml version="1.0"?>
-<configuration>
-  <property>
-    <name>some_name_1</name>
-    <value>some_default_value</value>
-  </property>
-  <property>
-    <name>some_name_2</name>
-    <value>some_default_value</value>
-  </property>
-</configuration>
-```
-
-## Application attributes
-
-Application attributes are used to specify the platform behavior for the
-application. They can be specified using the parameter
-```dt.attr.<attribute>```. The prefix “dt” is a constant, “attr” is a
-constant denoting an attribute is being specified and ```<attribute>```
-specifies the name of the attribute. Below is an example snippet setting
-the streaming windows size of the application to be 1000 milliseconds.
-
-```
-  <property>
-     <name>dt.attr.STREAMING_WINDOW_SIZE_MILLIS</name>
-     <value>1000</value>
-  </property>
-```
-
-The name tag specifies the attribute and value tag specifies the
-attribute value. The name of the attribute is a JAVA constant name
-identifying the attribute. The constants are defined in
-com.datatorrent.api.Context.DAGContext and the different attributes can
-be specified in the format described above.
-
-## Operator attributes
-
-Operator attributes are used to specify the platform behavior for the
-operator. They can be specified using the parameter
-```dt.operator.<operator-name>.attr.<attribute>```. The prefix “dt” is a
-constant, “operator” is a constant denoting that an operator is being
-specified, ```<operator-name>``` denotes the name of the operator, “attr” is
-the constant denoting that an attribute is being specified and
-```<attribute>``` is the name of the attribute. The operator name is the
-same name that is specified when the operator is added to the DAG using
-the addOperator method. An example illustrating the specification is
-shown below. It specifies the number of streaming windows for one
-application window of an operator named “input” to be 10.
-
-```
-<property>
-  <name>dt.operator.input.attr.APPLICATION_WINDOW_COUNT</name>
-  <value>10</value>
-</property>
-```
-
-The name tag specifies the attribute and value tag specifies the
-attribute value. The name of the attribute is a JAVA constant name
-identifying the attribute. The constants are defined in
-com.datatorrent.api.Context.OperatorContext and the different attributes
-can be specified in the format described above.
-
-## Operator properties
-
-Operators can be configured using operator specific properties. The
-properties can be specified using the parameter
-```dt.operator.<operator-name>.prop.<property-name>```. The difference
-between this and the operator attribute specification described above is
-that the keyword “prop” is used to denote that it is a property and
-```<property-name>``` specifies the property name.  An example illustrating
-this is specified below. It specifies the property “hostname” of the
-redis server for a “redis” output operator.
-
-```
-  <property>
-    <name>dt.operator.redis.prop.host</name>
-    <value>127.0.0.1</value>
-  </property>
-```
-
-The name tag specifies the property and the value specifies the property
-value. The property name is converted to a setter method which is called
-on the actual operator. The method name is composed by appending the
-word “set” and the property name with the first character of the name
-capitalized. In the above example the setter method would become
-setHost. The method is called using JAVA reflection and the property
-value is passed as an argument. In the above example the method setHost
-will be called on the “redis” operator with “127.0.0.1” as the argument.
-
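The setter-name conversion and reflective call described above can be sketched in plain Java. This is only an illustrative sketch, not the platform's actual implementation; the `RedisOutput` class here is a hypothetical stand-in for a real redis output operator:

```java
import java.lang.reflect.Method;

public class PropertySetterDemo
{
  // Hypothetical operator with a "host" property, standing in for the redis operator.
  public static class RedisOutput
  {
    private String host;
    public void setHost(String host) { this.host = host; }
    public String getHost() { return host; }
  }

  // Convert a property name like "host" into a setter name like "setHost".
  public static String setterName(String property)
  {
    return "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
  }

  // Look up the setter reflectively and invoke it with the configured value.
  public static void applyProperty(Object operator, String property, String value)
  {
    try {
      Method setter = operator.getClass().getMethod(setterName(property), String.class);
      setter.invoke(operator, value);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException("No setter for property: " + property, e);
    }
  }

  public static void main(String[] args)
  {
    RedisOutput op = new RedisOutput();
    applyProperty(op, "host", "127.0.0.1");
    System.out.println(op.getHost());  // prints 127.0.0.1
  }
}
```
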
-## Port attributes
-Port attributes are used to specify the platform behavior for input and
-output ports. They can be specified using the parameter ```dt.operator.<operator-name>.inputport.<port-name>.attr.<attribute>```
-for input port and ```dt.operator.<operator-name>.outputport.<port-name>.attr.<attribute>```
-for output port. The keyword “inputport” is used to denote an input port
-and “outputport” to denote an output port. The rest of the specification
-follows the conventions described in other specifications above. An
-example illustrating this is specified below. It specifies the queue
-capacity for an input port named “input” of an operator named “range” to
-be 4k.
-
-```
-<property>
-  <name>dt.operator.range.inputport.input.attr.QUEUE_CAPACITY</name>
-  <value>4000</value>
-</property>
-```
-
-The name tag specifies the attribute and value tag specifies the
-attribute value. The name of the attribute is a JAVA constant name
-identifying the attribute. The constants are defined in
-com.datatorrent.api.Context.PortContext and the different attributes can
-be specified in the format described above.
-
-The attributes for an output port can also be specified in a similar way
-as described above, with the keyword “outputport” used
-instead of “inputport”. A generic keyword “port” can be used to specify
-either an input or an output port. It is useful in the wildcard
-specification described below.
-
-## Stream properties
-
-Streams can be configured using stream properties. The properties can be
-specified using the parameter
-```dt.stream.<stream-name>.prop.<property-name>```.  The constant “stream”
-specifies that it is a stream, ```<stream-name>``` specifies the name of the
-stream and ```<property-name>``` the name of the property. The name of the
-stream is the same name that is passed when the stream is added to the
-DAG using the addStream method. An example illustrating the
-specification is shown below. It sets the locality of the stream named
-“stream1” to container local indicating that the operators the stream is
-connecting be run in the same container.
-
-```
-  <property>
-    <name>dt.stream.stream1.prop.locality</name>
-    <value>CONTAINER_LOCAL</value>
-  </property>
-```
-
-The property name is converted into a set method on the stream in the
-same way as described in operator properties section above. In this case
-the method would be setLocality and it will be called in the stream
-“stream1” with the value as the argument.
-
-Along with the above system defined parameters, applications can define
-their own specific parameters, which can also be specified in the
-configuration file. The only condition is that the names of these
-parameters don’t conflict with the system defined parameters or with
-parameters defined by other applications. To this end, it is
-recommended that application parameters have the format
-```<full-application-class-name>.<param-name>```. The
-full-application-class-name is the full JAVA class name of the
-application including the package path, and param-name is the name of
-the parameter within the application. The application still has to
-read the parameter using the configuration API of the configuration
-object that is passed in populateDAG.
-
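As a sketch of this naming convention, with a plain `Map` standing in for the configuration object passed to populateDAG (the application class name and parameter name here are hypothetical, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

public class AppParamDemo
{
  // Build a parameter key following the recommended convention:
  // <full-application-class-name>.<param-name>
  public static String paramKey(String appClassName, String paramName)
  {
    return appClassName + "." + paramName;
  }

  public static void main(String[] args)
  {
    // Stand-in for the configuration object passed to populateDAG.
    Map<String, String> conf = new HashMap<>();
    String key = paramKey("com.example.mydtapp.Application", "inputDirectory");
    conf.put(key, "/data/incoming");

    // The application reads its own parameter back using the same key.
    System.out.println(conf.get(key));  // prints /data/incoming
  }
}
```

Because the key is prefixed with the full application class name, it cannot collide with the dt.* system parameters or with parameters of other applications.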
-##  Wildcards
-
-Wildcards and regular expressions can be used in place of names to
-specify a group for applications, operators, ports or streams. For
-example, to specify an attribute for all ports of an operator it can be
-done as follows
-```
-<property>
-  <name>dt.operator.range.port.*.attr.QUEUE_CAPACITY</name>
-  <value>4000</value>
-</property>
-```
-
-The wildcard “\*” was used instead of the name of the port. Wildcard can
-also be used for operator name, stream name or application name. Regular
-expressions can also be used for names to specify attributes or
-properties for a specific set.
-
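One way to picture the wildcard matching (an illustrative sketch only, not the platform's actual matching code) is to translate a wildcard key into a regular expression and test it against concrete parameter names:

```java
public class WildcardMatchDemo
{
  // Check whether a parameter key with "*" wildcards matches a concrete key.
  public static boolean matches(String wildcardKey, String concreteKey)
  {
    // Escape literal dots first, then turn each "*" into the regex ".*"
    String regex = wildcardKey.replace(".", "\\.").replace("*", ".*");
    return concreteKey.matches(regex);
  }

  public static void main(String[] args)
  {
    String key = "dt.operator.range.port.*.attr.QUEUE_CAPACITY";
    // Matches both the input and output ports of the "range" operator.
    System.out.println(matches(key, "dt.operator.range.port.input.attr.QUEUE_CAPACITY"));  // prints true
  }
}
```

The same idea extends to wildcards in operator, stream, or application names.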
-## Adding configuration properties
-
-It is common for applications to require configuration parameters to
-run.  For example, the address and port of the database, the location of
-a file for ingestion, etc.  You can specify them in
-src/main/resources/META-INF/properties.xml under the App Package
-project. The properties.xml may look like:
-
-```
-<?xml version="1.0"?>
-<configuration>
-  <property>
-    <name>some_name_1</name>
-  </property>
-  <property>
-    <name>some_name_2</name>
-    <value>some_default_value</value>
-  </property>
-</configuration>
-```
-
-The name of an application-specific property takes the form of:
-
-```dt.operator.{opName}.prop.{propName}```
-
-This represents the property with name propName of operator opName.
-You can also set the application name at run time by setting this
-property:
-
-        dt.attr.APPLICATION_NAME
-
-There are also other properties that can be set.  For details on
-properties, refer to the [Operation and Installation Guide](https://www.datatorrent.com/docs/guides/OperationandInstallationGuide.html).
-
-In this example, property some_name_1 is a required property which
-must be set at launch time, or it must be set by a pre-set configuration
-(see next section).  Property some\_name\_2 is a property that is
-assigned the value some\_default\_value unless it is overridden at
-launch time.
-
-## Adding pre-set configurations
-
-
-At build time, you can add pre-set configurations to the App Package by
-adding configuration XML files under ```src/site/conf/<conf>.xml``` in your
-project.  You can then specify which configuration to use at launch
-time.  The configuration XML is of the same format of the properties.xml
-file.
-
-## Application-specific properties file
-
-You can also specify properties.xml per application in the application
-package.  Just create a file with the name properties-{appName}.xml and
-it will be picked up when you launch the application with the specified
-name within the application package.  In short:
-
-  properties.xml: Properties that are global to the Application
-Package.
-
-  properties-{appName}.xml: Properties that apply when launching
-an application with the specified appName.
-
-## Properties source precedence
-
-If properties with the same key appear in multiple sources (e.g. from
-app package default configuration as META-INF/properties.xml, from app
-package configuration in the conf directory, from launch time defines,
-etc), the precedence of sources, from highest to lowest, is as follows:
-
-1. Launch time defines (using -D option in CLI, or the POST payload
-    with the Gateway REST API’s launch call)
-2. Launch time specified configuration file in file system (using -conf
-    option in CLI)
-3. Launch time specified package configuration (using -apconf option in
-    CLI or the conf={confname} with Gateway REST API’s launch call)
-4. Configuration from \$HOME/.dt/dt-site.xml
-5. Application defaults within the package as
-    META-INF/properties-{appname}.xml
-6. Package defaults as META-INF/properties.xml
-7. dt-site.xml in local DT installation
-8. dt-site.xml stored in HDFS
-
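The precedence rules above amount to a first-match lookup over the sources in order. A minimal sketch, with plain maps standing in for the actual configuration sources (not the platform's real resolution code):

```java
import java.util.List;
import java.util.Map;

public class PrecedenceDemo
{
  // Resolve a property by scanning sources from highest to lowest precedence;
  // the first source that defines the key wins.
  public static String resolve(String key, List<Map<String, String>> sourcesHighToLow)
  {
    for (Map<String, String> source : sourcesHighToLow) {
      if (source.containsKey(key)) {
        return source.get(key);
      }
    }
    return null;
  }

  public static void main(String[] args)
  {
    // Hypothetical launch-time define vs. package default.
    Map<String, String> launchDefines = Map.of("dt.attr.APPLICATION_NAME", "FromLaunch");
    Map<String, String> packageDefaults = Map.of("dt.attr.APPLICATION_NAME", "FromPackage",
        "some_name_2", "some_default_value");

    String name = resolve("dt.attr.APPLICATION_NAME", List.of(launchDefines, packageDefaults));
    System.out.println(name);  // prints FromLaunch - the launch-time define wins
  }
}
```
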
-## Other meta-data
-
-In an Apex App Package project, the pom.xml file contains a
-section that looks like:
-
-```
-<properties>
-  <apex.version>3.2.0-incubating</apex.version>
-  <apex.apppackage.classpath>lib/*.jar</apex.apppackage.classpath>
-</properties>
-```
-apex.version is the Apache Apex version that is to be used
-with this Application Package.
-
-apex.apppackage.classpath is the classpath that is used when
-launching the application in the Application Package.  The default is
-lib/\*.jar, where lib is where all the dependency jars are kept within
-the Application Package.  One reason to change this field is when your
-Application Package needs the classpath in a specific order.
-
-## Logging configuration
-
-Just like other Java projects, you can change the logging configuration
-by having your log4j.properties under src/main/resources.  For example,
-if you have the following in src/main/resources/log4j.properties:
-```
- log4j.rootLogger=WARN,CONSOLE
- log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
- log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
- log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p
- %c{2} %M - %m%n
-```
-
-The root logger’s level is set to WARN and the output is set to the console (stdout).
-
-Note that in a project created from the Maven archetype, there is
-already a log4j.properties file under src/test/resources, and that
-file is only used for the unit tests.
-
-# Zip Structure of Application Package
-
-
-Apache Apex Application Package files are zip files.  You can examine the content of any Application Package by running `unzip -t` on it from your Linux command line.
-
-There are five top level directories in an Application Package:
-
-1. "app" contains the jar files of the DAG code and any custom operators.
-2. "lib" contains all dependency jars.
-3. "conf" contains all the pre-set configuration XML files.
-4. "META-INF" contains the MANIFEST.MF file and the properties.xml file.
-5. "resources" contains other files that are to be served by the Gateway on behalf of the app package.
-
-
-# Managing Application Packages Through DT Gateway
-
-The DT Gateway supports storing and retrieving Application Packages in
-your distributed file system, e.g. HDFS.
-
-## Storing an Application Package
-
-You can store your Application Packages through DT Gateway using this
-REST call:
-
-```
- POST /ws/v2/appPackages
-```
-
-The payload is the raw content of your Application Package.  For
-example, you can issue this request using curl on your Linux command
-line like this, assuming your DT Gateway is accepting requests at
-localhost:9090:
-
-```
-$ curl -XPOST -T <app-package-file> http://localhost:9090/ws/v2/appPackages
-```
-
-## Getting Meta Information on Application Packages
-
-
-You can get the meta information on Application Packages stored through
-DT Gateway using this call.  The information includes the logical plan
-of each application within the Application Package.
-
-```
- GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}
-```
-
-## Getting Available Operators In Application Package
-
-You can get the list of available operators in the Application Package
-using this call.
-
-```
-GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/operators?parent={parent}
-```
-
-The parent parameter is optional.  If given, parent should be the fully
-qualified class name.  It will only return operators that derive from
-that class or interface. For example, if parent is
-com.datatorrent.api.InputOperator, this call will only return input
-operators provided by the Application Package.
-
-## Getting Properties of Operators in Application Package
-
-You can get the list of properties of any operator in the Application
-Package using this call.
-
-```
-GET  /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/operators/{className}
-```
-
-## Getting List of Pre-Set Configurations in Application Package
-
-You can get a list of pre-set configurations within the Application
-Package using this call.
-
-```
-GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs
-```
-
-You can also get the content of a specific pre-set configuration within
-the Application Package.
-
-```
- GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs/{configName}
-```
-
-## Changing Pre-Set Configurations in Application Package
-
-You can create or replace pre-set configurations within the Application
-Package using this call.
-```
- PUT   /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs/{configName}
-```
-The payload of this PUT call is the XML file that represents the pre-set
-configuration, and the Content-Type of the payload is "application/xml".
-
-You can also delete a pre-set configuration within the Application
-Package using this call.
-```
- DELETE /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs/{configName}
-```
-
-## Retrieving an Application Package
-
-You can download the Application Package file.  This Application Package
-is not necessarily the same file as the one that was originally uploaded
-since the pre-set configurations may have been modified.
-
-```
- GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/download
-```
-
-## Launching an Application Package
-
-You can launch an application within an Application Package.
-```
-POST /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/applications/{appName}/launch?config={configName}
-```
-
-The config parameter is optional.  If given, it must be one of the
-pre-set configurations within the given Application Package.  The
-Content-Type of the payload of the POST request is "application/json"
-and should contain the properties to be launched with the application.
- It is of the form:
-
-```
- {"property-name":"property-value", ... }
-```
-
-Here is an example of launching an application through curl:
-
-```
- $ curl -XPOST -d'{"dt.operator.console.prop.stringFormat":"xyz %s"}' \
-   http://localhost:9090/ws/v2/appPackages/dtadmin/mydtapp/1.0-SNAPSHOT/applications/MyFirstApplication/launch
-```
-
-Please refer to the [Gateway API reference](https://www.datatorrent.com/docs/guides/DTGatewayAPISpecification.html) for the complete specification of the REST API.
-
-# Examining and Launching Application Packages Through Apex CLI
-
-If you are working with Application Packages in the local filesystem and
-do not want to deal with dtGateway, you can use the Apex Command Line
-Interface (dtcli).  Please refer to the [Gateway API](dtgateway_api.md)
-for samples of these commands.
-
-## Getting Application Package Meta Information
-
-You can get the meta information about the Application Package using
-this Apex CLI command.
-
-```
- dt> get-app-package-info <app-package-file>
-```
-
-## Getting Available Operators In Application Package
-
-You can get the list of available operators in the Application Package
-using this command.
-
-```
- dt> get-app-package-operators <app-package-file> <package-prefix> [parent-class]
-```
-
-## Getting Properties of Operators in Application Package
-
-You can get the list of properties of any operator in the Application
-Package using this command.
-
-```
- dt> get-app-package-operator-properties <app-package-file> <operator-class>
-```
-
-
-## Launching an Application Package
-
-You can launch an application within an Application Package.
-```
-dt> launch [-D property-name=property-value, ...] [-conf config-name]
- [-apconf config-file-within-app-package] <app-package-file>
- [matching-app-name]
-```
-Note that -conf expects a configuration file in the file system, while -apconf expects a configuration file within the app package.

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/autometrics.md
----------------------------------------------------------------------
diff --git a/autometrics.md b/autometrics.md
deleted file mode 100644
index f6000e8..0000000
--- a/autometrics.md
+++ /dev/null
@@ -1,311 +0,0 @@
-Apache Apex AutoMetrics
-=======================
-
-# Introduction
-Metrics collect various statistical information about a process, which can be very useful for diagnosis. AutoMetrics in Apex help monitor operators in a running application.  The goal of the *AutoMetric* API is to enable operator developers to define relevant metrics for an operator in a simple way, which the platform then collects and reports automatically.
-
-# Specifying AutoMetrics in an Operator
-An *AutoMetric* can be any object. It can be of a primitive type - int, long, etc. - or a complex one. A field or a `get` method in an operator can be annotated with `@AutoMetric` to specify that its value is a metric. At the end of every application window, the platform collects the values of these fields/methods in a map and sends it to the application master.
-
-```java
-public class LineReceiver extends BaseOperator
-{
- @AutoMetric
- long length;
-
- @AutoMetric
- long count;
-
- public final transient DefaultInputPort<String> input = new DefaultInputPort<String>()
- {
-   @Override
-   public void process(String s)
-   {
-     length += s.length();
-     count++;
-   }
- };
-
- @Override
- public void beginWindow(long windowId)
- {
-   length = 0;
-   count = 0;
- }
-}
-```
-
-There are two auto-metrics declared in the `LineReceiver`. At the end of each application window, the platform will send a map with two entries, e.g. `[(length, 100), (count, 10)]`, to the application master.
-
-# Aggregating AutoMetrics across Partitions
-When an operator is partitioned, it is useful to aggregate the values of auto-metrics across all its partitions every window to get a logical view of these metrics. The application master performs these aggregations using metrics aggregators.
-
-The AutoMetric API helps achieve this by providing an interface for writing aggregators: `AutoMetric.Aggregator`. Any implementation of `AutoMetric.Aggregator` can be set as the operator attribute `METRICS_AGGREGATOR` for a particular operator, which in turn is used for aggregating its physical metrics.
-
-## Default aggregators
-[`MetricsAggregator`](https://github.com/apache/incubator-apex-core/blob/devel-3/common/src/main/java/com/datatorrent/common/metric/MetricsAggregator.java) is a simple implementation of `AutoMetric.Aggregator` that the platform uses by default for summing up primitive types - int, long, float and double.
-
-`MetricsAggregator` is just a collection of `SingleMetricAggregator`s. There are multiple implementations of `SingleMetricAggregator` that perform sum, min, max, avg which are present in Apex core and Apex malhar.
-
-For the `LineReceiver` operator, the application developer need not specify any aggregator. The platform will automatically inject an instance of `MetricsAggregator` that contains two `LongSumAggregator`s - one for `length` and one for `count`. This aggregator will report the sum of `length` and the sum of `count` across all the partitions of `LineReceiver`.
-
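The effect of this default sum aggregation can be sketched with plain maps standing in for the per-partition metric maps (illustrative only; the real `MetricsAggregator` operates on `AutoMetric.PhysicalMetricsContext` objects):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SumAggregationDemo
{
  // Sum each metric key across the per-partition metric maps,
  // mimicking what LongSumAggregator does for length and count.
  public static Map<String, Long> aggregate(List<Map<String, Long>> partitionMetrics)
  {
    Map<String, Long> result = new HashMap<>();
    for (Map<String, Long> metrics : partitionMetrics) {
      for (Map.Entry<String, Long> e : metrics.entrySet()) {
        result.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return result;
  }

  public static void main(String[] args)
  {
    // Two hypothetical partitions of LineReceiver reporting length and count.
    Map<String, Long> p1 = Map.of("length", 100L, "count", 10L);
    Map<String, Long> p2 = Map.of("length", 50L, "count", 5L);
    System.out.println(aggregate(List.of(p1, p2)));  // length=150, count=15
  }
}
```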
-
-## Building custom aggregators
-The platform cannot perform any meaningful aggregations for non-numeric metrics. In such cases, the operator or application developer can write custom aggregators. For example, suppose the `LineReceiver` is modified to have a complex metric as shown below.
-
-```java
-public class AnotherLineReceiver extends BaseOperator
-{
-  @AutoMetric
-  final LineMetrics lineMetrics = new LineMetrics();
-
-  public final transient DefaultInputPort<String> input = new DefaultInputPort<String>()
-  {
-    @Override
-    public void process(String s)
-    {
-      lineMetrics.length += s.length();
-      lineMetrics.count++;
-    }
-  };
-
-  @Override
-  public void beginWindow(long windowId)
-  {
-    lineMetrics.length = 0;
-    lineMetrics.count = 0;
-  }
-
-  public static class LineMetrics implements Serializable
-  {
-    long length;
-    long count;
-
-    private static final long serialVersionUID = 201511041908L;
-  }
-}
-```
-
-Below is a custom aggregator that can calculate average line length across all partitions of `AnotherLineReceiver`.
-
-```java
-public class AvgLineLengthAggregator implements AutoMetric.Aggregator
-{
-
-  Map<String, Object> result = Maps.newHashMap();
-
-  @Override
-  public Map<String, Object> aggregate(long l, Collection<AutoMetric.PhysicalMetricsContext> collection)
-  {
-    long totalLength = 0;
-    long totalCount = 0;
-    for (AutoMetric.PhysicalMetricsContext pmc : collection) {
-      AnotherLineReceiver.LineMetrics lm = (AnotherLineReceiver.LineMetrics)pmc.getMetrics().get("lineMetrics");
-      totalLength += lm.length;
-      totalCount += lm.count;
-    }
-    result.put("avgLineLength", totalCount == 0 ? 0 : totalLength / totalCount);
-    return result;
-  }
-}
-```
-An instance of the above aggregator can be specified as the `METRICS_AGGREGATOR` attribute for `AnotherLineReceiver` while creating the DAG, as shown below.
-
-```java
-  @Override
-  public void populateDAG(DAG dag, Configuration configuration)
-  {
-    ...
-    AnotherLineReceiver lineReceiver = dag.addOperator("LineReceiver", new AnotherLineReceiver());
-    dag.setAttribute(lineReceiver, Context.OperatorContext.METRICS_AGGREGATOR, new AvgLineLengthAggregator());
-    ...
-  }
-```
-
-# Retrieving AutoMetrics
-The Gateway REST API provides a way to retrieve the latest AutoMetrics for each logical operator.  For example:
-
-```
-GET /ws/v2/applications/{appid}/logicalPlan/operators/{opName}
-{
-    ...
-    "autoMetrics": {
-       "count": "71314",
-       "length": "27780706"
-    },
-    "className": "com.datatorrent.autometric.LineReceiver",
-    ...
-}
-```
-
-# System Metrics
-System metrics are standard operator metrics provided by the system.  Examples include:
-
-- processed tuples per second
-- emitted tuples per second
-- total tuples processed
-- total tuples emitted
-- latency
-- CPU percentage
-- failure count
-- checkpoint elapsed time
-
-The Gateway REST API provides a way to retrieve the latest values for all of the above for each of the logical operators in the application.
-
-```
-GET /ws/v2/applications/{appid}/logicalPlan/operators/{opName}
-{
-    ...
-    "cpuPercentageMA": "{cpuPercentageMA}",
-    "failureCount": "{failureCount}",
-    "latencyMA": "{latencyMA}",  
-    "totalTuplesEmitted": "{totalTuplesEmitted}",
-    "totalTuplesProcessed": "{totalTuplesProcessed}",
-    "tuplesEmittedPSMA": "{tuplesEmittedPSMA}",
-    "tuplesProcessedPSMA": "{tuplesProcessedPSMA}",
-    ...
-}
-```
-
-However, just like AutoMetrics, the Gateway only provides the latest metrics.  For historical metrics, we will need the help of App Data Tracker.
-
-# App Data Tracker
-As discussed above, STRAM aggregates the AutoMetrics from physical operators (partitions) into values that make sense for one logical operator.  Every second, it pushes the aggregated AutoMetrics values, along with system metrics for each operator, to the Gateway via WebSocket.  The Gateway relays this information to an application called App Data Tracker.  This is another Apex application that runs in the background, further aggregates the incoming values by time bucket, and stores the values in HDHT.  It also allows external clients to retrieve the aggregated AutoMetrics and system metrics through a WebSocket interface.
-
-![AppDataTracker](images/autometrics/adt.png)
-
-App Data Tracker is enabled by having these properties in dt-site.xml:
-
-```xml
-<property>
-  <name>dt.appDataTracker.enable</name>
-  <value>true</value>
-</property>
-<property>
-  <name>dt.appDataTracker.transport</name>
-  <value>builtin:AppDataTrackerFeed</value>
-</property>
-<property>
-  <name>dt.attr.METRICS_TRANSPORT</name>
-  <value>builtin:AppDataTrackerFeed</value>
-</property>
-```
-
-All applications launched after the App Data Tracker is enabled will have their metrics sent to it.
-
-**Note**: The App Data Tracker will be shown running in dtManage as a “system app”.  It will show up if the “show system apps” button is pressed.
-
-By default, the time buckets that App Data Tracker aggregates upon are one minute, one hour and one day.  These can be overridden by changing the operator attribute `METRICS_DIMENSIONS_SCHEME`.
-
-Also by default, the App Data Tracker performs all of these aggregations on all numeric metrics: SUM, MIN, MAX, AVG, COUNT, FIRST and LAST.  You can override this by changing the same operator attribute `METRICS_DIMENSIONS_SCHEME`, provided the custom aggregator is known to the App Data Tracker.  (See next section)
-
-# Custom Aggregator in App Data Tracker
-Custom aggregators allow you to perform your own custom computations on the statistics generated by any of your applications. To implement a custom aggregator you have to support two operations:
-
-1. Combining new inputs with the current aggregation
-2. Combining two aggregations into one aggregation
-
-Let’s consider the case where we want to perform the following rolling average:
-
-Y_n = ½ * X_n + ¼ * X_{n-1} + ⅛ * X_{n-2} + ...
-
-This aggregation could be performed by the following Custom Aggregator:
-
-```java
-@Name("IIRAVG")
-public class AggregatorIIRAVG extends AbstractIncrementalAggregator
-{
-  ...
-
-  private void aggregateHelper(DimensionsEvent dest, DimensionsEvent src)
-  {
-    double[] destVals = dest.getAggregates().getFieldsDouble();
-    double[] srcVals = src.getAggregates().getFieldsDouble();
-
-    for (int index = 0; index < destVals.length; index++) {
-      destVals[index] = .5 * destVals[index] + .5 * srcVals[index];
-    }
-  }
-
-  @Override
-  public void aggregate(Aggregate dest, InputEvent src)
-  {
-    //Aggregate a current aggregation with a new input
-    aggregateHelper(dest, src);
-  }
-
-  @Override
-  public void aggregate(Aggregate destAgg, Aggregate srcAgg)
-  {
-    //Combine two existing aggregations together
-    aggregateHelper(destAgg, srcAgg);
-  }
-}
-```
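
A quick way to sanity-check those weights is to unroll the recurrence y_n = 0.5·x_n + 0.5·y_{n-1}, which is exactly the update in `aggregateHelper` above. A minimal standalone sketch (not Apex code):

```java
// Unrolling the IIR update used by AggregatorIIRAVG: applying
// y = 0.5*x + 0.5*y over inputs x_1..x_n weights the samples by
// 1/2, 1/4, 1/8, ... from most recent to oldest.
public class IirAvgDemo
{
  static double iirAvg(double[] inputs)
  {
    double y = 0;
    for (double x : inputs) {
      y = 0.5 * x + 0.5 * y;  // same update as aggregateHelper
    }
    return y;
  }

  public static void main(String[] args)
  {
    // 0.5*2 + 0.25*4 + 0.125*8 = 1 + 1 + 1 = 3
    System.out.println(iirAvg(new double[] {8, 4, 2}));
  }
}
```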
-
-## Discovery of Custom Aggregators
-The App Data Tracker searches for custom aggregator jars in the following directories before launching:
-
-1. {dt\_installation\_dir}/plugin/aggregators
-2. {user\_home\_dir}/.dt/plugin/aggregators
-
-It uses reflection to find all the classes in these jars that extend `IncrementalAggregator` or `OTFAggregator` and registers them under the name provided by the `@Name` annotation (or the class name when `@Name` is absent).
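
The name resolution can be sketched as follows; the `@Name` annotation and aggregator classes below are simplified stand-ins for illustration, not the actual App Data Tracker code:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified stand-in for the registration logic: prefer the @Name value,
// fall back to the simple class name.
public class AggregatorNaming
{
  @Retention(RetentionPolicy.RUNTIME)
  @interface Name
  {
    String value();
  }

  @Name("IIRAVG")
  static class AggregatorIIRAVG
  {
  }

  static class AggregatorMedian  // no @Name: registered under its class name
  {
  }

  static String registeredName(Class<?> clazz)
  {
    Name name = clazz.getAnnotation(Name.class);
    return name != null ? name.value() : clazz.getSimpleName();
  }

  public static void main(String[] args)
  {
    System.out.println(registeredName(AggregatorIIRAVG.class));  // IIRAVG
    System.out.println(registeredName(AggregatorMedian.class));  // AggregatorMedian
  }
}
```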
-
-# Using `METRICS_DIMENSIONS_SCHEME`
-
-Here is a sample code snippet on how you can make use of `METRICS_DIMENSIONS_SCHEME` to set your own time buckets and your own set of aggregators for certain `AutoMetric`s performed by the App Data Tracker in your application.
-
-```java
-  @Override
-  public void populateDAG(DAG dag, Configuration configuration)
-  {
-    ...
-    LineReceiver lineReceiver = dag.addOperator("LineReceiver", new LineReceiver());
-    ...
-    AutoMetric.DimensionsScheme dimensionsScheme = new AutoMetric.DimensionsScheme()
-    {
-      String[] timeBuckets = new String[] { "1s", "1m", "1h" };
-      String[] lengthAggregators = new String[] { "IIRAVG", "SUM" };
-      String[] countAggregators = new String[] { "SUM" };
-
-      /* Setting the aggregation time bucket to be one second, one minute and one hour */
-      @Override
-      public String[] getTimeBuckets()
-      {
-        return timeBuckets;
-      }
-
-      @Override
-      public String[] getDimensionAggregationsFor(String logicalMetricName)
-      {
-        if ("length".equals(logicalMetricName)) {
-          return lengthAggregators;
-        } else if ("count".equals(logicalMetricName)) {
-          return countAggregators;
-        } else {
-          return null; // use default
-        }
-      }
-    };
-
-    dag.setAttribute(lineReceiver, OperatorContext.METRICS_DIMENSIONS_SCHEME, dimensionsScheme);
-    ...
-  }
-```
-
-
-# Dashboards
-With App Data Tracker enabled, you can visualize the AutoMetrics and system metrics in the Dashboards within dtManage.  As shown in the diagram in the App Data Tracker section, dtGateway relays queries and query results to and from the App Data Tracker.  In this way, dtManage sends queries to and receives results from the App Data Tracker via dtGateway, and uses the results to let the user visualize the data.
-
-Click on the visualize button in dtManage's application page.
-
-![AppDataTracker](images/autometrics/visualize.png)
-
-You will see the dashboard for the AutoMetrics and the system metrics.
-
-![AppDataTracker](images/autometrics/dashboard.png)
-
-The left widget shows the AutoMetrics `length` and `count` for the LineReceiver operator.  The right widget shows the system metrics.
-
-The Dashboards include some simple built-in widgets, such as line charts and bar charts, to visualize the data.
-Users can also implement their own widgets to visualize their data.

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/configuration_packages.md
----------------------------------------------------------------------
diff --git a/configuration_packages.md b/configuration_packages.md
deleted file mode 100644
index 30f1717..0000000
--- a/configuration_packages.md
+++ /dev/null
@@ -1,242 +0,0 @@
-Apache Apex Configuration Packages
-==================================
-
-An Apache Apex Application Configuration Package is a zip file that contains
-configuration files and additional files to be launched with an
-[Application Package](application_packages.md) using
-DTCLI or the REST API.  This guide assumes familiarity with Application
-Packages.  If you have not done so, please read the Application Package
-document first.
-
-# Requirements
-
-You will need to have the following installed:
-
-1. Apache Maven 3.0 or later (for assembling the Config Package)
-2. Apex 3.0.0 or later (for launching the App Package with the Config
-    Package in your cluster)
-
-# Creating Your First Configuration Package
-
-You can create a Configuration Package using your Linux command line, or
-using your favorite IDE.  
-
-## Using Command Line
-
-First, change to the directory where you put your projects, and create a
-DT configuration project using Maven by running the following command.
-Replace "com.example", "mydtconfig" and "1.0-SNAPSHOT" with the
-appropriate values:
-
-    $ mvn archetype:generate \
-     -DarchetypeGroupId=org.apache.apex \
-     -DarchetypeArtifactId=apex-conf-archetype -DarchetypeVersion=3.2.0-incubating \
-     -DgroupId=com.example -Dpackage=com.example.mydtconfig -DartifactId=mydtconfig \
-     -Dversion=1.0-SNAPSHOT
-
-This creates a Maven project named "mydtconfig". Open it with your
-favorite IDE (e.g. NetBeans, Eclipse, IntelliJ IDEA).  Try it out by
-running the following command:
-
-```
-$ mvn package                                                         
-```
-
-The "mvn package" command creates the Config Package file in the target
-directory as target/mydtconfig.apc. You can use that Configuration
-Package file to launch an Apache Apex application.
-
-## Using IDE 
-
-Alternatively, you can do the above steps all within your IDE.  For
-example, in NetBeans, select File -\> New Project.  Then choose “Maven”
-and “Project from Archetype” in the dialog box, as shown.
-
-![](images/AppConfig/ApplicationConfigurationPackages.html-image01.png)
-
-Then fill the Group ID, Artifact ID, Version and Repository entries as
-shown below.
-
-![](images/AppConfig/ApplicationConfigurationPackages.html-image02.png)
-
-Group ID: org.apache.apex
-Artifact ID: apex-conf-archetype
-Version: 3.2.0-incubating (or any later version)
-
-Press Next and fill out the rest of the required information. For
-example:
-
-![](images/AppConfig/ApplicationConfigurationPackages.html-image00.png)
-
-Click Finish, and now you have created your own Apex
-Configuration Package project.  The procedure for other IDEs, like
-Eclipse or IntelliJ, is similar.
-
-
-# Assembling your own configuration package 
-
-Inside the project created by the archetype, these are the files that
-you should know about when assembling your own configuration package:
-
-    ./pom.xml
-    ./src/main/resources/classpath
-    ./src/main/resources/files
-    ./src/main/resources/META-INF/properties.xml
-    ./src/main/resources/META-INF/properties-{appname}.xml
-
-## pom.xml 
-
-Example:
-
-```xml
-  <groupId>com.example</groupId>
-  <version>1.0.0</version>
-  <artifactId>mydtconf</artifactId>
-  <packaging>jar</packaging>
-  <!-- change these to the appropriate values -->
-  <name>My DataTorrent Application Configuration</name>
-  <description>My DataTorrent Application Configuration Description</description>
-  <properties>
-    <datatorrent.apppackage.name>mydtapp</datatorrent.apppackage.name>
-    <datatorrent.apppackage.minversion>1.0.0</datatorrent.apppackage.minversion>
-    <datatorrent.apppackage.maxversion>1.9999.9999</datatorrent.apppackage.maxversion>
-    <datatorrent.appconf.classpath>classpath/*</datatorrent.appconf.classpath>
-    <datatorrent.appconf.files>files/*</datatorrent.appconf.files>
-  </properties> 
-
-```
-In pom.xml, you can change the following keys to your desired values
-
-* ```<groupId>```
-* ```<version>```
-* ```<artifactId>```
-* ```<name> ```
-* ```<description>```
-
-You can also change the values of 
-
-* ```<datatorrent.apppackage.name>```
-* ```<datatorrent.apppackage.minversion>```
-* ```<datatorrent.apppackage.maxversion>```
-
-to reflect what app packages should be used with this configuration package.  Apex will use this information to check whether a
-configuration package is compatible with the application package when you issue a launch command.
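
Conceptually, that compatibility check is a version-range test. The sketch below illustrates the semantics of the `[minversion, maxversion]` bounds; it is a hypothetical illustration, not the actual Apex implementation:

```java
import java.util.Arrays;

// Illustrative version-range compatibility check, comparing dotted
// version strings component by component.
public class VersionRange
{
  static int compare(String a, String b)
  {
    int[] av = Arrays.stream(a.split("\\.")).mapToInt(Integer::parseInt).toArray();
    int[] bv = Arrays.stream(b.split("\\.")).mapToInt(Integer::parseInt).toArray();
    for (int i = 0; i < Math.max(av.length, bv.length); i++) {
      int x = i < av.length ? av[i] : 0;  // missing components count as 0
      int y = i < bv.length ? bv[i] : 0;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return 0;
  }

  static boolean compatible(String appVersion, String min, String max)
  {
    return compare(appVersion, min) >= 0 && compare(appVersion, max) <= 0;
  }

  public static void main(String[] args)
  {
    System.out.println(compatible("1.2.3", "1.0.0", "1.9999.9999"));  // true
    System.out.println(compatible("2.0.0", "1.0.0", "1.9999.9999"));  // false
  }
}
```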
-
-## ./src/main/resources/classpath 
-
-Place any file in this directory that you’d like to be copied to the
-compute machines when launching an application and included in the
-classpath of the application.  Example of such files are Java properties
-files and jar files.
-
-## ./src/main/resources/files 
-
-Place any file in this directory that you’d like to be copied to the
-compute machines when launching an application but not included in the
-classpath of the application.
-
-## Properties XML file
-
-A properties XML file consists of a set of key-value pairs that
-specify the configuration options the application should be launched
-with.
-
-Example:
-```xml
-<configuration>
-  <property>
-    <name>some-property-name</name>
-    <value>some-property-value</value>
-  </property>
-   ...
-</configuration>
-```
-Names of the properties XML files:
-
-*  **properties.xml:** Properties that are global to the Configuration
-Package.
-*  **properties-{appName}.xml:** Properties that apply only when launching
-an application with the specified appName within the Application
-Package.
-
-After you are done with the above, remember to run `mvn package` to
-generate a new configuration package, which will be located in the
-target directory of your project.
-
-## Zip structure of configuration package
-Apex Application Configuration Package files are zip files.  You
-can examine the content of any Application Configuration Package by
-running `unzip -t` on your Linux command line.  The structure of the zip
-file is as follows:
-
-```
-META-INF
-  MANIFEST.MF
-  properties.xml
-  properties-{appname}.xml
-classpath
-  {classpath files}
-files
-  {files} 
-```
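
Since a configuration package is a plain zip, the layout above can be assembled and inspected with ordinary zip tooling. A hypothetical Java sketch (entry names are examples; `mvn package` normally produces the real file):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Build the documented zip layout in memory, then list its entries back.
public class ConfPackageLayout
{
  static byte[] build() throws Exception
  {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      for (String entry : new String[] {
          "META-INF/MANIFEST.MF",
          "META-INF/properties.xml",
          "META-INF/properties-myapp.xml",
          "classpath/extra.properties",
          "files/lookup.txt"}) {
        zip.putNextEntry(new ZipEntry(entry));  // empty entries for illustration
        zip.closeEntry();
      }
    }
    return bytes.toByteArray();
  }

  static List<String> list(byte[] pkg) throws Exception
  {
    List<String> names = new ArrayList<>();
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
      for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
        names.add(e.getName());
      }
    }
    return names;
  }

  public static void main(String[] args) throws Exception
  {
    list(build()).forEach(System.out::println);
  }
}
```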
-
-
-
-# Launching with CLI
-
-The `-conf` option of the launch command in CLI supports specifying a configuration package in the local filesystem.  Example:
-
-    dt> launch DTApp-mydtapp-1.0.0.jar -conf DTConfig-mydtconfig-1.0.0.jar
-
-This command expects both the application package and the configuration package to be in the local file system.
-
-
-
-# Related REST API 
-
-### POST /ws/v2/configPackages
-
-Payload: Raw content of configuration package zip
-
-Function: Creates or replaces a configuration package zip file in HDFS
-
-Curl example:
-
-    $ curl -XPOST -T DTConfig-{name}.jar http://{yourhost:port}/ws/v2/configPackages
-
-### GET /ws/v2/configPackages?appPackageName=...&appPackageVersion=... 
-
-All query parameters are optional
-
-Function: Returns the configuration packages that the user is authorized to use and that are compatible with the specified appPackageName, appPackageVersion and appName. 
-
-### GET /ws/v2/configPackages/``<user>``?appPackageName=...&appPackageVersion=... 
-
-All query parameters are optional
-
-Function: Returns the configuration packages under the specified user and that are compatible with the specified appPackageName, appPackageVersion and appName.
-
-### GET /ws/v2/configPackages/```<user>```/```<name>``` 
-
-Function: Returns the information of the specified configuration package
-
-### GET /ws/v2/configPackages/```<user>```/```<name>```/download 
-
-Function: Returns the raw config package file
-
-Curl example:
-
-```sh
-$ curl http://{yourhost:port}/ws/v2/configPackages/{user}/{name}/download > DTConfig-xyz.jar
-$ unzip -t DTConfig-xyz.jar
-```
-
-### POST /ws/v2/appPackages/```<user>```/```<app-pkg-name>```/```<app-pkg-version>```/applications/{app-name}/launch?configPackage=```<user>```/```<confpkgname>```
-
-Function: Launches the app package with the specified configuration package stored in HDFS.
-
-Curl example:
-
-```sh
-$ curl -XPOST -d '{}' http://{yourhost:port}/ws/v2/appPackages/{user}/{app-pkg-name}/{app-pkg-version}/applications/{app-name}/launch?configPackage={user}/{confpkgname}
-```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/apex.md
----------------------------------------------------------------------
diff --git a/docs/apex.md b/docs/apex.md
new file mode 100644
index 0000000..215a957
--- /dev/null
+++ b/docs/apex.md
@@ -0,0 +1,14 @@
+Apache Apex
+================================================================================
+
+Apache Apex (incubating) is the industry’s only open source, enterprise-grade unified stream and batch processing engine.  Apache Apex includes key features requested by the open source developer community that are not available in current open source technologies.
+
+* Event processing guarantees
+* In-memory performance & scalability
+* Fault tolerance and state management
+* Native rolling and tumbling window support
+* Hadoop-native YARN & HDFS implementation
+
+For additional information visit [Apache Apex](http://apex.incubator.apache.org/).
+
+[![](images/apex_logo.png)](http://apex.incubator.apache.org/)

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/apex_development_setup.md
----------------------------------------------------------------------
diff --git a/docs/apex_development_setup.md b/docs/apex_development_setup.md
new file mode 100644
index 0000000..777f2f9
--- /dev/null
+++ b/docs/apex_development_setup.md
@@ -0,0 +1,151 @@
+Apache Apex Development Environment Setup
+=========================================
+
+This document discusses the steps needed for setting up a development environment for creating applications that run on the Apache Apex or the DataTorrent RTS streaming platform.
+
+
+Microsoft Windows
+------------------------------
+
+There are a few tools that will be helpful when developing Apache Apex applications, some required and some optional:
+
+1.  *git* -- A revision control system (version 1.7.1 or later). There are multiple git clients available for Windows (<http://git-scm.com/download/win> for example), so download and install a client of your choice.
+
+2.  *java JDK* (not JRE). Includes the Java Runtime Environment as well as the Java compiler and a variety of tools (version 1.7.0\_79 or later). Can be downloaded from the Oracle website.
+
+3.  *maven* -- Apache Maven is a build system for Java projects (version 3.0.5 or later). It can be downloaded from <https://maven.apache.org/download.cgi>.
+
+4.  *VirtualBox* -- Oracle VirtualBox is a virtual machine manager (version 4.3 or later) and can be downloaded from <https://www.virtualbox.org/wiki/Downloads>. It is needed to run the DataTorrent Sandbox.
+
+5.  *DataTorrent Sandbox* -- The sandbox can be downloaded from <https://www.datatorrent.com/download>. It is useful for testing simple applications since it contains Apache Hadoop and DataTorrent RTS 3.1.1 pre-installed with a time-limited Enterprise License. If you already installed the RTS Enterprise Edition (evaluation or production license) on a cluster, you can use that setup for deployment and testing instead of the sandbox.
+
+6.  (Optional) If you prefer to use an IDE (Integrated Development Environment) such as *NetBeans*, *Eclipse* or *IntelliJ*, install that as well.
+
+
+After installing these tools, make sure that the directories containing the executable files are in your PATH environment; for example, for the JDK executables like _java_ and _javac_, the directory might be something like `C:\\Program Files\\Java\\jdk1.7.0\_80\\bin`; for _git_ it might be `C:\\Program Files\\Git\\bin`; and for maven it might be `C:\\Users\\user\\Software\\apache-maven-3.3.3\\bin`. Open a console window and enter the command:
+
+    echo %PATH%
+
+to see the value of the `PATH` variable and verify that the above directories are present. If not, you can change its value by navigating to _Control Panel_ &#x21e8; _Advanced System Settings_ &#x21e8; _Advanced tab_ &#x21e8; _Environment Variables_.
+
+
+Now run the following commands and ensure that the output is something similar to that shown in the table below:
+
+
+<table>
+<colgroup>
+<col width="30%" />
+<col width="70%" />
+</colgroup>
+<tbody>
+<tr class="odd">
+<td align="left"><p>Command</p></td>
+<td align="left"><p>Output</p></td>
+</tr>
+<tr class="even">
+<td align="left"><p><tt>javac -version</tt></p></td>
+<td align="left"><p>javac 1.7.0_80</p></td>
+</tr>
+<tr class="odd">
+<td align="left"><p><tt>java -version</tt></p></td>
+<td align="left"><p>java version &quot;1.7.0_80&quot;</p>
+<p>Java(TM) SE Runtime Environment (build 1.7.0_80-b15)</p>
+<p>Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)</p></td>
+</tr>
+<tr class="even">
+<td align="left"><p><tt>git --version</tt></p></td>
+<td align="left"><p>git version 2.6.1.windows.1</p></td>
+</tr>
+<tr class="odd">
+<td align="left"><p><tt>mvn --version</tt></p></td>
+<td align="left"><p>Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T06:57:37-05:00)</p>
+<p>Maven home: C:\Users\ram\Software\apache-maven-3.3.3\bin\..</p>
+<p>Java version: 1.7.0_80, vendor: Oracle Corporation</p>
+<p>Java home: C:\Program Files\Java\jdk1.7.0_80\jre</p>
+<p>Default locale: en_US, platform encoding: Cp1252</p>
+<p>OS name: &quot;windows 8&quot;, version: &quot;6.2&quot;, arch: &quot;amd64&quot;, family: &quot;windows&quot;</p></td>
+</tr>
+</tbody>
+</table>
+
+
+To install the sandbox, first download it from <https://www.datatorrent.com/download> and import the downloaded file into VirtualBox. Once the import completes, you can select it and click the Start button to start the sandbox.
+
+
+The sandbox is configured with 6GB RAM; if your development machine has 16GB or more, you can increase the sandbox RAM to 8GB or more using the VirtualBox console. This will yield better performance and support larger applications. Additionally, you can change the network adapter from **NAT** to **Bridged Adapter**; this will allow you to login to the sandbox from your host machine using an _ssh_ tool like **PuTTY** and also to transfer files to and from the host using `pscp` on Windows. Of course all such configuration must be done when the sandbox is not running.
+
+
+You can choose to develop either directly on the sandbox or on your development machine. The advantage of the former is that most of the tools (e.g. _jdk_, _git_, _maven_) are pre-installed and the package files created by your project are directly available to the DataTorrent tools such as **dtManage** and **dtcli**. The disadvantage is that the sandbox is a memory-limited environment, so running a memory-hungry tool like a Java IDE on it may starve other applications of memory.
+
+
+You can now use the maven archetype to create a basic Apache Apex project as follows: Put these lines in a Windows command file called, for example, `newapp.cmd` and run it:
+
+    @echo off
+    @rem Script for creating a new application
+    setlocal
+    mvn archetype:generate ^
+    -DarchetypeRepository=https://www.datatorrent.com/maven/content/repositories/releases ^
+      -DarchetypeGroupId=com.datatorrent ^
+      -DarchetypeArtifactId=apex-app-archetype ^
+      -DarchetypeVersion=3.1.1 ^
+      -DgroupId=com.example ^
+      -Dpackage=com.example.myapexapp ^
+      -DartifactId=myapexapp ^
+      -Dversion=1.0-SNAPSHOT
+    endlocal
+
+
+
+The caret (^) at the end of some lines indicates that a continuation line follows. When you run this file, the properties will be displayed and you will be prompted with `` Y: :``; just press **Enter** to complete the project generation.
+
+
+This command file also exists in the DataTorrent _examples_ repository which you can check out with:
+
+    git clone https://github.com/DataTorrent/examples
+
+You will find the script under `examples\tutorials\topnwords\scripts\newapp.cmd`.
+
+You can also, if you prefer, use an IDE to generate the project as described in Section 3 of [Application Packages](application_packages.md) but use the archetype version 3.1.1 instead of 3.0.0.
+
+
+When the run completes successfully, you should see a new directory named `myapexapp` containing a maven project for building a basic Apache Apex application. It includes 3 source files: **Application.java**, **RandomNumberGenerator.java** and **ApplicationTest.java**. You can now build the application by stepping into the new directory and running the appropriate maven command:
+
+    cd myapexapp
+    mvn clean package -DskipTests
+
+The build should create the application package file `myapexapp\target\myapexapp-1.0-SNAPSHOT.apa`. This file can then be uploaded to the DataTorrent GUI tool on the sandbox (called **dtManage**) and launched from there. It generates a stream of random numbers and prints them out, each prefixed by the string `hello world: `. If you built this package on the host, you can transfer it to the sandbox using the `pscp` tool bundled with **PuTTY** mentioned earlier.
+
+
+If you want to check out the Apache Apex source repositories and build them, you can do so by running the script `build-apex.cmd` located in the same place in the examples repository described above. The source repositories contain more substantial demo applications and the associated source code. Alternatively, if you do not want to use the script, you can follow these simple manual steps:
+
+
+1.  Check out the source code repositories:
+
+        git clone https://github.com/apache/incubator-apex-core
+        git clone https://github.com/apache/incubator-apex-malhar
+
+2.  Switch to the appropriate release branch and build each repository:
+
+        pushd incubator-apex-core
+        git checkout release-3.1
+        mvn clean install -DskipTests
+        popd
+        pushd incubator-apex-malhar
+        git checkout release-3.1
+        mvn clean install -DskipTests
+        popd
+
+The `install` argument to the `mvn` command installs resources from each project to your local maven repository (typically `.m2/repository` under your home directory), and **not** to the system directories, so Administrator privileges are not required. The  `-DskipTests` argument skips running unit tests since they take a long time. If this is a first-time installation, it might take several minutes to complete because maven will download a number of associated plugins.
+
+After the build completes, you should see the demo application package files in the target directory under each demo subdirectory in `incubator-apex-malhar\demos\`.
+
+Linux
+------------------
+
+Most of the instructions for Linux (and other Unix-like systems) are similar to those for Windows described above, so we will just note the differences.
+
+
+The pre-requisites (such as _git_, _maven_, etc.) are the same as for Windows described above; please run the commands in the table and ensure that appropriate versions are present in your PATH environment variable (the command to display that variable is: `echo $PATH`).
+
+
+The maven archetype command is the same except that continuation lines use a backslash (``\``) instead of caret (``^``); the script for it is available in the same location and is named `newapp` (without the `.cmd` extension). The script to checkout and build the Apache Apex repositories is named `build-apex`.

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/apex_malhar.md
----------------------------------------------------------------------
diff --git a/docs/apex_malhar.md b/docs/apex_malhar.md
new file mode 100644
index 0000000..ef2e371
--- /dev/null
+++ b/docs/apex_malhar.md
@@ -0,0 +1,65 @@
+Apache Apex Malhar
+================================================================================
+
+Apache Apex Malhar is an open source operator and codec library that can be used with the Apache Apex platform to build real-time streaming applications.  To help enterprises extract value quickly, Malhar operators get data into Hadoop, analyze it in real-time, and get data out of Hadoop in real-time with no paradigm limitations.  In addition to the operators, the library contains a number of demo applications demonstrating operator features and capabilities.
+
+![MalharDiagram](images/MalharOperatorOverview.png)
+
+# Capabilities common across Malhar operators
+
+For most streaming platforms, connectors are afterthoughts and often end up being simple ‘bolt-ons’ to the platform. As a result they often cause performance issues or data loss when put through failure scenarios and scalability requirements. Malhar operators do not face these issues, as they were designed to be integral parts of the Apache Apex platform. Hence, they have the following core streaming runtime capabilities:
+
+1.  **Fault tolerance** – Apache Apex Malhar operators where applicable have fault tolerance built in. They use the checkpoint capability provided by the framework to ensure that there is no data loss under ANY failure scenario.
+2.  **Processing guarantees** – Malhar operators where applicable provide out of the box support for ALL three processing guarantees – exactly once, at-least once & at-most once – WITHOUT requiring the user to write any additional code.  Some operators, like the MQTT operator, deal with source systems that cannot track processed data and hence need the operators to keep track of the data. Malhar has support for a generic operator that uses alternate storage like HDFS to facilitate this. Finally, for databases that support transactions or any sort of atomic batch operations, Malhar operators can do exactly once down to the tuple level.
+3.  **Dynamic updates** – Based on changing business conditions you often have to tweak several parameters used by the operators in your streaming application without incurring any application downtime. You can also change properties of a Malhar operator at runtime without having to bring down the application.
+4.  **Ease of extensibility** – Malhar operators are based on templates that are easy to extend.
+5.  **Partitioning support** – In streaming applications the input data stream often needs to be partitioned based on the contents of the stream. Also, for operators that ingest data from external systems, partitioning needs to be done based on the capabilities of the external system. For example, with the Kafka or Flume operators, the operator can automatically scale up or down based on changes in the number of Kafka partitions or Flume channels.
+
+# Operator Library Overview
+
+## Input/output connectors
+
+Below is a summary of the various sub-categories of input and output operators. Most input operators also have a corresponding output operator.
+
+*   **File Systems** – Most streaming analytics use cases we have seen require the data to be stored in HDFS or perhaps S3 if the application is running in AWS. Also, customers often need to re-run their streaming analytical applications against historical data or consume data from upstream processes that are perhaps writing to some NFS share. Hence, it’s not just enough to be able to save data to various file systems. You also have to be able to read data from them. RTS supports input & output operators for HDFS, S3, NFS & Local Files
+*   **Flume** – NOTE: Flume operator is not yet part of Malhar
+
+Many customers have existing Flume deployments that are being used to aggregate log data from variety of sources. However Flume does not allow analytics on the log data on the fly. The Flume input/output operator enables RTS to consume data from flume and analyze it in real-time before being persisted.
+
+*   **Relational databases** – Most stream processing use cases require some reference data lookups to enrich, tag or filter streaming data. There is also a need to save results of the streaming analytical computation to a database so an operational dashboard can see them. RTS supports a JDBC operator so you can read/write data from any JDBC compliant RDBMS like Oracle, MySQL etc.
+*   **NoSQL databases** – NoSQL key-value pair databases like Cassandra & HBase are becoming a common part of streaming analytics application architectures, used to look up reference data or store results. Malhar has operators for HBase, Cassandra, Accumulo (common with govt. & healthcare companies), MongoDB & CouchDB.
+*   **Messaging systems** – JMS brokers have been the workhorses of messaging infrastructure in most enterprises, and Kafka is rapidly gaining adoption with almost every customer we talk to. Malhar has operators to read from and write to Kafka, any JMS implementation, ZeroMQ, and RabbitMQ.
+*   **Notification systems** – Almost every streaming analytics application has some notification requirements tied to a business condition being triggered. Malhar supports sending notifications via SMTP and SNMP. It also has a built-in alert escalation mechanism so that users do not get spammed by notifications (a common drawback of most streaming platforms).
+*   **In-memory databases & caching platforms** – Some streaming use cases need instantaneous access to shared state across the application. Caching platforms and in-memory databases serve this purpose really well. To support these use cases, Malhar has operators for memcached and Redis.
+*   **Protocols** – Streaming use cases driven by machine-to-machine communication have one thing in common: there is no single dominant protocol being used for communication. Malhar currently supports MQTT, one of the more commonly adopted protocols in the IoT space. Malhar also provides connectors that can directly talk to HTTP, RSS, socket, WebSocket, and FTP sources.
+
+
+
+## Compute
+
+One of the most important promises of a streaming analytics platform like Apache Apex is the ability to do analytics in real time. However, delivering on this promise becomes really difficult when the platform does not provide out-of-the-box operators for a variety of common compute functions, because the user then has to worry about making them scalable, fault tolerant, and so on. Malhar takes this responsibility away from the application developer by providing a wide variety of out-of-the-box computational operators, so the developer can focus on the analysis itself.
+
+Below is just a snapshot of the compute operators available in Malhar:
+
+*   Statistics & math – various mathematical and statistical computations over application-defined time windows
+*   Filtering & pattern matching
+*   Machine learning & algorithms – real-time model scoring is a very common use case for stream processing platforms; Malhar allows users to invoke their R models from streaming applications
+*   Sorting, maps, frequency, TopN, BottomN, random generators, etc.
+
+
+## Query & Script invocation
+
+Many streaming use cases are legacy implementations that need to be ported over. This often requires re-using some of the existing investments and code that would be really hard to re-write. With this in mind, Malhar supports invoking external scripts and queries as part of the streaming application, with operators for invoking SQL queries, shell scripts, Ruby, Jython, JavaScript, etc.
+
+## Parsers
+
+There are many industry-vertical-specific data formats that a streaming application developer might need to parse. Often there are existing parsers available that can be plugged directly into an Apache Apex application. For example, in the telco space, a Java-based CDR parser can be plugged directly into an Apache Apex operator. To further simplify the development experience, Malhar also provides operators for parsing common formats like XML (DOM & SAX), JSON (flat map converter), Apache log files, syslog, etc.
+
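+To make the parsing step concrete, here is a minimal, self-contained sketch of the per-tuple work an Apache-log parsing operator performs. The class name and regex are ours for illustration, not the Malhar operator's API; the log line follows the standard Apache common log format.
+
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: extracting fields from one Apache common-log line.
public class ApacheLogSketch {
    // Fields: host ident user [time] "request" status bytes
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    public static void main(String[] args) {
        String log = "127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /index.html HTTP/1.0\" 200 2326";
        Matcher m = LINE.matcher(log);
        if (m.find()) {
            System.out.println(m.group(1)); // host:    127.0.0.1
            System.out.println(m.group(3)); // request: GET /index.html HTTP/1.0
            System.out.println(m.group(4)); // status:  200
        }
    }
}
```
+
+In an operator, this matching would run once per incoming tuple, with the extracted fields emitted as a structured record on an output port.
+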
+## Stream manipulation
+
+Streaming data, a.k.a. a ‘stream’, is raw data that inevitably needs processing to clean, filter, tag, summarize, etc. The goal of Malhar is to enable the application developer to focus on ‘WHAT’ needs to be done to the stream to get it into the right format, and not worry about the ‘HOW’. Hence, Malhar has several operators that perform common stream manipulation actions, such as DeDupe, GroupBy, Join, Distinct/Unique, Limit, OrderBy, Split, Sample, Inner join, Outer join, Select, Update, etc.
+
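+As one concrete example of the list above, the core idea behind a DeDupe operator can be sketched in a few lines of plain Java. This is an assumption-laden sketch, not the Malhar API: a real operator would also checkpoint its state and emit on typed output ports rather than collect into a list.
+
```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: remember keys seen within a window, emit each once.
public class DedupSketch {
    final Set<String> seen = new LinkedHashSet<>();
    final List<String> emitted = new ArrayList<>();

    void process(String tuple) {
        if (seen.add(tuple)) {   // add() returns false if already present
            emitted.add(tuple);  // stand-in for emitting on an output port
        }
    }

    void endWindow() {
        seen.clear();            // here, dedup state is scoped to one window
    }

    public static void main(String[] args) {
        DedupSketch op = new DedupSketch();
        for (String t : new String[] {"a", "b", "a", "c", "b"}) {
            op.process(t);
        }
        System.out.println(op.emitted); // [a, b, c]
    }
}
```
+
+The other stream manipulation operators follow the same pattern: small amounts of per-tuple logic plus windowed state, with scaling and fault tolerance handled by the platform.
+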
+## Social Media
+
+Malhar includes an operator to connect to the popular Twitter stream fire hose.