Posted to commits@apex.apache.org by th...@apache.org on 2016/09/29 03:29:12 UTC

apex-site git commit: Update presentation links and roadmap.

Repository: apex-site
Updated Branches:
  refs/heads/master 78a443ee0 -> 2a37a06fe


Update presentation links and roadmap.


Project: http://git-wip-us.apache.org/repos/asf/apex-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/apex-site/commit/2a37a06f
Tree: http://git-wip-us.apache.org/repos/asf/apex-site/tree/2a37a06f
Diff: http://git-wip-us.apache.org/repos/asf/apex-site/diff/2a37a06f

Branch: refs/heads/master
Commit: 2a37a06fe554438c6bd9535d724e0ef6cb990049
Parents: 78a443e
Author: Thomas Weise <th...@apache.org>
Authored: Wed Sep 28 20:27:28 2016 -0700
Committer: Thomas Weise <th...@apache.org>
Committed: Wed Sep 28 20:27:28 2016 -0700

----------------------------------------------------------------------
 roadmap.json   | 209 +++++++++++++++++++++++++++++-----------------------
 src/md/docs.md |   8 +-
 2 files changed, 120 insertions(+), 97 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/apex-site/blob/2a37a06f/roadmap.json
----------------------------------------------------------------------
diff --git a/roadmap.json b/roadmap.json
index f517307..3a6bcb2 100644
--- a/roadmap.json
+++ b/roadmap.json
@@ -284,12 +284,12 @@
       },
       {
         "expand": "operations,editmeta,changelog,transitions,renderedFields",
-        "id": "12955374",
-        "self": "https://issues.apache.org/jira/rest/api/2/issue/12955374",
-        "key": "APEXCORE-414",
+        "id": "12955475",
+        "self": "https://issues.apache.org/jira/rest/api/2/issue/12955475",
+        "key": "APEXCORE-418",
         "fields": {
-          "summary": "Native support for event-time windowing",
-          "description": "Apex core has streaming windows that establish a boundary based on arrival time of events. Many applications require boundaries based on the time of events, which could be a field in the tuple. Some of the operators support this today (time bucketing), but it would be good to provide more generic support for this in the engine itself. ",
+          "summary": "Support for Mesos",
+          "description": "Today Apex has two modes of execution: Embedded mode (everything running in a single JVM) and YARN. There has been a few questions around native support for Mesos. A cursory look suggests that Mesos support can be added by reimplementing the YARN specific portions in the master (AppMasterService, ContainerLauncher) and limited changes to the streaming container driver.\r\n\r\nMesos has a different model of resource allocation: The master offers resources to the framework while in YARN resources are requested. Apex master needs to implement the \"framework scheduler\" that is responsible to accept the resources and control the tasks.\r\n\r\nhttp://mesos.apache.org/documentation/latest/app-framework-development-guide/\r\n\r\nTasks are launched through executors, command line and docker executors are provided.  \r\n\r\nApex also requires support to deploy the dependencies to the nodes on which the streaming containers are launched. YARN supports that through resource localization. Mesos supports this through the fetcher, which can copy the resources to the slave node.\r\n\r\nhttp://mesos.apache.org/documentation/latest/fetcher/\r\n",
           "fixVersions": [],
           "priority": {
             "self": "https://issues.apache.org/jira/rest/api/2/priority/3",
@@ -315,12 +315,43 @@
       },
       {
         "expand": "operations,editmeta,changelog,transitions,renderedFields",
-        "id": "12955475",
-        "self": "https://issues.apache.org/jira/rest/api/2/issue/12955475",
-        "key": "APEXCORE-418",
+        "id": "12992322",
+        "self": "https://issues.apache.org/jira/rest/api/2/issue/12992322",
+        "key": "APEXCORE-498",
         "fields": {
-          "summary": "Support for Mesos",
-          "description": "Today Apex has two modes of execution: Embedded mode (everything running in a single JVM) and YARN. There has been a few questions around native support for Mesos. A cursory look suggests that Mesos support can be added by reimplementing the YARN specific portions in the master (AppMasterService, ContainerLauncher) and limited changes to the streaming container driver.\r\n\r\nMesos has a different model of resource allocation: The master offers resources to the framework while in YARN resources are requested. Apex master needs to implement the \"framework scheduler\" that is responsible to accept the resources and control the tasks.\r\n\r\nhttp://mesos.apache.org/documentation/latest/app-framework-development-guide/\r\n\r\nTasks are launched through executors, command line and docker executors are provided.  \r\n\r\nApex also requires support to deploy the dependencies to the nodes on which the streaming containers are launched. YARN supports that through resource localization. Mesos supports this through the fetcher, which can copy the resources to the slave node.\r\n\r\nhttp://mesos.apache.org/documentation/latest/fetcher/\r\n",
+          "summary": "Named Checkpoints - Checkpoint the DAG with a name/tag and start the app from that point",
+          "description": "Named Checkpoints \r\n\r\n1. Ability to tag/name the checkpoints\r\n2. On demand - checkpoint the DAG\r\n3. Start the app from the named checkpoints\r\n\r\nAll checkpoints that happened before the committed window is deleted but the named checkpoints won't be deleted.",
+          "fixVersions": [],
+          "priority": {
+            "self": "https://issues.apache.org/jira/rest/api/2/priority/3",
+            "iconUrl": "https://issues.apache.org/jira/images/icons/priorities/major.png",
+            "name": "Major",
+            "id": "3"
+          },
+          "status": {
+            "self": "https://issues.apache.org/jira/rest/api/2/status/1",
+            "description": "The issue is open and ready for the assignee to start work on it.",
+            "iconUrl": "https://issues.apache.org/jira/images/icons/statuses/open.png",
+            "name": "Open",
+            "id": "1",
+            "statusCategory": {
+              "self": "https://issues.apache.org/jira/rest/api/2/statuscategory/2",
+              "id": 2,
+              "key": "new",
+              "colorName": "blue-gray",
+              "name": "New"
+            }
+          }
+        }
+      },
+      {
+        "expand": "operations,editmeta,changelog,transitions,renderedFields",
+        "id": "13005178",
+        "self": "https://issues.apache.org/jira/rest/api/2/issue/13005178",
+        "key": "APEXCORE-536",
+        "fields": {
+          "summary": "Upgrade Hadoop dependency",
+          "description": "Currently Apex depends on Hadoop 2.2 and runs on all later 2.x version. Hadoop 2.2 is quite old, most Apex users have more recent Hadoop installs. Latest distro releases are based on 2.6 and 2.7. There are several important features that were added in Hadoop since 2.2 that Apex should be able to leverage.",
           "fixVersions": [],
           "priority": {
             "self": "https://issues.apache.org/jira/rest/api/2/priority/3",
@@ -402,37 +433,6 @@
     "jiras": [
       {
         "expand": "operations,editmeta,changelog,transitions,renderedFields",
-        "id": "12926249",
-        "self": "https://issues.apache.org/jira/rest/api/2/issue/12926249",
-        "key": "APEXMALHAR-1720",
-        "fields": {
-          "summary": "Development of Inner Join Operator",
-          "description": null,
-          "fixVersions": [],
-          "priority": {
-            "self": "https://issues.apache.org/jira/rest/api/2/priority/3",
-            "iconUrl": "https://issues.apache.org/jira/images/icons/priorities/major.png",
-            "name": "Major",
-            "id": "3"
-          },
-          "status": {
-            "self": "https://issues.apache.org/jira/rest/api/2/status/3",
-            "description": "This issue is being actively worked on at the moment by the assignee.",
-            "iconUrl": "https://issues.apache.org/jira/images/icons/statuses/inprogress.png",
-            "name": "In Progress",
-            "id": "3",
-            "statusCategory": {
-              "self": "https://issues.apache.org/jira/rest/api/2/statuscategory/4",
-              "id": 4,
-              "key": "indeterminate",
-              "colorName": "yellow",
-              "name": "In Progress"
-            }
-          }
-        }
-      },
-      {
-        "expand": "operations,editmeta,changelog,transitions,renderedFields",
         "id": "12926159",
         "self": "https://issues.apache.org/jira/rest/api/2/issue/12926159",
         "key": "APEXMALHAR-1811",
@@ -478,17 +478,17 @@
             "id": "3"
           },
           "status": {
-            "self": "https://issues.apache.org/jira/rest/api/2/status/1",
-            "description": "The issue is open and ready for the assignee to start work on it.",
-            "iconUrl": "https://issues.apache.org/jira/images/icons/statuses/open.png",
-            "name": "Open",
-            "id": "1",
+            "self": "https://issues.apache.org/jira/rest/api/2/status/3",
+            "description": "This issue is being actively worked on at the moment by the assignee.",
+            "iconUrl": "https://issues.apache.org/jira/images/icons/statuses/inprogress.png",
+            "name": "In Progress",
+            "id": "3",
             "statusCategory": {
-              "self": "https://issues.apache.org/jira/rest/api/2/statuscategory/2",
-              "id": 2,
-              "key": "new",
-              "colorName": "blue-gray",
-              "name": "New"
+              "self": "https://issues.apache.org/jira/rest/api/2/statuscategory/4",
+              "id": 4,
+              "key": "indeterminate",
+              "colorName": "yellow",
+              "name": "In Progress"
             }
           }
         }
@@ -588,17 +588,48 @@
       },
       {
         "expand": "operations,editmeta,changelog,transitions,renderedFields",
-        "id": "12953482",
-        "self": "https://issues.apache.org/jira/rest/api/2/issue/12953482",
-        "key": "APEXMALHAR-2026",
+        "id": "12969033",
+        "self": "https://issues.apache.org/jira/rest/api/2/issue/12969033",
+        "key": "APEXMALHAR-2089",
+        "fields": {
+          "summary": "Apache Beam support",
+          "description": "Apex should provide a runner for Beam. This ticket is a proxy for BEAM-261 as the implementation should probably live in the Beam repository.\r\n",
+          "fixVersions": [],
+          "priority": {
+            "self": "https://issues.apache.org/jira/rest/api/2/priority/3",
+            "iconUrl": "https://issues.apache.org/jira/images/icons/priorities/major.png",
+            "name": "Major",
+            "id": "3"
+          },
+          "status": {
+            "self": "https://issues.apache.org/jira/rest/api/2/status/1",
+            "description": "The issue is open and ready for the assignee to start work on it.",
+            "iconUrl": "https://issues.apache.org/jira/images/icons/statuses/open.png",
+            "name": "Open",
+            "id": "1",
+            "statusCategory": {
+              "self": "https://issues.apache.org/jira/rest/api/2/statuscategory/2",
+              "id": 2,
+              "key": "new",
+              "colorName": "blue-gray",
+              "name": "New"
+            }
+          }
+        }
+      },
+      {
+        "expand": "operations,editmeta,changelog,transitions,renderedFields",
+        "id": "12985430",
+        "self": "https://issues.apache.org/jira/rest/api/2/issue/12985430",
+        "key": "APEXMALHAR-2130",
         "fields": {
-          "summary": "Spill-able Datastructures",
-          "description": "Add libraryies for spooling datastructures to a key value store. There are several customer use cases which require spooled data structures.\r\n\r\n1 - Some operators like AbstractFileInputOperator have ever growing state. This is an issue because eventually the state of the operator will grow larger than the memory allocated to the operator, which will cause the operator to perpetually fail. However if the operator's datastructures are spooled then the operator will never run out of memory.\r\n\r\n2 - Some users have requested for the ability to maintain a map as well as a list of keys over which to iterate. Most key value stores don't provide this functionality. However, with spooled datastructures this functionality can be provided by maintaining a spooled map and an iterable set of keys.\r\n\r\n3 - Some users have requested building graph databases within APEX. This would require implementing a spooled graph data structure.\r\n\r\n4 - Another use case for spooled data structures is database operators. Database operators need to write data to a data base, but sometimes the database is down. In this case most of the database operators repeatedly fail until the database comes back up. In order to avoid constant failures the database operator need to writes data to a queue when the data base is down, then when the database is up the operator need to take data from the queue and write it to the database. In the case of a database failure this queue will grow larger than the total amount of memory available to the operator, so the queue should be spooled in order to prevent the operator from failing.\r\n\r\n5 - Any operator which needs to maintain a large data structure in memory currently needs to have that data serialized and written out to HDFS with every checkpoint. This is costly when the data structure is large. If the data structure is spooled, then only the changes to the data structure are written out to HDFS instead of the entire data structure.\r\n\r\n6 - Also building an Apex Native database for aggregations requires indices. These indices need to take the form of spooled data structures.\r\n\r\n7 - In the future any operator which needs to maintain a data structure larger than the memory available to it will need to spool the data structure.",
+          "summary": "Scalable windowed storage",
+          "description": "This feature is used for supporting windowing.\r\n\r\nThe storage needs to have the following features:\r\n1. Spillable key value storage (integrate with APEXMALHAR-2026)\r\n2. Upon checkpoint, it saves a snapshot for the entire data set with the checkpointing window id.  This should be done incrementally (ManagedState) to avoid wasting space with unchanged data\r\n3. When recovering, it takes the recovery window id and restores to that snapshot\r\n4. When a window is committed, all windows with a lower ID should be purged from the store.\r\n5. It should implement the WindowedStorage and WindowedKeyedStorage interfaces, and because of 2 and 3, we may want to add methods to the WindowedStorage interface so that the implementation of WindowedOperator can notify the storage of checkpointing, recovering and committing of a window.\r\n",
           "fixVersions": [
             {
-              "self": "https://issues.apache.org/jira/rest/api/2/version/12335815",
-              "id": "12335815",
-              "name": "3.5.0",
+              "self": "https://issues.apache.org/jira/rest/api/2/version/12338174",
+              "id": "12338174",
+              "name": "3.6.0",
               "archived": false,
               "released": false
             }
@@ -610,29 +641,29 @@
             "id": "3"
           },
           "status": {
-            "self": "https://issues.apache.org/jira/rest/api/2/status/5",
-            "description": "A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.",
-            "iconUrl": "https://issues.apache.org/jira/images/icons/statuses/resolved.png",
-            "name": "Resolved",
-            "id": "5",
+            "self": "https://issues.apache.org/jira/rest/api/2/status/4",
+            "description": "This issue was once resolved, but the resolution was deemed incorrect. From here issues are either marked assigned or resolved.",
+            "iconUrl": "https://issues.apache.org/jira/images/icons/statuses/reopened.png",
+            "name": "Reopened",
+            "id": "4",
             "statusCategory": {
-              "self": "https://issues.apache.org/jira/rest/api/2/statuscategory/3",
-              "id": 3,
-              "key": "done",
-              "colorName": "green",
-              "name": "Complete"
+              "self": "https://issues.apache.org/jira/rest/api/2/statuscategory/2",
+              "id": 2,
+              "key": "new",
+              "colorName": "blue-gray",
+              "name": "New"
             }
           }
         }
       },
       {
         "expand": "operations,editmeta,changelog,transitions,renderedFields",
-        "id": "12969033",
-        "self": "https://issues.apache.org/jira/rest/api/2/issue/12969033",
-        "key": "APEXMALHAR-2089",
+        "id": "13006875",
+        "self": "https://issues.apache.org/jira/rest/api/2/issue/13006875",
+        "key": "APEXMALHAR-2260",
         "fields": {
-          "summary": "Apache Beam support",
-          "description": "Apex should provide a runner for Beam. This ticket is a proxy for BEAM-261 as the implementation should probably live in the Beam repository.\r\n",
+          "summary": "Python execution for operator logic ",
+          "description": "Support execution of Python code in an operator. \r\n\r\nhttps://lists.apache.org/thread.html/9837b1dee8f909ed400c6030ce5c6a94a12f43183718019dd0bfd228@%3Cdev.apex.apache.org%3E\r\n",
           "fixVersions": [],
           "priority": {
             "self": "https://issues.apache.org/jira/rest/api/2/priority/3",
@@ -658,21 +689,13 @@
       },
       {
         "expand": "operations,editmeta,changelog,transitions,renderedFields",
-        "id": "12988875",
-        "self": "https://issues.apache.org/jira/rest/api/2/issue/12988875",
-        "key": "APEXMALHAR-2142",
+        "id": "13006876",
+        "self": "https://issues.apache.org/jira/rest/api/2/issue/13006876",
+        "key": "APEXMALHAR-2261",
         "fields": {
-          "summary": "High-level API window support",
-          "description": null,
-          "fixVersions": [
-            {
-              "self": "https://issues.apache.org/jira/rest/api/2/version/12335815",
-              "id": "12335815",
-              "name": "3.5.0",
-              "archived": false,
-              "released": false
-            }
-          ],
+          "summary": "Python binding for high level API",
+          "description": "A high level API similar to the Apex Java stream API that lets users specify an application in Python.\r\n\r\nhttps://lists.apache.org/thread.html/9837b1dee8f909ed400c6030ce5c6a94a12f43183718019dd0bfd228@%3Cdev.apex.apache.org%3E\r\n",
+          "fixVersions": [],
           "priority": {
             "self": "https://issues.apache.org/jira/rest/api/2/priority/3",
             "iconUrl": "https://issues.apache.org/jira/images/icons/priorities/major.png",
@@ -717,22 +740,22 @@
         "self": "https://issues.apache.org/jira/rest/api/2/version/12334968",
         "id": "12334968",
         "name": "3.3.2",
-        "archived": false,
+        "archived": true,
         "released": false,
         "projectId": 12318824
       },
       {
         "self": "https://issues.apache.org/jira/rest/api/2/version/12335827",
         "id": "12335827",
-        "name": "3.4.1",
+        "name": "3.5.1",
         "archived": false,
         "released": false,
         "projectId": 12318824
       },
       {
-        "self": "https://issues.apache.org/jira/rest/api/2/version/12335815",
-        "id": "12335815",
-        "name": "3.5.0",
+        "self": "https://issues.apache.org/jira/rest/api/2/version/12338174",
+        "id": "12338174",
+        "name": "3.6.0",
         "archived": false,
         "released": false,
         "projectId": 12318824

http://git-wip-us.apache.org/repos/asf/apex-site/blob/2a37a06f/src/md/docs.md
----------------------------------------------------------------------
diff --git a/src/md/docs.md b/src/md/docs.md
index 25822c4..0604fd7 100644
--- a/src/md/docs.md
+++ b/src/md/docs.md
@@ -10,6 +10,7 @@ Documentation for previous releases is available in [Downloads](/downloads.html)
 
 - <a href="http://docs.datatorrent.com/beginner/" rel="nofollow">Beginner's Guide to Apache Apex</a> This document provides a comprehensive overview of Apex and is recommended for developers just starting out with Apex.
 - [Building Your First Apache Apex Application](https://youtu.be/LwRWBudOjg4) This video has a hands-on demonstration of how to check out the source code repositories and build them, then run the maven archetype command to generate a new Apache Apex project, populate the project with Java source files for a new application, and finally, build and run the application -- all on a virtual machine running Linux with Apache Hadoop installed.
+- [Writing an Apache Apex application](http://files.meetup.com/18978602/University%20program%20-%20Writing%20an%20Apache%20Apex%20application.pdf) A PDF document that frames a hands-on exercise of building a basic application; also includes a diagram illustrating the life-cycle of operators.
 - <a href="http://docs.datatorrent.com/tutorials/topnwords/" rel="nofollow">Top N Words Application Tutorial</a> This document provides a detailed step-by-step description of how to build and run a
 word counting application with Apache Apex starting with setting up your development environment, progressing to building, running and monitoring the application, visualizing the output and concluding with some advanced features such as assessing operator memory requirements, partitioning, and debugging.
 - <a href="http://docs.datatorrent.com/tutorials/salesdimensions/" rel="nofollow">Sales Dimensions Application Tutorial</a> Similar to the Top N Words application but covers
@@ -20,12 +21,11 @@ dimensional computations on a simulated sales data stream.
 ### Presentations
 
 - [Slideshare/ApacheApex](http://www.slideshare.net/ApacheApex/presentations) Presentations from past meetup events and other talks covering Apache Apex introduction, feature deep dive, integration, customer use cases and more.
-- [Writing an Apache Apex application](http://files.meetup.com/18978602/University%20program%20-%20Writing%20an%20Apache%20Apex%20application.pdf) A PDF document that frames a hands-on exercise of building a basic application; also includes a diagram illustrating the life-cycle of operators.
 - [Next Gen Decision Making in < 2ms](https://www.youtube.com/watch?v=98EW5NGM3u0) A video discussing CapitalOne's experience with Apache Apex and evaluation of competing technologies along with the [slides](http://www.slideshare.net/ApacheApex/capital-ones-next-generation-decision-in-less-than-2-ms). 
-- [Apache Nifi Integration with Apex](https://www.youtube.com/watch?v=EdBiOnQn3Gw) video and [slide deck](http://www.slideshare.net/ApacheApex/integrating-ni-fiandapex-by-bryan-bende).
 - [Introducing Apache Apex](https://www.brighttalk.com/webcast/13685/190407) A webinar that begins with the historical context for the rise of Hadoop and Big Data, discusses why the promise of Hadoop remains largely unfulfilled and why moving beyond Map-Reduce model is essential and why operability is critically important. It continues with a discussion of the programming model, the various components of a running application on a YARN cluster and the large library of operators and connectors available with Apache Apex for reading data from and writing data to external systems. Concludes with a brief description of the visualization dashboards.
-- [Stream Processing with Apache Apex](http://www.slideshare.net/PramodImmaneni/meetup-59089806) A broad overview slide deck covering topics such as windowing, static and dynamic partitioning, unification, fault tolerance, locality, monitoring, etc.
-- [Fault Tolerance and Processing Semantics](https://www.brighttalk.com/webcast/13685/194115) A webinar and associated [slides](http://www.slideshare.net/ApacheApexOrganizer/webinar-fault-toleranceandprocessingsemantics) covering core Apache Apex features including checkpointing and fault tolerance with fast, incremental recovery via a buffer server which uses a publish-subscribe model for inter-operator data transport. A variety of failure scenarios and processing guarantees are discussed.
+- [Stream Processing with Apache Apex (video)](https://www.youtube.com/watch?v=1DVMSRTNdIQ) and [(slides)](http://www.slideshare.net/ApacheApex/hadoop-summit-sj-2016-next-gen-big-data-analytics-with-apache-apex) A broad overview slide deck covering topics such as windowing, static and dynamic partitioning, unification, fault tolerance, locality, monitoring, etc.
+- [Fault Tolerance and Processing Semantics (video)](https://www.youtube.com/watch?v=FCMY6Ii89Nw) and [(slides)](http://www.slideshare.net/ApacheApexOrganizer/webinar-fault-toleranceandprocessingsemantics) A webinar covering core Apache Apex features including checkpointing and fault tolerance with fast, incremental recovery via a buffer server which uses a publish-subscribe model for inter-operator data transport. A variety of failure scenarios and processing guarantees are discussed.
+- [Smart Partitioning with Apache Apex (video)](https://www.youtube.com/watch?v=kJWMajIjGG0) and [(slides)](http://www.slideshare.net/ApacheApex/smart-partitioning-with-apache-apex-webinar) Webinar covering partitioning, including unique Apex features such as elasticity with dynamic resource allocation, parallel partitions for speculative execution and processing SLA etc.
 - [Windows in Apache Apex](http://www.slideshare.net/DevendraVyavahare/windowing-in-apex) Discusses the various flavors of windows available in Apache Apex and how to configure and
 use them via callbacks. Contrasts windows with micro-batches.
 - [Real Time Stream Processing Versus Batch](http://www.slideshare.net/DevendraVyavahare/batch-processing-vs-real-time-data-processing-streaming) Slide deck compares and contrasts the needs, use cases and challenges of stream processing with those of batch processing.