Posted to commits@streams.apache.org by sb...@apache.org on 2016/04/25 18:07:02 UTC

[05/11] incubator-streams-master git commit: tweaks to site outline and content

tweaks to site outline and content


Project: http://git-wip-us.apache.org/repos/asf/incubator-streams-master/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-streams-master/commit/ecbb81a0
Tree: http://git-wip-us.apache.org/repos/asf/incubator-streams-master/tree/ecbb81a0
Diff: http://git-wip-us.apache.org/repos/asf/incubator-streams-master/diff/ecbb81a0

Branch: refs/heads/newwebpage
Commit: ecbb81a0f5e627053553144ecbbdb43198762f1b
Parents: 09f4d65
Author: Steve Blackmon @steveblackmon <sb...@apache.org>
Authored: Mon Feb 29 09:27:34 2016 -0600
Committer: Steve Blackmon @steveblackmon <sb...@apache.org>
Committed: Mon Feb 29 09:27:34 2016 -0600

----------------------------------------------------------------------
 src/site/markdown/architecture.md | 70 ++++++++++++++++++++--------------
 src/site/markdown/faq.md          | 51 +++++++++++++++----------
 src/site/markdown/index.md        | 10 ++---
 src/site/site.xml                 | 18 +++++++--
 src/site/site_en.xml              | 17 +++++++--
 5 files changed, 105 insertions(+), 61 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/ecbb81a0/src/site/markdown/architecture.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/architecture.md b/src/site/markdown/architecture.md
index 120f139..174fabb 100644
--- a/src/site/markdown/architecture.md
+++ b/src/site/markdown/architecture.md
@@ -6,54 +6,68 @@ In general streams can be characterized as perpetual (capable of running indefin
 
 ###Basic Concepts
 
-####Activity
+####Module
 
-Apache Streams has a preference for ActivityStreams formatted messages.  These messages may be passed using the 'Activity' class or one of it's sub-classes.  
+Apache Streams consists of a loosely coupled set of modules with specific capabilities, such as:
 
-####Datum
+ - collecting data
+ - transforming or filtering data
+ - storing and retrieving documents and metadata from databases
+ - binding streams components to other systems
+ - facilitating starting and stopping of streams
 
-A Datum is a single piece of data within a stream.  A datum typically has an identifier, a timestamp, a document (which may be any java object), and additional metadata kept apart from the document related to upstream or downstream processing..
+Each module has its own POM and dependency tree.  Each stream deployment imports only the modules it actually needs.
 
-####Module
+####Component
 
-Apache Streams consists of a loosely coupled set of modules with specific capabilities.  Such as:
- * collecting data.
- * transforming or filter data
- * storing and retrieving documents and metadata from databases
- * binding streams components to other systems
- * facilitating starting and stopping of streams.
+Components are the classes that do the work within a stream.  Components are assembled into pipelines and executed using a runtime.  There are several core types of component, each defined by a specific Java interface:
 
-Each module has it's own POM and dependency tree.  Each stream deployment needs to import only the modules it needs.
+#####Provider
 
-####Pipeline
+A Provider is a component that *provides* data to the stream from external systems.
 
-A Pipeline is a set of collection, processing, and storage components structured in a directed graph (cycles may be permitted) which is packaged, deployed, started, and stopped together.
+#####Processor
 
-####Runtime
+A Processor is a component that *processes* data flowing through the stream - transformations, filters, and enrichments are common processors.
+
+#####PersistWriter
+
+A PersistWriter is a component that *writes* data exiting the stream.
+
+#####PersistReader
+
+A PersistReader is a component that *reads* data, often previously written by a PersistWriter.
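
To make the component contract concrete, here is a minimal sketch of a processor. It is illustrative only: it assumes the `StreamsProcessor` and `StreamsDatum` types from the streams-core module, and the exact method signatures may differ between releases.

```java
import java.util.Collections;
import java.util.List;

import org.apache.streams.core.StreamsDatum;
import org.apache.streams.core.StreamsProcessor;

/**
 * Illustrative only: a processor that passes each datum through unchanged.
 * A real processor would transform, filter, or enrich entry.getDocument() here.
 */
public class PassThroughProcessor implements StreamsProcessor {

    @Override
    public void prepare(Object configurationObject) {
        // acquire clients, caches, or other resources before the stream starts
    }

    @Override
    public List<StreamsDatum> process(StreamsDatum entry) {
        // a processor receives one datum and may emit zero, one, or many datums
        return Collections.singletonList(entry);
    }

    @Override
    public void cleanUp() {
        // release any resources when the stream shuts down
    }
}
```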
 
-A Runtime is a module containing bindings that help setup and run a pipeline.  Runtimes may submit pipeline binaries to an existing cluster, or may launch the processes to execute the stream directly.  
 ####Schema
 
-A Schema defines the expected shape of the documents that will passed from step to step within a stream.  Defining the schema for a type of document allows source files and resource files to be generated at compile time. Schema can include other schemas, whether in the same repo or available via HTTP, allowing for full or partial reuse.
+A Schema defines the expected shape of the documents that will be passed from step to step within a stream.  Defining the schema for a type of document allows source files and resource files to be generated by the build process, relieving your team of the need to maintain these files by hand.
 
-####Component
+Schemas can include other schemas, whether in the same repo or available via HTTP, allowing for full or partial reuse within or across organizations.
 
-Components are individual instances of classes that do stuff within a stream.  Components are assembled into pipelines and executed using a runtime.  
+####Datum
 
-####Types of Components
+A Datum is a single piece of data within a stream.  A datum typically has an identifier, a timestamp, a document (which may be any Java object), and additional metadata related to upstream or downstream processing, kept apart from the document.
 
-#####Provider
+####Activity
 
-A Provider is a component that *provides* data to the stream from external systems.
+Apache Streams has a preference for ActivityStreams-formatted messages.  These messages may be passed using the 'Activity' class or one of its sub-classes.
 
-#####Processor
+####ActivityObject
 
-A Processor is a component that *processes* data flowing through the stream - transformations, filters, and enrichments are common processors.
+An activity has several sub-object fields:
 
-#####PersistWriter
+ - actor (required)
+ - object (optional)
+ - target (optional)
+ - generator (optional)
+ - provider (optional)
 
-A PersistWriter is a component that writes data exiting the stream.
+Streams containing details of actors, objects, etc. may be created using the 'ActivityObject' class or one of its sub-classes.
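
As a rough sketch of how an activity and its sub-objects fit together, assuming bean-style setters on the generated 'Activity' and 'ActivityObject' POJOs (org.apache.streams.pojo.json); the generated field types, particularly for actor and object, can differ between releases:

```java
import org.apache.streams.pojo.json.Activity;
import org.apache.streams.pojo.json.ActivityObject;

public class ActivityExample {

    public static Activity buildPostActivity() {
        // actor: the entity performing the activity (required)
        ActivityObject actor = new ActivityObject();
        actor.setId("id:person:example-user");
        actor.setDisplayName("Example User");

        // object: what the activity was performed on (optional)
        ActivityObject object = new ActivityObject();
        object.setId("id:note:12345");
        object.setContent("hello, streams");

        Activity activity = new Activity();
        activity.setVerb("post");
        // note: the exact types accepted by setActor/setObject depend on the generated schema version
        activity.setActor(actor);
        activity.setObject(object);
        return activity;
    }
}
```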
 
-#####PersistReader
+####Pipeline
+
+A Pipeline is a set of collection, processing, and storage components structured in a directed graph (cycles may be permitted) which is packaged, deployed, started, and stopped together.
+
+####Runtime
 
-A PersistReader is a component that reads data, often previously written by a PersistWriter.
+A Runtime is a module containing bindings that help set up and run a pipeline.  Runtimes may submit pipeline binaries to an existing cluster, or may launch the process(es) to execute the stream directly.
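
A minimal sketch of wiring a pipeline and running it with the local runtime follows; the builder class and method names reflect the streams-runtime-local module as best understood, so treat them as assumptions and check the javadocs for your release:

```java
import org.apache.streams.core.StreamsPersistWriter;
import org.apache.streams.core.StreamsProcessor;
import org.apache.streams.core.StreamsProvider;
import org.apache.streams.local.builders.LocalStreamBuilder;

public class LocalPipelineExample {

    /** Wires provider -> processor -> writer into a directed graph and runs it in-process. */
    public static void run(StreamsProvider provider,
                           StreamsProcessor processor,
                           StreamsPersistWriter writer) {
        LocalStreamBuilder builder = new LocalStreamBuilder();
        builder.newReadCurrentStream("provider", provider);
        builder.addStreamsProcessor("processor", processor, 1, "provider");
        builder.addStreamsPersistWriter("writer", writer, 1, "processor");
        builder.start();   // blocks while the stream executes inside this JVM
    }
}
```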

http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/ecbb81a0/src/site/markdown/faq.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/faq.md b/src/site/markdown/faq.md
index 14c18ea..151852a 100644
--- a/src/site/markdown/faq.md
+++ b/src/site/markdown/faq.md
@@ -1,14 +1,12 @@
-#Frequently Asked Questions
+## Frequently Asked Questions
 
 ###    Why should I adopt activity streams for my project?
 
-Odds are the dataset you are working with is some combination of timestamped events and observations of entities and their relationships at various points in time.  Activity Streams provides a simple yet powerful standard format for these types of data, regardless of their origin, publisher, or specific details.  As an community-driven specification designed for interoperability and flexibility, by adopting activity streams you maximize the chance that a new data-source of interest to you will be compatible with your existing database, and that your database will be compatible with a community working on a new project.
+Odds are the dataset you are working with is some combination of timestamped events and observations of entities and their relationships at various points in time.  Activity Streams provides a simple yet powerful standard format for these types of data, regardless of their origin, publisher, or specific details.  Activity Streams is a community-driven specification designed for interoperability and flexibility.  By supporting activity streams you maximize the chance that a new data-source of interest to you will be compatible with your existing data, and that your data will be compatible with that of other communities working on similar projects.
 
 ###    Why should I consider using Apache Streams for my project?
 
-If you are working with structured event and or entity data that fits with an Activity Streams model, and working with a JVM language, Apache Streams can simplify many of the challenging aspects involved with these types of projects.
-
-Here are a few examples:
+If you are working with structured event and/or entity data that fits the Activity Streams model, and working with a JVM language, Apache Streams can simplify many of the challenging aspects involved with these types of projects.  For example:
 
 * Keeping track of the original source of each piece of information
 * Harmonizing a multitude of date-time formats
@@ -31,37 +29,37 @@ Apache Streams is not
 * one-size-fits-all
 * only useful for projects fully dedicated to activity streams datasets
 
-The primary Streams git repository incubator-streams (org.apache.streams:streams-project) which contains a library of modules inputs, outputs, and reusable components for tranforming and enriching data streams.  Similar modules can also be hosted externally - so long as they publish maven artifacts compatible with your version of streams, you can import and use them in your streams easily.
+The primary Streams git repository, incubator-streams (org.apache.streams:streams-project), contains a library of modules: inputs, outputs, and reusable components for transforming and enriching data streams.  Similar modules can also be hosted externally - so long as they publish Maven artifacts compatible with your version of streams, you can import and use them in your streams easily.
 
 The streams community also supports a separate repository, incubator-streams-examples (org.apache.streams:streams-examples), which contains a library of simple streams that are 'ready-to-run'.  Look here to see what Streams user code looks like.
 
 ###    Why bother with any data framework at all?
 
-Why use Linux, Java, Postgres, Elasticsearch, Cassandra, or Hadoop?
+Why use Postgres, Elasticsearch, Cassandra, Hadoop, Linux, or Java?
 
 Frameworks make important but boring parts of systems and code just work so your team can focus on features important to your users.
 
 If you are sure you can write code that is some combination of faster, more readable, better tested, easier to learn, easier to build with, or more maintainable than any existing framework (including Streams), maybe you should.
 
-On the other hand, maybe you are under-estimating how difficult it will be to optimize across these factors and keep improving.
+On the other hand, maybe you are underestimating how difficult it will be to optimize across these factors and to keep improving those libraries.
 
 Or maybe your time is just more valuable focused on your product rather than on plumbing.
 
-Or maybe by joining forces with others who have more than just a passing interest in running water everyone can benefit from .
+Or maybe by joining forces with others who have more than just a passing interest in running water, everyone can run better, faster, stronger code assembled with shared expertise, including your own.
 
 ###    How is streams different than "*processing framework*"?
 
-You don't have to look hard to find great data processing frameworks for batch or for real-time.  Storm, Spark, Flink, and Dataflow are well-known and pretty solid.  At the core these platforms help you specify inputs, outputs, and a directed graph of computation and then run your code at scale.
+You don't have to look hard to find great data processing frameworks for batch or for real-time.  Storm, Spark, Samza, Flink, and Dataflow are well-known, well-documented, and solid.  At their core, these platforms help you specify inputs, outputs, and a directed graph of computation, and then run your code at scale.
 
-Streams supports a similar computational model, but is more focused on intelligently modeling the data that will flow through the stream.  In this sense Streams has an alternative to avro or protocol buffers, that places flexibility, expressivity and tooling ahead of speed or compute efficiency.
+Streams supports a similar computational model, but is more focused on intelligently modeling the data that will flow through the stream.  In this sense Streams is an alternative to Avro or Protocol Buffers, one that prioritizes flexibility, expressivity, interoperability, and tooling ahead of speed or compute efficiency.
 
-Streams also seeks to make it easy to design and evolve streams, and to configure complex streams sensibly.  Where many processing frameworks leave all business logic and configuration issues to the developer, streams modules are designed to mix-and-match.
+Streams also seeks to make it easy to design and evolve streams, and to configure complex streams sensibly.  Where many processing frameworks leave all business logic and configuration issues to the developer, streams modules are designed to mix-and-match.  Streams modules expect to be embedded with other frameworks and are organized to make that process painless.
 
 ###    How do I deploy Streams?
 
-Currently you cannot deploy "Streams".  Streams has no shrink-wrapped ready-to-run server process.  You can however deploy streams.  The right method for packaging, deploying, and running streams depends on what runtime you are going to use.
+Currently you cannot deploy Streams (uppercase).  Streams has no shrink-wrapped ready-to-run server process.  You can, however, deploy streams (lowercase).  The right method for packaging, deploying, and running streams depends on what runtime you are going to use.
 
-Streams includes a local runtime that uses blocking queues and multi-threaded execution within a single process.  In this scenario you build an uberjar with few exclusions and ship it to a target environment however you want - maven, scp, docker, etc...  You launch the stream process with an appropriate run configuration and watch the magic / catastrophic fail.
+Streams includes a local runtime that uses multi-threaded execution and blocking queues within a single process.  In this scenario you build an uberjar with few exclusions and ship it to a target environment however you want - Maven, scp, Docker, etc.  You launch the stream process with an appropriate configuration and watch the magic (or the catastrophic failure).
 
 Alternatively, components written to streams interfaces can be bound within other platforms such as pig or spark.  In this scenario, you build an uberjar that excludes the platform parts of the classpath and launch your stream using the launch style of that platform.
 
@@ -73,24 +71,35 @@ A better long-term approach is to archive each data series you observe, and labe
 
 ###    What if I need data from "*specific API*"?
 
-No problem - anyone can write a Streams provider.  The project contains providers that use sockets, webhooks, and polling to generate near-real-time data streams.  There are providers which work sequentially through a backlog of items from the stream configuration, running a thread to collect data related to each item.  And if you need to collect so many items that you can't fit all of their ids in the memory available to your stream, a stream can read an arbitrarily long sequence of ids and launch new streams for each batch that terminate when complete.
+No problem - anyone can write a Streams provider.  The project contains providers that use a variety of strategies to generate near-real-time data streams, including:
+ - sockets
+ - webhooks
+ - polling
+ - scraping
+
+Providers can run continuously and pass through new data, or they can work sequentially through a backlog of items.  If you need to collect so many items that you can't fit all of their ids in the memory available to your stream, a stream provider can read an arbitrarily long sequence of ids and hand those off to other providers for collection.
 
 ###    What if I want to keep data in "*unsupported database*"?
 
-No problem - anyone can write a Streams persist reader or persist writer.  The project contains persist writers that write documents efficiently with batch-style binary indexing, that write documents one-by-one to services with REST api endpoints, and that write data to local or distributed buffers.  If you just want to get incoming data into a queueing system to work with outside of streams that's understandable.
+No problem - anyone can write a Streams persist reader or persist writer.  The project contains persist writers that:
+ - write documents efficiently with batch-style binary indexing
+ - write documents one-by-one to services with REST API endpoints
+ - write data to local or distributed buffers
+
+If you just want to use streams providers to collect incoming data and feed it into a queueing system to work with outside of streams, that's just fine.
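
For those writing their own, a minimal persist writer might look like the following sketch (assuming the `StreamsPersistWriter` interface from streams-core; the project's real writers add batching, retries, and configuration on top of this shape):

```java
import org.apache.streams.core.StreamsDatum;
import org.apache.streams.core.StreamsPersistWriter;

/**
 * Illustrative only: "persists" each document by printing it.
 * A real writer would call your database's client library instead.
 */
public class ConsolePersistWriter implements StreamsPersistWriter {

    @Override
    public void prepare(Object configurationObject) {
        // open connections or clients for the target store
    }

    @Override
    public void write(StreamsDatum entry) {
        // each datum exiting the stream arrives here
        System.out.println(entry.getDocument());
    }

    @Override
    public void cleanUp() {
        // flush buffers and close connections
    }
}
```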
 
 ###    Can't I just use "*third-party SDK*" to do the same thing?
 
-For any specific data collection, processing, or storage function there are several if not tens of basic implementations on GitHub.  There may even be language-specific libraries published by a vendor backing the technology in question.
+For any specific data collection, processing, or storage function there are several if not tens of basic implementations on GitHub.  There may be language-specific libraries published by a vendor backing the technology in question.
 
-However, in general there are a set of tradeoffs involved relying on these package.  They often have transitive dependencies.  They may not use performant HTTP and JSON libraries.  The object representations and lifecycle mechanisms they provide may not be consistent with the rest of your code.
+However, in general there is a set of tradeoffs involved in relying on these packages.  They often have transitive dependencies.  They may not use performant HTTP and JSON libraries.  The object representations and lifecycle mechanisms they provide may not be consistent with the rest of your code.  They may source configuration properties in a problematic or cumbersome fashion.  Their licenses may be restrictive or undocumented.
 
-Streams goes to great lengths to regularize many of these challenges so they uniform across existing modules, and easy to reuse within new modules.  Where quality java libraries exist, the most useful parts of their classpath may be included within a streams implementation while other dependencies are excluded.  
+Streams goes to great lengths to address these issues uniformly across existing modules, and to make the solutions easy to reuse within new modules.  Where quality Java libraries exist, their most useful parts may be included within a streams module, while other parts of their classpath are excluded.
 
 ###    Where do I start?
 
-Navigate the list of 'Getting Started' recommendation in order to get up and running with streams.
+Navigate the list of 'Getting Started' recommendations in order to get up and running with streams.
 
 ###    How can I help?
 
-Please join our mailing list, then ask questions and suggest features.  Contribute to the documentation in one of the streams repositories.  Consider writing a new provider using an existing provider as a template.  
+Please join our mailing list, then ask questions and suggest features.  Contribute to the documentation in one of the streams repositories.  Consider writing a new provider using an existing provider as a template.  Consider adding a feature (and/or tests) to an existing module you intend to use.  Consider building and contributing a new example.

http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/ecbb81a0/src/site/markdown/index.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/index.md b/src/site/markdown/index.md
index f18667a..bbff087 100644
--- a/src/site/markdown/index.md
+++ b/src/site/markdown/index.md
@@ -1,15 +1,15 @@
-# Overview
+## Overview
 
 Apache Streams (incubating) unifies a diverse world of digital profiles and online activities into common formats and vocabularies, and makes these datasets accessible across a variety of databases, devices, and platforms for streaming, browsing, search, sharing, and analytics use-cases.
 
-## What is Streams?
+### What is Streams?
 Apache Streams contains JRE-based modules that developers can use to easily integrate with online data sources and build polyglot indexes of activities, entities, and relationships - all based on public standards such as [Activity Streams](activitystrea.ms), or other published organizational standards.
 
-## Why use Streams?
+### Why use Streams?
 Streams contains libraries and patterns for specifying, publishing, and inter-linking schemas, and assists with conversion of activities (posts, shares, likes, follows, etc.) and objects (profiles, pages, photos, videos, etc.) between the representation, format, and encoding preferred by supported data providers (Twitter, Instagram, etc.), and storage services (Cassandra, Elasticsearch, HBase, HDFS, Neo4J, etc.)
 
-## Why is Streams important?
+### Why is Streams important?
 The project aims to provide simple two-way data interchange with all popular REST APIs in activity streams formats using a universal protocol.  No other active open-source project has this ambitious goal, as well as production-worthy implementations for >10 services.  Streams compatibility with multiple storage back-ends and ability to be embedded within any java-based real-time or batch data processing platform ensures that its interoperability features come with little technical baggage.
 
-# Disclaimer
+### Disclaimer
 Apache Streams is an effort undergoing incubation at [The Apache Software Foundation (ASF)](apache.org) sponsored by the [Apache Incubator PMC](incubator.apache.org). Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/ecbb81a0/src/site/site.xml
----------------------------------------------------------------------
diff --git a/src/site/site.xml b/src/site/site.xml
index 547ac5d..dfcaf2a 100644
--- a/src/site/site.xml
+++ b/src/site/site.xml
@@ -44,19 +44,31 @@
     <body>
         <breadcrumbs>
           <item name="Incubator" href="http://incubator.apache.org/"/>
+          <item name="Streams" href="http://streams.incubator.apache.org/"/>
         </breadcrumbs>
         <menu ref="parent" inherit="top"/>
-        <menu name="Project Overview">
+        <menu name="Overview">
             <item name="Overview" href="index.html" />
             <item name="Architecture" href="architecture.html" />
             <item name="Downloads" href="downloads.html" />
             <item name="Frequently Asked Questions" href="faq.html" />
         </menu>
+        <menu name="Details">
+          <item name="Project License" href="license.html" />
+          <item name="Mailing Lists" href="mail-lists.html" />
+          <item name="Project Team" href="team-list.html" />
+          <item name="Continuous Integration" href="integration.html"/>
+          <item name="Issue Tracking" href="issue-tracking.html" />
+        </menu>
+        <menu name="Projects">
+          <item name="streams-master" />
+          <item name="streams-project" href="http://streams.incubator.apache.org/site/0.2-incubating/streams-project" />
+          <item name="streams-examples" href="http://streams.incubator.apache.org/site/0.2-incubating-SNAPSHOT/streams-examples/" />
+        </menu>
         <menu name="Getting Started">
             <item name="Learn more about Activity Streams" href="http://activitystrea.ms" />
             <item name="Check out streams-project web site" href="http://streams.incubator.apache.org/site/0.2-incubating/streams-project/" />
             <item name="View the official Apache Streams jsonschema files" href="http://streams.incubator.apache.org/site/0.2-incubating/streams-project/streams-pojo/index.html" />
-            <item name="Check out streams-examples web site" href="http://streams.incubator.apache.org/site/0.2-incubating-SNAPSHOT/streams-examples/" />
             <item name="Set up a local environment to run streams" />
             <item name="Set up a local database to store streams data" />
             <item name="Build and run twitter-history-elasticsearch" href="http://streams.incubator.apache.org/site/0.2-incubating-SNAPSHOT/streams-examples/streams-examples-local/twitter-history-elasticsearch/index.html" />
@@ -65,11 +77,9 @@
             <item name="Read about twitter / streams conversion" href="http://streams.incubator.apache.org/site/0.2-incubating/streams-project/streams-contrib/index.html" />
             <item name="Learn about utility streams components" href="http://streams.incubator.apache.org/site/0.2-incubating/streams-project/streams-components/index.html"  />
             <item name="Learn about streams interoperability modules" href="http://streams.incubator.apache.org/site/0.2-incubating/streams-project/streams-contrib/index.html"  />
-            <item name="Browse the streams-project javadocs" href="http://streams.incubator.apache.org/site/0.2-incubating/streams-project/apidocs/index.html" />
         </menu>
         <menu name="Foundation">
           <item name="Foundation Info" href="http://www.apache.org/" />
-          <item name="License" href="http://apache.org/licenses/LICENSE-2.0.html" />
           <item name="Sponsorship" href="http://www.apache.org/foundation/sponsorship.html" />
           <item name="Thanks" href="http://www.apache.org/foundation/thanks.html" />
         </menu>

http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/ecbb81a0/src/site/site_en.xml
----------------------------------------------------------------------
diff --git a/src/site/site_en.xml b/src/site/site_en.xml
index 0068beb..dfcaf2a 100644
--- a/src/site/site_en.xml
+++ b/src/site/site_en.xml
@@ -44,19 +44,31 @@
     <body>
         <breadcrumbs>
           <item name="Incubator" href="http://incubator.apache.org/"/>
+          <item name="Streams" href="http://streams.incubator.apache.org/"/>
         </breadcrumbs>
         <menu ref="parent" inherit="top"/>
-        <menu name="Project Overview">
+        <menu name="Overview">
             <item name="Overview" href="index.html" />
             <item name="Architecture" href="architecture.html" />
             <item name="Downloads" href="downloads.html" />
             <item name="Frequently Asked Questions" href="faq.html" />
         </menu>
+        <menu name="Details">
+          <item name="Project License" href="license.html" />
+          <item name="Mailing Lists" href="mail-lists.html" />
+          <item name="Project Team" href="team-list.html" />
+          <item name="Continuous Integration" href="integration.html"/>
+          <item name="Issue Tracking" href="issue-tracking.html" />
+        </menu>
+        <menu name="Projects">
+          <item name="streams-master" />
+          <item name="streams-project" href="http://streams.incubator.apache.org/site/0.2-incubating/streams-project" />
+          <item name="streams-examples" href="http://streams.incubator.apache.org/site/0.2-incubating-SNAPSHOT/streams-examples/" />
+        </menu>
         <menu name="Getting Started">
             <item name="Learn more about Activity Streams" href="http://activitystrea.ms" />
             <item name="Check out streams-project web site" href="http://streams.incubator.apache.org/site/0.2-incubating/streams-project/" />
             <item name="View the official Apache Streams jsonschema files" href="http://streams.incubator.apache.org/site/0.2-incubating/streams-project/streams-pojo/index.html" />
-            <item name="Check out streams-examples web site" href="http://streams.incubator.apache.org/site/0.2-incubating-SNAPSHOT/streams-examples/" />
             <item name="Set up a local environment to run streams" />
             <item name="Set up a local database to store streams data" />
             <item name="Build and run twitter-history-elasticsearch" href="http://streams.incubator.apache.org/site/0.2-incubating-SNAPSHOT/streams-examples/streams-examples-local/twitter-history-elasticsearch/index.html" />
@@ -68,7 +80,6 @@
         </menu>
         <menu name="Foundation">
           <item name="Foundation Info" href="http://www.apache.org/" />
-          <item name="License" href="http://apache.org/licenses/LICENSE-2.0.html" />
           <item name="Sponsorship" href="http://www.apache.org/foundation/sponsorship.html" />
           <item name="Thanks" href="http://www.apache.org/foundation/thanks.html" />
         </menu>