Posted to commits@beam.apache.org by fr...@apache.org on 2016/10/19 04:06:33 UTC

[4/8] incubator-beam-site git commit: Add Design Principles (taken from the original Beam technical vision document).

Add Design Principles (taken from the original Beam technical vision document).


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/commit/99783418
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/tree/99783418
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/diff/99783418

Branch: refs/heads/asf-site
Commit: 997834188ecf29b307e195c9c7e8d31fa60b34ff
Parents: 7f234a5
Author: Frances Perry <fj...@google.com>
Authored: Mon Oct 3 19:00:03 2016 -0700
Committer: Frances Perry <fj...@google.com>
Committed: Tue Oct 18 20:56:39 2016 -0700

----------------------------------------------------------------------
 _includes/header.html           |  5 ++--
 contribute/design-principles.md | 53 ++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/99783418/_includes/header.html
----------------------------------------------------------------------
diff --git a/_includes/header.html b/_includes/header.html
index 182b30a..67631a9 100644
--- a/_includes/header.html
+++ b/_includes/header.html
@@ -63,12 +63,13 @@
 			  <li role="separator" class="divider"></li>
 			  <li class="dropdown-header">Basics</li>
 			  <li><a href="{{ site.baseurl }}/contribute/contribution-guide/">Contribution Guide</a></li>
-			  <li><a href="{{ site.baseurl }}/contribute/testing/">Testing</a></li>
 			  <li><a href="{{ site.baseurl }}/use/mailing-lists/">Mailing Lists</a></li>
               <li><a href="{{ site.baseurl }}/contribute/source-repository/">Source Repository</a></li>
               <li><a href="{{ site.baseurl }}/use/issue-tracking/">Issue Tracking</a></li>
               <li role="separator" class="divider"></li>
-			  <li class="dropdown-header">Technical Resources</li>
+			  <li class="dropdown-header">Technical References</li>
+			  <li><a href="{{ site.baseurl }}/contribute/testing/">Testing</a></li>
+              <li><a href="{{ site.baseurl }}/contribute/design-principles/">Design Principles</a></li>
 			  <li><a href="https://goo.gl/nk5OM0">Technical Vision</a></li>
 		  </ul>
 	    </li>

http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/99783418/contribute/design-principles.md
----------------------------------------------------------------------
diff --git a/contribute/design-principles.md b/contribute/design-principles.md
new file mode 100644
index 0000000..87ddd24
--- /dev/null
+++ b/contribute/design-principles.md
@@ -0,0 +1,53 @@
+---
+layout: default
+title: 'Design Principles in Beam'
+permalink: /contribute/design-principles/
+---
+
+# Design Principles in the Apache Beam Project
+
+Joshua Bloch’s [API Design Bumper Stickers](https://www.infoq.com/articles/API-Design-Joshua-Bloch) are a great list of what makes for good API design. In addition, we have specific design principles we follow in Beam.
+
+* TOC
+{:toc}
+
+## Use cases
+
+### Unify the model
+Provide one model that works over both bounded (a.k.a. batch) and unbounded (a.k.a. streaming) datasets. Pay special attention to windows, triggers, state, and timers, which often trip up folks used to a batch world. Provide users with the right abstractions to adjust latency and completeness guarantees to cover both traditional batch and streaming use cases.
+
+### Separate data shapes and runtime requirements
+The model should focus on letting users describe their data and processing, without exposing any details of a specific runtime system. For example, bounded and unbounded describe the shape of data, but batch and streaming describe the behavior of specific runtime systems. Good test cases are to imagine a mythical micro-batching runner that sits somewhere between batch and streaming, or an engine that dynamically switches between streaming and batch depending on the backlog.
+
+### Make efficient things easy, rather than make easy things efficient
+Don’t prevent efficiency for ease of use. Design APIs that provide the information necessary for efficiently executing at scale. Provide class hierarchies and wrappers to make the common cases simpler.
+
+## Usability
+
+### Validate Early
+Validate constraints on graph shape, runner requirements, etc., as early as reasonably possible along the spectrum from compile time through construction time and submission time to execution time, in order to provide a smoother user experience.
+
+### Public APIs, like diamonds, are forever (at least until the next major version)
+Backwards-incompatible changes can only be made in the next major version. Because of the burden major versions place on users (code has to be modified, conflicting dependency nightmares, etc.), we aim to do this infrequently. Clearly mark APIs that are considered experimental (may change at any point) and deprecated (will be removed in the next major version). Consider which APIs are more amenable to future changes (abstract classes vs. interfaces, etc.).
+
+### Examples should be pedagogical
+Canonical examples help people ingrain the principles. Design examples that teach complex concepts in modular chunks. If you can’t explain the concept easily, then the API isn’t right. Examples should withstand random copy-pasting.
+
+## Extensibility
+
+### Use PTransforms for modularity
+Composite transformations (transformations formed by a subgraph of other transformations) are treated as first-class objects. They can be named and applied directly in any pipeline to nicely encapsulate concepts. This removes the artificial separation between transformations built into PCollection and those provided by users. In addition, PTransforms can be used as a clear concept in graphical monitoring and provide a way to scope metadata like aggregators, logging, and resources. Use these when building pipelines.
+
+### Keep Beam SDKs consistent
+Beam SDKs should expose the complete set of concepts in the programming model. They should all use the same set of abstractions and be able to share conceptual documentation.
+
+### When in ~~Rome~~ Python, do as the ~~Romans~~ Pythonians do
+Each SDK must feel right to those who live and breathe that language. Adapt the general Beam concepts into language-dependent styles when the benefits clearly outweigh the drawbacks.
+
+### Encourage DSLs
+Many use cases or user communities can be served by ‘wrapper’ SDKs that provide a simpler or domain-specific set of abstractions, build on a Beam SDK, and take advantage of Beam runners.
+
+### Design for the model, not specific runners
+
+The Beam APIs should serve all runners. Behind every runner-specific hook, there is a general principle in the model. Design APIs that generalize across multiple runners.
+
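
As an illustration of the "Validate Early" principle in the design-principles.md added above, here is a minimal sketch of construction-time validation. Every name in it (Step, Pipeline, GraphValidationError, the keyed-input flags) is hypothetical and invented for this example; none of it is a Beam API. It only shows the idea of rejecting an invalid graph shape while the graph is being built, before any data is processed.

```python
# Hypothetical sketch of construction-time validation; not Beam code.


class GraphValidationError(ValueError):
    """Raised while the graph is being built, before any data is processed."""


class Step:
    """One node in a linear pipeline (illustrative only)."""

    def __init__(self, name, fn, needs_keyed_input=False, emits_keyed_output=False):
        self.name = name
        self.fn = fn
        self.needs_keyed_input = needs_keyed_input
        self.emits_keyed_output = emits_keyed_output


class Pipeline:
    """Toy linear pipeline that checks graph shape when a step is applied."""

    def __init__(self):
        self.steps = []

    def apply(self, step):
        # Construction-time check: fail here rather than partway through run().
        upstream_is_keyed = bool(self.steps) and self.steps[-1].emits_keyed_output
        if step.needs_keyed_input and not upstream_is_keyed:
            raise GraphValidationError(
                f"{step.name!r} needs (key, value) input, but the upstream "
                "step does not produce keyed output"
            )
        self.steps.append(step)
        return self

    def run(self, elements):
        # Execution: apply each step's function to every element in order.
        for step in self.steps:
            elements = [step.fn(e) for e in elements]
        return elements


key_by_length = Step("KeyByLength", lambda w: (len(w), w), emits_keyed_output=True)
drop_key = Step("DropKey", lambda kv: kv[1], needs_keyed_input=True)

# A well-formed graph builds and runs.
result = Pipeline().apply(key_by_length).apply(drop_key).run(["a", "bb"])

# A malformed graph is rejected at apply() time, before run() is ever called.
try:
    Pipeline().apply(drop_key)
    failed_early = False
except GraphValidationError:
    failed_early = True
```

The design choice in this sketch is that the error surfaces at the apply() call that introduced the problem, so the user sees it next to the offending line rather than after a job has been submitted.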