You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by ma...@apache.org on 2014/01/22 21:33:26 UTC
svn commit: r1560502 [18/18] - in /incubator/spark: ./ _layouts/ _plugins/
css/ images/ js/ mllib/ news/ news/_posts/ releases/_posts/
screencasts/_posts/ site/ site/css/ site/images/ site/js/ site/mllib/
site/news/ site/releases/ site/screencasts/ sit...
Added: incubator/spark/streaming/index.md
URL: http://svn.apache.org/viewvc/incubator/spark/streaming/index.md?rev=1560502&view=auto
==============================================================================
--- incubator/spark/streaming/index.md (added)
+++ incubator/spark/streaming/index.md Wed Jan 22 20:33:24 2014
@@ -0,0 +1,149 @@
+---
+layout: global
+type: "page singular"
+title: Spark Streaming
+subproject: Streaming
+---
+
+
+<div class="jumbotron">
+ <b>Spark Streaming</b> makes it easy to build scalable fault-tolerant streaming
+ applications.
+</div>
+
+
+
+<div class="row row-padded">
+ <div class="col-md-7 col-sm-7">
+ <h2>Ease of Use</h2>
+ <p class="lead">
+ Build applications through high-level operators.
+ </p>
+ <p>
+ Spark Streaming brings <a href="{{site.url}}">Spark</a>'s
+ language-integrated API to stream processing,
+ letting you write streaming applications the same way you write batch jobs.
+ It supports both Java and Scala.
+ </p>
+ </div>
+ <div class="col-md-5 col-sm-5 col-padded-top col-center">
+
+ <div style="margin-top: 15px; text-align: left; display: inline-block;">
+ <div class="code">
+ TwitterUtils.createStream(...)<br/>
+ .<span class="sparkop">filter</span>(<span class="closure">_.getText.contains("Spark")</span>)<br/>
+ .<span class="sparkop">countByWindow</span>(Seconds(5))
+ </div>
+ <div class="caption">Counting tweets on a sliding window</div>
+ </div>
+ </div>
+</div>
+
+<div class="row row-padded">
+ <div class="col-md-7 col-sm-7">
+ <h2>Fault Tolerance</h2>
+ <p class="lead">
+ Stateful exactly-once semantics out of the box.
+ </p>
+ <p>
+ Spark Streaming recovers both lost work
+ and operator state (e.g. sliding windows) out of the box, without any extra code on your part.
+ </p>
+ </div>
+ <div class="col-md-5 col-sm-5 col-padded-top col-center">
+ <div style="width: 100%; max-width: 300px; display: inline-block;">
+ <img src="{{site.url}}images/spark-streaming-recovery.png" style="width: 100%; max-width: 300px;">
+ </div>
+ </div>
+</div>
+
+<div class="row row-padded">
+ <div class="col-md-7 col-sm-7">
+ <h2>Spark Integration</h2>
+ <p class="lead">
+ Combine streaming with batch and interactive queries.
+ </p>
+ <p>
+ By running on Spark, Spark Streaming lets you reuse the same code for batch
+ processing, join streams against historical data, or run ad-hoc
+ queries on stream state.
+ Build powerful interactive applications, not just analytics.
+ </p>
+ </div>
+ <div class="col-md-5 col-sm-5 col-padded-top col-center">
+ <div style="margin-top: 20px; text-align: left; display: inline-block;">
+ <div class="code">
+ stream.<span class="sparkop">join</span>(historicCounts).<span class="sparkop">filter</span> {<span class="closure"><br/>
+ case (word, (curCount, oldCount)) =><br/>
+ curCount > oldCount<br/>
+ </span>}
+ </div>
+ <div class="caption">Find words with higher frequency than historic data</div>
+ </div>
+ </div>
+</div>
+
+
+{% extra %}
+
+
+<div class="row">
+ <div class="col-md-4 col-padded">
+ <h3>Deployment Options</h3>
+ <p>
+ Spark Streaming can read data from
+ <a href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">HDFS</a>,
+ <a href="http://flume.apache.org">Flume</a>,
+ <a href="http://kafka.apache.org">Kafka</a>,
+ <a href="https://dev.twitter.com">Twitter</a> and
+ <a href="http://zeromq.org">ZeroMQ</a>.
+ You can also define your own custom data sources.
+ </p>
+ <p>
+ You can run Spark Streaming on Spark's <a href="{{site.url}}docs/latest/spark-standalone.html">standalone cluster mode</a>
+ or <a href="{{site.url}}docs/latest/ec2-scripts.html">EC2</a>.
+ It also includes a local run mode for development.
+ In production,
+ Spark Streaming uses <a href="http://zookeeper.apache.org">ZooKeeper</a> and <a href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">HDFS</a> for high availability.
+ </p>
+ </div>
+
+ <div class="col-md-4 col-padded">
+ <h3>Community</h3>
+ <p>
+ Spark Streaming is developed as part of Apache Spark. It thus gets
+ tested and updated with each Spark release.
+ </p>
+ <p>
+ If you have questions about the system, ask on the
+ <a href="{{site.url}}community.html#mailing-lists">Spark mailing lists</a>.
+ </p>
+ <p>
+ The Spark Streaming developers welcome contributions. If you'd like to help out,
+ read <a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">how to
+ contribute to Spark</a>, and send us a patch!
+ </p>
+ </div>
+
+ <div class="col-md-4 col-padded">
+ <h3>Getting Started</h3>
+ <p>
+ To get started with Spark Streaming:
+ </p>
+ <ul class="list-narrow">
+ <li><a href="{{site.url}}downloads.html">Download Spark</a>. It includes Streaming as a module.</li>
+ <li>Read the <a href="{{site.url}}docs/latest/streaming-programming-guide.html">Spark Streaming programming guide</a>, which includes a tutorial and describes system architecture, configuration and high availability.</li>
+ <li>Check out example programs in <a href="https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples">Scala</a> and <a href="https://github.com/apache/incubator-spark/tree/master/examples/src/main/java/org/apache/spark/streaming/examples">Java</a>.</li>
+ </ul>
+ </div>
+</div>
+
+<div class="row">
+ <div class="col-sm-12 col-center">
+ <a href="{{site.url}}downloads.html" class="btn btn-success btn-lg btn-multiline">
+ Download Spark<br/><span class="small">Includes Spark Streaming</span>
+ </a>
+ </div>
+</div>
+
+{% endextra %}