Posted to commits@spark.apache.org by ma...@apache.org on 2015/05/31 21:04:54 UTC

svn commit: r1682772 - in /spark: faq.md site/faq.html site/streaming/index.html streaming/index.md

Author: matei
Date: Sun May 31 19:04:53 2015
New Revision: 1682772

URL: http://svn.apache.org/r1682772
Log:
Some updates to FAQ on streaming

Modified:
    spark/faq.md
    spark/site/faq.html
    spark/site/streaming/index.html
    spark/streaming/index.md

Modified: spark/faq.md
URL: http://svn.apache.org/viewvc/spark/faq.md?rev=1682772&r1=1682771&r2=1682772&view=diff
==============================================================================
--- spark/faq.md (original)
+++ spark/faq.md Sun May 31 19:04:53 2015
@@ -36,9 +36,6 @@ Spark is a fast and general processing e
 <p class="question">How can I access data in S3?</p>
 <p class="answer">Use the <code>s3n://</code> URI scheme (<code>s3n://bucket/path</code>). You will also need to set your Amazon security credentials, either by setting the environment variables <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> before your program runs, or by setting <code>fs.s3.awsAccessKeyId</code> and <code>fs.s3.awsSecretAccessKey</code> in <code>SparkContext.hadoopConfiguration</code>.</p>
 
-<p class="question">Which languages does Spark support?</p>
-<p class="answer">Spark supports Scala, Java and Python.</p>
-
 <p class="question">Does Spark require modified versions of Scala or Python?</p>
 <p class="answer">No. Spark requires no changes to Scala or compiler plugins. The Python API uses the standard CPython implementation, and can call into existing C libraries for Python such as NumPy.</p>
 
@@ -48,9 +45,9 @@ Spark is a fast and general processing e
 
 <p>In addition, Spark also has <a href="{{site.url}}docs/latest/java-programming-guide.html">Java</a> and <a href="{{site.url}}docs/latest/python-programming-guide.html">Python</a> APIs.</p>
 
-<p class="question">What license is Spark under?</p>
+<p class="question">I understand Spark Streaming uses micro-batching. Does this increase latency?</p>
 
-<p class="answer">Starting in version 0.8, Spark is under the <a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache 2.0 license</a>. Previous versions used the <a href="https://github.com/mesos/spark/blob/branch-0.7/LICENSE">BSD license</a>.</p>
+<p class="answer">While Spark does use a micro-batch execution model, this has little impact on applications, because the batches can be as short as 0.5 seconds. In most streaming big-data applications, the analysis is done over a larger window (say, 10 minutes), or the latency to ingest the data is higher (e.g., sensors collect readings every 10 seconds). The benefit of Spark's micro-batch model is that it enables <a href="http://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf">exactly-once semantics</a>, meaning the system can recover all intermediate state and results on failure.</p>
 
 <p class="question">How can I contribute to Spark?</p>
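
The credential setup described in the S3 answer above can be sketched in PySpark. This is a hedged sketch, not canonical documentation: the bucket name and key values are placeholders, and `sc._jsc` is the internal Java SparkContext handle through which PySpark exposes `hadoopConfiguration`.

```python
# Sketch: configure S3 credentials programmatically rather than via the
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
# Assumes a local PySpark installation; the key values are placeholders.
from pyspark import SparkContext

sc = SparkContext("local", "s3-example")

# SparkContext.hadoopConfiguration is reached in PySpark through the
# underlying Java context (sc._jsc).
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY")
hadoop_conf.set("fs.s3.awsSecretAccessKey", "YOUR_SECRET_KEY")

# With credentials in place, s3n:// paths read like any other URI.
lines = sc.textFile("s3n://your-bucket/path")
```

This is a configuration fragment; it only does useful work against a real bucket with real credentials.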
 

Modified: spark/site/faq.html
URL: http://svn.apache.org/viewvc/spark/site/faq.html?rev=1682772&r1=1682771&r2=1682772&view=diff
==============================================================================
--- spark/site/faq.html (original)
+++ spark/site/faq.html Sun May 31 19:04:53 2015
@@ -196,9 +196,6 @@ Spark is a fast and general processing e
 <p class="question">How can I access data in S3?</p>
 <p class="answer">Use the <code>s3n://</code> URI scheme (<code>s3n://bucket/path</code>). You will also need to set your Amazon security credentials, either by setting the environment variables <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> before your program runs, or by setting <code>fs.s3.awsAccessKeyId</code> and <code>fs.s3.awsSecretAccessKey</code> in <code>SparkContext.hadoopConfiguration</code>.</p>
 
-<p class="question">Which languages does Spark support?</p>
-<p class="answer">Spark supports Scala, Java and Python.</p>
-
 <p class="question">Does Spark require modified versions of Scala or Python?</p>
 <p class="answer">No. Spark requires no changes to Scala or compiler plugins. The Python API uses the standard CPython implementation, and can call into existing C libraries for Python such as NumPy.</p>
 
@@ -208,9 +205,9 @@ Spark is a fast and general processing e
 
 <p>In addition, Spark also has <a href="/docs/latest/java-programming-guide.html">Java</a> and <a href="/docs/latest/python-programming-guide.html">Python</a> APIs.</p>
 
-<p class="question">What license is Spark under?</p>
+<p class="question">I understand Spark Streaming uses micro-batching. Does this increase latency?</p>
 
-<p class="answer">Starting in version 0.8, Spark is under the <a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache 2.0 license</a>. Previous versions used the <a href="https://github.com/mesos/spark/blob/branch-0.7/LICENSE">BSD license</a>.</p>
+<p class="answer">While Spark does use a micro-batch execution model, this has little impact on applications, because the batches can be as short as 0.5 seconds. In most streaming big-data applications, the analysis is done over a larger window (say, 10 minutes), or the latency to ingest the data is higher (e.g., sensors collect readings every 10 seconds). The benefit of Spark's micro-batch model is that it enables <a href="http://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf">exactly-once semantics</a>, meaning the system can recover all intermediate state and results on failure.</p>
 
 <p class="question">How can I contribute to Spark?</p>
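
The micro-batching described in the FAQ answer above can be illustrated without Spark at all. The toy sketch below (plain Python, not Spark code; the function name and the 0.5-second interval are purely illustrative) groups a timestamped event stream into fixed-interval batches, which is the discretization Spark Streaming performs on its input before running each batch as a small job.

```python
# Toy illustration (not Spark code): group a timestamped event stream
# into 0.5-second micro-batches, mimicking how Spark Streaming
# discretizes its input into small batches.
from itertools import groupby

def micro_batches(events, interval=0.5):
    """events: iterable of (timestamp, value) pairs, sorted by timestamp.
    Yields one list of values per `interval`-second batch."""
    for _, batch in groupby(events, key=lambda e: int(e[0] // interval)):
        yield [v for _, v in batch]

events = [(0.1, "a"), (0.3, "b"), (0.6, "c"), (1.2, "d")]
print(list(micro_batches(events)))  # [['a', 'b'], ['c'], ['d']]
```

With a 0.5-second interval, the worst-case added latency for any single event is one batch length, which is the trade-off the answer refers to.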
 

Modified: spark/site/streaming/index.html
URL: http://svn.apache.org/viewvc/spark/site/streaming/index.html?rev=1682772&r1=1682771&r2=1682772&view=diff
==============================================================================
--- spark/site/streaming/index.html (original)
+++ spark/site/streaming/index.html Sun May 31 19:04:53 2015
@@ -182,9 +182,9 @@
       Build applications through high-level operators.
     </p>
     <p>
-      Spark Streaming brings <a href="/">Spark</a>'s
-      language-integrated API to stream processing,
-      letting you write streaming jobs the same way you write batch jobs.
+      Spark Streaming brings Spark's
+      <a href="/docs/latest/streaming-programming-guide.html">language-integrated API</a>
+      to stream processing, letting you write streaming jobs the same way you write batch jobs.
       It supports Java, Scala and Python.
     </p>
   </div>

Modified: spark/streaming/index.md
URL: http://svn.apache.org/viewvc/spark/streaming/index.md?rev=1682772&r1=1682771&r2=1682772&view=diff
==============================================================================
--- spark/streaming/index.md (original)
+++ spark/streaming/index.md Sun May 31 19:04:53 2015
@@ -21,9 +21,9 @@ subproject: Streaming
       Build applications through high-level operators.
     </p>
     <p>
-      Spark Streaming brings <a href="{{site.url}}">Spark</a>'s
-      language-integrated API to stream processing,
-      letting you write streaming jobs the same way you write batch jobs.
+      Spark Streaming brings Spark's
+      <a href="{{site.url}}docs/latest/streaming-programming-guide.html">language-integrated API</a>
+      to stream processing, letting you write streaming jobs the same way you write batch jobs.
       It supports Java, Scala and Python.
     </p>
   </div>
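
The "write streaming jobs the same way you write batch jobs" point above can be sketched as a PySpark streaming word count. This is a hedged sketch assuming a PySpark installation; the socket source on localhost:9999 is a hypothetical input, and the operator chain is identical to what a batch RDD word count would use.

```python
# Sketch (assumes PySpark): a streaming word count built from the same
# operators as a batch job, executed over 1-second micro-batches.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "streaming-wordcount")
ssc = StreamingContext(sc, 1)  # 1-second batch interval

# Hypothetical text source; replace with your own input stream.
lines = ssc.socketTextStream("localhost", 9999)

# The same flatMap/map/reduceByKey chain a batch word count would use.
counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print each batch's counts

ssc.start()
ssc.awaitTermination()
```

Running this requires a live Spark deployment and something writing to the socket (e.g. `nc -lk 9999`), so it is shown here only to illustrate the shared batch/streaming API.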


