Posted to commits@spark.apache.org by ma...@apache.org on 2013/08/24 00:17:17 UTC

svn commit: r1517075 - /incubator/spark/index.md

Author: matei
Date: Fri Aug 23 22:17:16 2013
New Revision: 1517075

URL: http://svn.apache.org/r1517075
Log:
Front page

Modified:
    incubator/spark/index.md

Modified: incubator/spark/index.md
URL: http://svn.apache.org/viewvc/incubator/spark/index.md?rev=1517075&r1=1517074&r2=1517075&view=diff
==============================================================================
--- incubator/spark/index.md (original)
+++ incubator/spark/index.md Fri Aug 23 22:17:16 2013
@@ -7,17 +7,29 @@ navigation:
   show: true
 ---
 ## What is Apache Spark?
+
 Apache Spark is an open source cluster computing system that aims to make data analytics <em>fast</em> — both fast to run and fast to write.
-To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly much more quickly than with disk-based systems like Hadoop MapReduce.
-To make programming faster, Spark provides clean, concise APIs in <a href="http://www.scala-lang.org" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.scala-lang.org']);">Scala</a>, <a href="{{site.url}}docs/latest/quick-start.html#a-standalone-job-in-java" >Java</a> and <a href="{{site.url}}docs/latest/quick-start.html#a-standalone-job-in-python" >Python</a>. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.
+
+To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.
+
+To make programming faster, Spark provides clean, concise APIs in
+<a href="{{site.url}}docs/latest/quick-start.html#a-standalone-job-in-python" >Python</a>,
+<a href="http://www.scala-lang.org" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.scala-lang.org']);">Scala</a> and
+<a href="{{site.url}}docs/latest/quick-start.html#a-standalone-job-in-java">Java</a>.
+You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.
 
 ## What can it do?
-Spark was initially developed for two  applications where keeping data in memory helps: <em>iterative</em> algorithms, which are common in machine learning, and <em>interactive</em> data mining. In both cases, Spark can run up to <b>100x</b> faster than Hadoop MapReduce. However, you can use Spark for general data processing too. Check out our <a href="{{site.url}}examples.html" >example jobs</a>.
+
+Spark was initially developed for two  applications where placing data in memory helps: <em>iterative</em> algorithms, which are common in machine learning, and <em>interactive</em> data mining. In both cases, Spark can run up to <b>100x</b> faster than Hadoop MapReduce. However, you can use Spark for general data processing too. Check out our <a href="{{site.url}}examples.html" >example jobs</a>.
+
 Spark is also the engine behind <a href="http://shark.cs.berkeley.edu" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://shark.cs.berkeley.edu']);">Shark</a>, a fully <a href="http://hive.apache.org" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://hive.apache.org']);">Apache Hive</a>-compatible data warehousing system that can run 100x faster than Hive.
+
 While Spark is a new engine, it can access any data source supported by Hadoop, making it easy to run over existing data.
 
 ## Who uses it?
-Spark was developed in the <a href="https://amplab.cs.berkeley.edu" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://amplab.cs.berkeley.edu']);">UC Berkeley AMPLab</a>. It&#8217;s used by several groups of researchers at Berkeley to run large-scale applications such as spam filtering and traffic prediction. It&#8217;s also used to accelerate data analytics at <a href="http://www.yahoo.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.yahoo.com']);">Yahoo!</a>, <a href="http://www.conviva.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.conviva.com']);">Conviva</a>, <a href="http://www.quantifind.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.quantifind.com']);">Quantifind</a>, and other companies &#8212; in total, 17 companies have contributed to Spark! Spark is <a href="https://github.com/mesos/spark" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://github.com']);">open source</a> under a BSD license, so <a href="{{site.url}}downloads.html" >download</a> it to check it out.
+Spark was initially developed in the <a href="https://amplab.cs.berkeley.edu" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://amplab.cs.berkeley.edu']);">UC Berkeley AMPLab</a>, but is now being used and developed at a wide array of companies, including <a href="http://www.yahoo.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.yahoo.com']);">Yahoo!</a>, <a href="http://www.conviva.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.conviva.com']);">Conviva</a>, and <a href="http://www.quantifind.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.quantifind.com']);">Quantifind</a>.
+In total, over 20 companies have contributed code to Spark.
+Spark is <a href="https://github.com/mesos/spark" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://github.com']);">open source</a> under an Apache license, so <a href="{{site.url}}downloads.html" >download</a> it to check it out.
 
 ## Apache Incubator notice
 Apache Spark is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.