Posted to commits@spark.apache.org by pw...@apache.org on 2014/12/19 19:53:06 UTC

svn commit: r1646822 - in /spark: releases/_posts/2014-12-18-spark-release-1-2-0.md site/releases/spark-release-1-2-0.html

Author: pwendell
Date: Fri Dec 19 18:53:06 2014
New Revision: 1646822

URL: http://svn.apache.org/r1646822
Log:
Release note typos

Modified:
    spark/releases/_posts/2014-12-18-spark-release-1-2-0.md
    spark/site/releases/spark-release-1-2-0.html

Modified: spark/releases/_posts/2014-12-18-spark-release-1-2-0.md
URL: http://svn.apache.org/viewvc/spark/releases/_posts/2014-12-18-spark-release-1-2-0.md?rev=1646822&r1=1646821&r2=1646822&view=diff
==============================================================================
--- spark/releases/_posts/2014-12-18-spark-release-1-2-0.md (original)
+++ spark/releases/_posts/2014-12-18-spark-release-1-2-0.md Fri Dec 19 18:53:06 2014
@@ -16,13 +16,13 @@ Spark 1.2.0 is the third release on the
 To download Spark 1.2 visit the <a href="{{site.url}}downloads.html">downloads</a> page.
 
 ### Spark Core
-In 1.2 Spark core upgrades to major subsystems to improve the performance and stability of very large scale shuffles. The first is Spark’s communication manager used during bulk transfers, which upgrades to a [netty-based implementation](https://issues.apache.org/jira/browse/SPARK-2468). The second is Spark’s shuffle mechanism, which upgrades to the [“sort based” shuffle initially released in Spark 1.1](https://issues.apache.org/jira/browse/SPARK-3280). These both improve the performance and stability of very large scale shuffles. Spark also adds an [elastic scaling mechanism](https://issues.apache.org/jira/browse/SPARK-3174) designed to improve cluster utilization during long running ETL-style jobs. This is currently supported on YARN and will make its way to other cluster managers in future versions. Finally, Spark 1.2 adds support for Scala 2.11. For instructions on building for Scala 2.11 see the [build documentation](/docs/1.2.0/building-spark.html#building-for-scala-211).
+In 1.2 Spark core upgrades two major subsystems to improve the performance and stability of very large scale shuffles. The first is Spark’s communication manager used during bulk transfers, which upgrades to a [netty-based implementation](https://issues.apache.org/jira/browse/SPARK-2468). The second is Spark’s shuffle mechanism, which upgrades to the [“sort based” shuffle initially released in Spark 1.1](https://issues.apache.org/jira/browse/SPARK-3280). These both improve the performance and stability of very large scale shuffles. Spark also adds an [elastic scaling mechanism](https://issues.apache.org/jira/browse/SPARK-3174) designed to improve cluster utilization during long running ETL-style jobs. This is currently supported on YARN and will make its way to other cluster managers in future versions. Finally, Spark 1.2 adds support for Scala 2.11. For instructions on building for Scala 2.11 see the [build documentation](/docs/1.2.0/building-spark.html#building-for-scala-211).
 
 ### Spark Streaming
 This release includes two major feature additions to Spark’s streaming library, a Python API and a write ahead log for full driver H/A. The [Python API](https://issues.apache.org/jira/browse/SPARK-2377) covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark streaming now features H/A driver support through a [write ahead log (WAL)](https://issues.apache.org/jira/browse/SPARK-3129). In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the [streaming programming guide](/docs/1.2.0/streaming-programming-guide.html) for more details. 
 
 ### MLLib
-Spark 1.2 previews a new set of machine learning API’s in a package called spark.ml that [supports learning pipelines](https://issues.apache.org/jira/browse/SPARK-3530), where multiple algorithms are run in sequence with varying parameters. This type of pipeline is common in practical machine learning deployments. The new ML package uses Spark’s SchemaRDD to represent [ML datasets](https://issues.apache.org/jira/browse/SPARK-3573), providing directly interoperability with Spark SQL. In addition to the new API, Spark 1.2 extends decision trees with two tree ensemble methods: [random forests](https://issues.apache.org/jira/browse/SPARK-1545) and [gradient-boosted trees](https://issues.apache.org/jira/browse/SPARK-1547), among the most successful tree-based models for classification and regression. Finally, MLlib's Python implementation receives a major update in 1.2 to simplify the process of adding Python APIs, along with better Python API coverage.
+Spark 1.2 previews a new set of machine learning API’s in a package called spark.ml that [supports learning pipelines](https://issues.apache.org/jira/browse/SPARK-3530), where multiple algorithms are run in sequence with varying parameters. This type of pipeline is common in practical machine learning deployments. The new ML package uses Spark’s SchemaRDD to represent [ML datasets](https://issues.apache.org/jira/browse/SPARK-3573), providing direct interoperability with Spark SQL. In addition to the new API, Spark 1.2 extends decision trees with two tree ensemble methods: [random forests](https://issues.apache.org/jira/browse/SPARK-1545) and [gradient-boosted trees](https://issues.apache.org/jira/browse/SPARK-1547), among the most successful tree-based models for classification and regression. Finally, MLlib's Python implementation receives a major update in 1.2 to simplify the process of adding Python APIs, along with better Python API coverage.
 
 ### Spark SQL
 In this release Spark SQL adds a [new API for external data sources](https://issues.apache.org/jira/browse/SPARK-3247). This API supports mounting external data sources as temporary tables, with support for optimizations such as predicate pushdown. Spark’s [Parquet](https://issues.apache.org/jira/browse/SPARK-4413) and JSON bindings have been re-written to use this API and we expect a variety of community projects to integrate with other systems and formats during the 1.2 lifecycle.
@@ -30,7 +30,6 @@ In this release Spark SQL adds a [new AP
 Hive integration has been improved with support for the [fixed-precision decimal type](https://issues.apache.org/jira/browse/SPARK-3929) and [Hive 0.13](https://issues.apache.org/jira/browse/SPARK-2706). Spark SQL also adds [dynamically partitioned inserts](https://issues.apache.org/jira/browse/SPARK-3007), a popular Hive feature. An internal re-architecting around caching improves the performance and semantics of [caching SchemaRDD](https://issues.apache.org/jira/browse/SPARK-3212) instances and adds support for [statistics-based partition pruning](https://issues.apache.org/jira/browse/SPARK-2961) for cached data.
 
 ### GraphX
-In 1.2 GraphX graduates from an alpha component and adds a stable API. This means applications written against GraphX are guaranteed to work with future Spark versions without code changes. 
 In 1.2 GraphX graduates from an alpha component and adds a stable API. This means applications written against GraphX are guaranteed to work with future Spark versions without code changes. A new core API, aggregateMessages, is introduced to replace the now deprecated mapReduceTriplet API. The new aggregateMessages API features a more imperative programming model and improves performance. Some early test users found 20% - 1X performance improvement by switching to the new API.
 
 In addition, Spark now supports [graph checkpointing](https://issues.apache.org/jira/browse/SPARK-3623) and [lineage truncation](https://issues.apache.org/jira/browse/SPARK-4672) which are necessary to support large numbers of iterations in production jobs. Finally, a handful of performance improvements have been added for [PageRank](https://issues.apache.org/jira/browse/SPARK-3427) and [graph loading](https://issues.apache.org/jira/browse/SPARK-4646).
@@ -43,7 +42,7 @@ In addition, Spark now supports [graph c
 
 ### Upgrading to Spark 1.2
 
-Spark 1.2 is binary compatible with Spark 1.0 and 1.1, so no code changes are necessary. This excludes API’s marked explicitly as unstable. Spark changes default configuration in a handful of cases for improved performance. Users who want to preserve identical configurations to Spark 1.1 can roll back these changes.
+Spark 1.2 is binary compatible with Spark 1.0 and 1.1, so no code changes are necessary. This excludes APIs marked explicitly as unstable. Spark changes default configuration in a handful of cases for improved performance. Users who want to preserve identical configurations to Spark 1.1 can roll back these changes.
 
 1. `spark.shuffle.blockTransferService` has been changed from `nio` to `netty`
 2. `spark.shuffle.manager` has been changed from `hash` to `sort`
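
For reference, here is a minimal sketch (in Scala) of rolling the two shuffle defaults listed above back to their Spark 1.1 values through SparkConf; the application name and master URL are placeholders, and the same keys can equally be set in spark-defaults.conf.

    import org.apache.spark.{SparkConf, SparkContext}

    // Restore the Spark 1.1 shuffle defaults called out in the upgrade notes.
    // The app name and master URL are placeholders for illustration.
    val conf = new SparkConf()
      .setAppName("legacy-shuffle-settings")
      .setMaster("local[*]")
      .set("spark.shuffle.blockTransferService", "nio")  // 1.2 default: netty
      .set("spark.shuffle.manager", "hash")              // 1.2 default: sort
    val sc = new SparkContext(conf)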
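
The elastic scaling mechanism described in the Spark Core notes is opt-in. A rough sketch of enabling it on YARN follows, assuming the dynamic allocation property names from the 1.2 configuration docs; the executor bounds are placeholders, and the external shuffle service must also be running on the worker nodes.

    import org.apache.spark.{SparkConf, SparkContext}

    // Opt in to dynamic executor allocation (elastic scaling) on YARN.
    // Property names are assumed from the 1.2 configuration docs; the
    // min/max executor counts below are placeholders.
    val conf = new SparkConf()
      .setAppName("elastic-etl")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "50")
      .set("spark.shuffle.service.enabled", "true")
    val sc = new SparkContext(conf)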
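
For the streaming write ahead log, a minimal sketch assuming the spark.streaming.receiver.writeAheadLog.enable property described in the 1.2 streaming guide; the checkpoint directory, host, and port are placeholders.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Enable the receiver write ahead log and checkpoint to a fault-tolerant
    // file system so buffered data survives driver restarts. The HDFS path,
    // host, and port are placeholders; local[2] gives the receiver a core.
    val conf = new SparkConf()
      .setAppName("wal-example")
      .setMaster("local[2]")
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///checkpoints/wal-example")

    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()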
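
For the spark.ml preview, a small pipeline sketch in the style of the 1.2 ML guide; the toy documents, column names, and parameter values are illustrative assumptions, not part of the release notes.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

    // Toy labeled documents; the case class provides the SchemaRDD's columns.
    case class LabeledDocument(id: Long, text: String, label: Double)

    val sc = new SparkContext(new SparkConf().setAppName("ml-pipeline").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._  // implicit RDD -> SchemaRDD conversion

    val training = sc.parallelize(Seq(
      LabeledDocument(0L, "spark shuffle netty", 1.0),
      LabeledDocument(1L, "hadoop map reduce", 0.0)))

    // Three-stage pipeline: tokenize text, hash tokens into feature vectors,
    // then fit logistic regression on the resulting features.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)

    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
    val model = pipeline.fit(training)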
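
For the new external data sources API in Spark SQL, a rough sketch of mounting a Parquet file as a temporary table with the SQL DDL form; the table name and path are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("data-sources").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Register an external Parquet file as a temporary table through the data
    // sources API; filters on queries against it can be pushed down to the
    // source. The table name and path are placeholders.
    sqlContext.sql("""
      CREATE TEMPORARY TABLE events
      USING org.apache.spark.sql.parquet
      OPTIONS (path 'hdfs:///data/events.parquet')
    """)

    sqlContext.sql("SELECT COUNT(*) FROM events").collect().foreach(println)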
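
And for the GraphX notes, a small sketch of the new aggregateMessages API computing in-degrees; the toy edge list is made up for illustration. The send function pushes messages along edge triplets and the merge function combines them per vertex, which is the more imperative model the notes describe.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.{Edge, Graph}

    val sc = new SparkContext(new SparkConf().setAppName("aggregate-messages").setMaster("local[*]"))

    // Toy graph: three vertices, three directed edges.
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 3L, 1), Edge(2L, 3L, 1)))
    val graph = Graph.fromEdges(edges, defaultValue = 0)

    // Each triplet sends a message to its destination vertex, and messages
    // arriving at the same vertex are summed, yielding in-degrees.
    val inDegrees = graph.aggregateMessages[Int](
      sendMsg = ctx => ctx.sendToDst(1),
      mergeMsg = _ + _)

    inDegrees.collect().foreach(println)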

Modified: spark/site/releases/spark-release-1-2-0.html
URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-1-2-0.html?rev=1646822&r1=1646821&r2=1646822&view=diff
==============================================================================
--- spark/site/releases/spark-release-1-2-0.html (original)
+++ spark/site/releases/spark-release-1-2-0.html Fri Dec 19 18:53:06 2014
@@ -171,13 +171,13 @@
 <p>To download Spark 1.2 visit the <a href="/downloads.html">downloads</a> page.</p>
 
 <h3 id="spark-core">Spark Core</h3>
-<p>In 1.2 Spark core upgrades to major subsystems to improve the performance and stability of very large scale shuffles. The first is Spark’s communication manager used during bulk transfers, which upgrades to a <a href="https://issues.apache.org/jira/browse/SPARK-2468">netty-based implementation</a>. The second is Spark’s shuffle mechanism, which upgrades to the <a href="https://issues.apache.org/jira/browse/SPARK-3280">“sort based” shuffle initially released in Spark 1.1</a>. These both improve the performance and stability of very large scale shuffles. Spark also adds an <a href="https://issues.apache.org/jira/browse/SPARK-3174">elastic scaling mechanism</a> designed to improve cluster utilization during long running ETL-style jobs. This is currently supported on YARN and will make its way to other cluster managers in future versions. Finally, Spark 1.2 adds support for Scala 2.11. For instructions on building for Scala 2.11 see the <a href="/docs/1.2.0/building-spark.html#building-for-scala-211">build documentation</a>.</p>
+<p>In 1.2 Spark core upgrades two major subsystems to improve the performance and stability of very large scale shuffles. The first is Spark’s communication manager used during bulk transfers, which upgrades to a <a href="https://issues.apache.org/jira/browse/SPARK-2468">netty-based implementation</a>. The second is Spark’s shuffle mechanism, which upgrades to the <a href="https://issues.apache.org/jira/browse/SPARK-3280">“sort based” shuffle initially released in Spark 1.1</a>. These both improve the performance and stability of very large scale shuffles. Spark also adds an <a href="https://issues.apache.org/jira/browse/SPARK-3174">elastic scaling mechanism</a> designed to improve cluster utilization during long running ETL-style jobs. This is currently supported on YARN and will make its way to other cluster managers in future versions. Finally, Spark 1.2 adds support for Scala 2.11. For instructions on building for Scala 2.11 see the <a href="/docs/1.2.0/building-spark.html#building-for-scala-211">build documentation</a>.</p>
 
 <h3 id="spark-streaming">Spark Streaming</h3>
 <p>This release includes two major feature additions to Spark’s streaming library, a Python API and a write ahead log for full driver H/A. The <a href="https://issues.apache.org/jira/browse/SPARK-2377">Python API</a> covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark streaming now features H/A driver support through a <a href="https://issues.apache.org/jira/browse/SPARK-3129">write ahead log (WAL)</a>. In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the <a href="/docs/1.2.0/streaming-programming-guide.html">streaming programming guide</a> for more details. </p>
 
 <h3 id="mllib">MLLib</h3>
-<p>Spark 1.2 previews a new set of machine learning API’s in a package called spark.ml that <a href="https://issues.apache.org/jira/browse/SPARK-3530">supports learning pipelines</a>, where multiple algorithms are run in sequence with varying parameters. This type of pipeline is common in practical machine learning deployments. The new ML package uses Spark’s SchemaRDD to represent <a href="https://issues.apache.org/jira/browse/SPARK-3573">ML datasets</a>, providing directly interoperability with Spark SQL. In addition to the new API, Spark 1.2 extends decision trees with two tree ensemble methods: <a href="https://issues.apache.org/jira/browse/SPARK-1545">random forests</a> and <a href="https://issues.apache.org/jira/browse/SPARK-1547">gradient-boosted trees</a>, among the most successful tree-based models for classification and regression. Finally, MLlib&#8217;s Python implementation receives a major update in 1.2 to simplify the process of adding Python APIs, along with better Python API coverage.</p>
+<p>Spark 1.2 previews a new set of machine learning API’s in a package called spark.ml that <a href="https://issues.apache.org/jira/browse/SPARK-3530">supports learning pipelines</a>, where multiple algorithms are run in sequence with varying parameters. This type of pipeline is common in practical machine learning deployments. The new ML package uses Spark’s SchemaRDD to represent <a href="https://issues.apache.org/jira/browse/SPARK-3573">ML datasets</a>, providing direct interoperability with Spark SQL. In addition to the new API, Spark 1.2 extends decision trees with two tree ensemble methods: <a href="https://issues.apache.org/jira/browse/SPARK-1545">random forests</a> and <a href="https://issues.apache.org/jira/browse/SPARK-1547">gradient-boosted trees</a>, among the most successful tree-based models for classification and regression. Finally, MLlib&#8217;s Python implementation receives a major update in 1.2 to simplify the process of adding Python APIs, along with better Python API coverage.</p>
 
 <h3 id="spark-sql">Spark SQL</h3>
 <p>In this release Spark SQL adds a <a href="https://issues.apache.org/jira/browse/SPARK-3247">new API for external data sources</a>. This API supports mounting external data sources as temporary tables, with support for optimizations such as predicate pushdown. Spark’s <a href="https://issues.apache.org/jira/browse/SPARK-4413">Parquet</a> and JSON bindings have been re-written to use this API and we expect a variety of community projects to integrate with other systems and formats during the 1.2 lifecycle.</p>
@@ -185,8 +185,7 @@
 <p>Hive integration has been improved with support for the <a href="https://issues.apache.org/jira/browse/SPARK-3929">fixed-precision decimal type</a> and <a href="https://issues.apache.org/jira/browse/SPARK-2706">Hive 0.13</a>. Spark SQL also adds <a href="https://issues.apache.org/jira/browse/SPARK-3007">dynamically partitioned inserts</a>, a popular Hive feature. An internal re-architecting around caching improves the performance and semantics of <a href="https://issues.apache.org/jira/browse/SPARK-3212">caching SchemaRDD</a> instances and adds support for <a href="https://issues.apache.org/jira/browse/SPARK-2961">statistics-based partition pruning</a> for cached data.</p>
 
 <h3 id="graphx">GraphX</h3>
-<p>In 1.2 GraphX graduates from an alpha component and adds a stable API. This means applications written against GraphX are guaranteed to work with future Spark versions without code changes. 
-In 1.2 GraphX graduates from an alpha component and adds a stable API. This means applications written against GraphX are guaranteed to work with future Spark versions without code changes. A new core API, aggregateMessages, is introduced to replace the now deprecated mapReduceTriplet API. The new aggregateMessages API features a more imperative programming model and improves performance. Some early test users found 20% - 1X performance improvement by switching to the new API.</p>
+<p>In 1.2 GraphX graduates from an alpha component and adds a stable API. This means applications written against GraphX are guaranteed to work with future Spark versions without code changes. A new core API, aggregateMessages, is introduced to replace the now deprecated mapReduceTriplet API. The new aggregateMessages API features a more imperative programming model and improves performance. Some early test users found 20% - 1X performance improvement by switching to the new API.</p>
 
 <p>In addition, Spark now supports <a href="https://issues.apache.org/jira/browse/SPARK-3623">graph checkpointing</a> and <a href="https://issues.apache.org/jira/browse/SPARK-4672">lineage truncation</a> which are necessary to support large numbers of iterations in production jobs. Finally, a handful of performance improvements have been added for <a href="https://issues.apache.org/jira/browse/SPARK-3427">PageRank</a> and <a href="https://issues.apache.org/jira/browse/SPARK-4646">graph loading</a>.</p>
 
@@ -200,7 +199,7 @@ In 1.2 GraphX graduates from an alpha co
 
 <h3 id="upgrading-to-spark-12">Upgrading to Spark 1.2</h3>
 
-<p>Spark 1.2 is binary compatible with Spark 1.0 and 1.1, so no code changes are necessary. This excludes API’s marked explicitly as unstable. Spark changes default configuration in a handful of cases for improved performance. Users who want to preserve identical configurations to Spark 1.1 can roll back these changes.</p>
+<p>Spark 1.2 is binary compatible with Spark 1.0 and 1.1, so no code changes are necessary. This excludes APIs marked explicitly as unstable. Spark changes default configuration in a handful of cases for improved performance. Users who want to preserve identical configurations to Spark 1.1 can roll back these changes.</p>
 
 <ol>
   <li><code>spark.shuffle.blockTransferService</code> has been changed from <code>nio</code> to <code>netty</code></li>



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org