Posted to commits@spark.apache.org by gu...@apache.org on 2017/09/25 00:16:32 UTC
spark git commit: [SPARK-22107] Change as to alias in python quickstart
Repository: spark
Updated Branches:
refs/heads/master 576c43fb4 -> 20adf9aa1
[SPARK-22107] Change as to alias in python quickstart
## What changes were proposed in this pull request?
Updated the docs so that a line of Python in the quick start guide executes. Closes #19283
## How was this patch tested?
Existing tests.
Author: John O'Leary <jg...@gmail.com>
Closes #19326 from jgoleary/issues/22107.
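For context on why the change is needed: `as` is a reserved keyword in Python, so the Scala-style `col.as("word")` is a syntax error there, and PySpark exposes the renaming method as `Column.alias` instead. A minimal sketch of the keyword clash, using only the standard library (no Spark required; the `textFile.value` expression is just the string from the doc snippet):

```python
import keyword

# `as` is a hard keyword in Python, so it can never appear as an
# attribute name the way it does in the Scala/Java API.
print(keyword.iskeyword("as"))  # True

# Compiling the original doc snippet's expression shows the failure
# without needing Spark installed.
try:
    compile('textFile.value.as("word")', "<quick-start>", "eval")
except SyntaxError:
    print("SyntaxError: .as(...) is not valid Python; use .alias(...)")
```

This is why the fix swaps `.as("word")` for `.alias("word")` rather than touching the rest of the pipeline.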
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/20adf9aa
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/20adf9aa
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/20adf9aa
Branch: refs/heads/master
Commit: 20adf9aa1f42353432d356117e655e799ea1290b
Parents: 576c43f
Author: John O'Leary <jg...@gmail.com>
Authored: Mon Sep 25 09:16:27 2017 +0900
Committer: hyukjinkwon <gu...@gmail.com>
Committed: Mon Sep 25 09:16:27 2017 +0900
----------------------------------------------------------------------
docs/quick-start.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/20adf9aa/docs/quick-start.md
----------------------------------------------------------------------
diff --git a/docs/quick-start.md b/docs/quick-start.md
index a85e5b2..200b972 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -153,7 +153,7 @@ This first maps a line to an integer value and aliases it as "numWords", creatin
One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can implement MapReduce flows easily:
{% highlight python %}
->>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).as("word")).groupBy("word").count()
+>>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).alias("word")).groupBy("word").count()
{% endhighlight %}
Here, we use the `explode` function in `select` to transform a Dataset of lines into a Dataset of words, and then combine `groupBy` and `count` to compute the per-word counts in the file as a DataFrame of 2 columns: "word" and "count". To collect the word counts in our shell, we can call `collect`:
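The same split/explode/group/count flow can be sketched in plain Python as a rough analogy (hypothetical illustration only, not Spark code; the sample `lines` are made up):

```python
import re
from collections import Counter

# Hypothetical input standing in for the lines of a text file.
lines = ["to be or not", "to be"]

# split(value, "\s+") followed by explode: one word per element,
# flattened across all lines (empty strings dropped).
words = [w for line in lines for w in re.split(r"\s+", line) if w]

# groupBy("word").count(): group identical words and count each group.
word_counts = Counter(words)
print(word_counts["to"])  # 2
```

In Spark the same grouping happens in parallel across partitions, but the shape of the computation (flatten to words, then count per distinct word) is identical.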
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org