Posted to reviews@spark.apache.org by jomach <gi...@git.apache.org> on 2017/10/04 13:14:16 UTC
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
GitHub user jomach opened a pull request:
https://github.com/apache/spark/pull/19429
[SPARK-20055] [Docs] Added documentation for loading csv files into DataFrames
## What changes were proposed in this pull request?
Added documentation for loading CSV files into DataFrames
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jomach/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19429.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19429
----
commit f5941bf196a36afe8715d713fcaaf3f1a136d9e8
Author: Jorge Machado <jo...@hotmail.com>
Date: 2017-10-04T13:09:16Z
SPARK-20055 Documentation
-Added documentation for loading csv files into Dataframes
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19429
@jomach is a new contributor to Apache Spark. It might be hard for him to address the above comments. Please submit a separate PR for addressing it. Will review it. Thanks!
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19429
**[Test build #82628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82628/testReport)** for PR 19429 at commit [`cd69fa2`](https://github.com/apache/spark/commit/cd69fa240d453a7b8344796349a2bf03a20ffbfc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143287505
--- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
@@ -115,7 +115,20 @@ private static void runBasicDataSourceExample(SparkSession spark) {
Dataset<Row> peopleDF =
spark.read().format("json").load("examples/src/main/resources/people.json");
peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet");
- // $example off:manual_load_options$
+ // $example on:manual_load_options_csv$
--- End diff --
You still need to keep
> // $example off:manual_load_options$
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143933800
--- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
@@ -116,6 +116,13 @@ private static void runBasicDataSourceExample(SparkSession spark) {
spark.read().format("json").load("examples/src/main/resources/people.json");
peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet");
// $example off:manual_load_options$
+ // $example on:manual_load_options_csv$
+ Dataset<Row> peopleDFCsv = spark.read().format("csv")
+ .option("sep", ";")
--- End diff --
ditto
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143288308
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,47 @@ source type can be converted into other types using this syntax.
</div>
</div>
+To load a csv file you can use:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+</div>
+</div>
+
+To load a csv file you can use:
--- End diff --
This is also a duplicate.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19429
**[Test build #82629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82629/testReport)** for PR 19429 at commit [`68799ed`](https://github.com/apache/spark/commit/68799ede999ec1874c80d242441032cd29a2f695).
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19429
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143933594
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -112,6 +112,11 @@ namesAndAges <- select(df, "name", "age")
write.df(namesAndAges, "namesAndAges.parquet", "parquet")
# $example off:manual_load_options$
+# $example on:manual_load_options_csv$
--- End diff --
I'd add a newline here above to keep consistent in this file
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19429
Thanks! Merged to master.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19429
Can one of the admins verify this patch?
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19429
I intended to explain `multiLine`, `inferSchema` and `header`, which are arguably commonly used, rather than just show the examples. The JSON section explains `multiLine` and annotates each line of its examples with detailed comments.
Another point is that I'd like to let users see the options (without duplication) rather than having to check the API documentation, as I am quite sure newbies often misunderstand this. For example, from time to time I see newbies setting `inferSchema` to `true` on non-CSV data sources, or setting `com.databricks.spark.csv` instead of `csv`.
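To make the options named in this comment concrete, here is a stdlib-only Python sketch of what `sep`, `header`, and `inferSchema` mean for the `people.csv` contents quoted elsewhere in this thread. It is not Spark's implementation: `read_people` and `infer` are hypothetical helpers, the inference only tries `int` before falling back to a string, and the `_c0`-style fallback names are an assumption about how unnamed columns might be labeled.

```python
import csv
import io

# Contents of examples/src/main/resources/people.csv as quoted in this thread.
PEOPLE_CSV = "name;age;job\nJorge;30;Developer\nBob;32;Developer\n"


def infer(value):
    """Hypothetical, simplified inference: int if it parses, else str."""
    try:
        return int(value)
    except ValueError:
        return value


def read_people(text, sep=",", header=False, infer_schema=False):
    """Parse CSV text into a list of dicts, mimicking the meaning of the options."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if header:
        # The first row supplies the column names and is not data.
        names, rows = rows[0], rows[1:]
    else:
        # Without a header, fall back to positional column names (assumed style).
        names = [f"_c{i}" for i in range(len(rows[0]))]
    convert = infer if infer_schema else (lambda v: v)
    return [dict(zip(names, (convert(v) for v in row))) for row in rows]


# Matching separator: three named columns, with age parsed as an int.
people = read_people(PEOPLE_CSV, sep=";", header=True, infer_schema=True)
# Mismatched separator: each line stays a single unsplit column.
wrong_sep = read_people(PEOPLE_CSV, sep=":", header=True)

print(people[0])
print(wrong_sep[0])
```

The second call shows why the separator has to match the file: with a mismatched `sep`, the header and every row collapse into one column, which is easy to miss when the options are only shown and not explained.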
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by jomach <gi...@git.apache.org>.
Github user jomach commented on the issue:
https://github.com/apache/spark/pull/19429
@gatorsmile PR comments fixed. Sorry, but this is my first time.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19429
I wasn't even quite sure when I opened the JIRA. That's why I asked one of the PMC members, who might have better insight. I am okay with going ahead as a small improvement in the docs if any committer likes it (though I don't support it), but please leave the JIRA open. I think this PR does not fully solve the issue.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by jomach <gi...@git.apache.org>.
Github user jomach commented on the issue:
https://github.com/apache/spark/pull/19429
@gatorsmile I addressed your comments. I still cannot run the jekyll build...
`SKIP_API=1 jekyll build --incremental
Configuration file: /Users/jorge/Downloads/spark/docs/_config.yml
Deprecation: The 'gems' configuration option has been renamed to 'plugins'. Please update your config file accordingly.
Source: /Users/jorge/Downloads/spark/docs
Destination: /Users/jorge/Downloads/spark/docs/_site
Incremental build: enabled
Generating...
Liquid Exception: invalid byte sequence in US-ASCII in _layouts/redirect.html
`
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19429
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82628/
Test PASSed.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/19429
+1 for more detailed documentation (we should steer away from `inferSchema`)
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143933737
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala ---
@@ -49,6 +49,14 @@ object SQLDataSourceExample {
val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json")
peopleDF.select("name", "age").write.format("parquet").save("namesAndAges.parquet")
// $example off:manual_load_options$
+ // $example on:manual_load_options_csv$
+ val peopleDFCsv = spark.read.format("csv")
+ .option("sep", ";")
--- End diff --
double-spaced (no tab of course ..)
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143287943
--- Diff: examples/src/main/resources/people.csv ---
@@ -0,0 +1,3 @@
+name;age;job
+Jorge;30;Developer
+Bob;32;Developer
--- End diff --
Add an empty line.
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by jomach <gi...@git.apache.org>.
Github user jomach commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r144321507
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,26 @@ source type can be converted into other types using this syntax.
</div>
</div>
+To load a CSV file you can use:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+</div>
+</div>
### Run SQL on files directly
--- End diff --
@HyukjinKwon should I add a newline between lines 503 and 504?
For example:
```
{% include_example generic_load_save_functions r/RSparkSQLExample.R %}
</div>
</div>
### Manually Specifying Options
```
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19429
ok to test
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19429
**[Test build #82628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82628/testReport)** for PR 19429 at commit [`cd69fa2`](https://github.com/apache/spark/commit/cd69fa240d453a7b8344796349a2bf03a20ffbfc).
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r144201090
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,26 @@ source type can be converted into other types using this syntax.
</div>
</div>
+To load a CSV file you can use:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+</div>
+</div>
### Run SQL on files directly
--- End diff --
Yup, that's okay. BTW, what I initially meant in https://github.com/apache/spark/pull/19429#discussion_r143932389 was a newline between `</div>` and `### Run ..` (not between `...ample.R %}` and `</div>`). This breaks rendering:
<img src="https://user-images.githubusercontent.com/6477701/31481516-cd9ddb80-af5e-11e7-970b-d2c279f025d4.png" width="200" />
Let's not forget to fix this before the release if the follow-up can't be made ahead of it.
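The rendering problem described in this comment (a heading directly after a closing `</div>`) can be caught mechanically. A minimal sketch, assuming only that a Markdown heading line starts with `#` and should be preceded by a blank line; `headings_missing_blank_line` is a hypothetical helper, not part of Spark's docs build, and it does not skip fenced code blocks:

```python
def headings_missing_blank_line(markdown):
    """Return 1-based line numbers of headings not preceded by a blank line."""
    lines = markdown.splitlines()
    return [
        i + 1
        for i, line in enumerate(lines)
        if line.startswith("#") and i > 0 and lines[i - 1].strip() != ""
    ]


# The layout from the diff under review, and the same layout with the newline added.
broken = "</div>\n</div>\n### Run SQL on files directly\n"
fixed = "</div>\n</div>\n\n### Run SQL on files directly\n"

print(headings_missing_blank_line(broken))
print(headings_missing_blank_line(fixed))
```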
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19429
Merged build finished. Test FAILed.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by jomach <gi...@git.apache.org>.
Github user jomach commented on the issue:
https://github.com/apache/spark/pull/19429
@felixcheung Sorry for that. It should be there now. Can you test? Thanks.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19429
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143932389
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,25 @@ source type can be converted into other types using this syntax.
</div>
</div>
+To load a csv file you can use:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+</div>
+</div>
--- End diff --
Let's add another newline here. It breaks rendering.
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r144330025
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,26 @@ source type can be converted into other types using this syntax.
</div>
</div>
+To load a CSV file you can use:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+</div>
+</div>
### Run SQL on files directly
--- End diff --
Yup, a newline between 503 and 504.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19429
**[Test build #82629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82629/testReport)** for PR 19429 at commit [`68799ed`](https://github.com/apache/spark/commit/68799ede999ec1874c80d242441032cd29a2f695).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19429
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82629/
Test FAILed.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19429
**[Test build #82630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82630/testReport)** for PR 19429 at commit [`7ff1d84`](https://github.com/apache/spark/commit/7ff1d84779acc50ab3c63d9bc0651ac53193f555).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19429
**[Test build #82630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82630/testReport)** for PR 19429 at commit [`7ff1d84`](https://github.com/apache/spark/commit/7ff1d84779acc50ab3c63d9bc0651ac53193f555).
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by jomach <gi...@git.apache.org>.
Github user jomach commented on the issue:
https://github.com/apache/spark/pull/19429
@gatorsmile PR comments fixed. The problem with the current docs is that people who are starting out with Spark usually don't start with JSON files but with CSV files to "see" something.
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143932676
--- Diff: docs/sql-programming-guide.md ---
@@ -461,6 +461,8 @@ name (i.e., `org.apache.spark.sql.parquet`), but for built-in sources you can al
names (`json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, `text`). DataFrames loaded from any data
source type can be converted into other types using this syntax.
+To load a json file you can use:
--- End diff --
I'd say `JSON` instead of `json`.
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143929178
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala ---
@@ -49,6 +49,14 @@ object SQLDataSourceExample {
val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json")
peopleDF.select("name", "age").write.format("parquet").save("namesAndAges.parquet")
// $example off:manual_load_options$
+ // $example on:manual_load_options_csv$
+ val peopleDFCsv = spark.read.format("csv")
+ .option("sep", ";")
+ .option("inferSchema", "true")
+ .option("header", "true")
+ .load("examples/src/main/resources/people.csv")
--- End diff --
Could you change the indents of line 54-57 to 2 spaces?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143287807
--- Diff: examples/src/main/python/sql/datasource.py ---
@@ -53,6 +53,11 @@ def basic_datasource_example(spark):
df.select("name", "age").write.save("namesAndAges.parquet", format="parquet")
# $example off:manual_load_options$
+ # $example on:manual_load_options_csv$
+ df = spark.read.load("examples/src/main/resources/people.csv",
+ format="csv", sep=":", inferSchema="true", header="true")
+ # $example off:manual_load_options_csv
--- End diff --
This needs to be corrected to
> # $example off:manual_load_options_csv$
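As the review comments in this thread suggest, every `$example on:<label>$` marker needs a matching `$example off:<label>$`, including the trailing `$`. A minimal sketch of a checker for that, assuming only the marker syntax visible in the quoted diffs; `unbalanced_labels` is a hypothetical helper and not part of Spark's build or its `include_example` tag:

```python
import re

# Marker syntax as it appears in the diffs quoted in this thread:
#   $example on:<label>$  ...  $example off:<label>$
MARKER = re.compile(r"\$example (on|off):(\w+)\$")


def unbalanced_labels(source):
    """Return the sorted labels whose on/off marker counts differ."""
    counts = {}
    for state, label in MARKER.findall(source):
        on, off = counts.get(label, (0, 0))
        counts[label] = (on + 1, off) if state == "on" else (on, off + 1)
    return sorted(label for label, (on, off) in counts.items() if on != off)


# The "off" marker below lacks its closing "$", as in the diff under review.
broken = '''
# $example on:manual_load_options_csv$
df = spark.read.load("examples/src/main/resources/people.csv", format="csv")
# $example off:manual_load_options_csv
'''
# Same snippet with the closing "$" restored.
fixed = broken.rstrip("\n") + "$\n"

print(unbalanced_labels(broken))
print(unbalanced_labels(fixed))
```

The same check would also flag the Java diff in this thread where the `off:manual_load_options` marker was removed while its `on` marker remained.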
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143288202
--- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
@@ -115,7 +115,20 @@ private static void runBasicDataSourceExample(SparkSession spark) {
Dataset<Row> peopleDF =
spark.read().format("json").load("examples/src/main/resources/people.json");
peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet");
- // $example off:manual_load_options$
+ // $example on:manual_load_options_csv$
+ Dataset<Row> peopleDFCsv = spark.read().format("csv")
+ .option("sep", ";")
+ .option("inferSchema", "true")
+ .option("header", "true")
+ .load("examples/src/main/resources/people.csv");
+ // $example off:manual_load_options_csv$
+ // $example on:manual_load_options_csv$
+ Dataset<Row> peopleDFCsv = spark.read().format("csv")
+ .option("sep", ";")
+ .option("inferSchema", "true")
+ .option("header", "true")
+ .load("examples/src/main/resources/people.csv");
+ // $example off:manual_load_options_csv$
--- End diff --
Lines 125-131 are a duplicate.
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143929114
--- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
@@ -116,6 +116,13 @@ private static void runBasicDataSourceExample(SparkSession spark) {
spark.read().format("json").load("examples/src/main/resources/people.json");
peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet");
// $example off:manual_load_options$
+ // $example on:manual_load_options_csv$
+ Dataset<Row> peopleDFCsv = spark.read().format("csv")
+ .option("sep", ";")
+ .option("inferSchema", "true")
+ .option("header", "true")
--- End diff --
Could you change the indents of line 121-123 to 2 spaces?
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19429
When I opened the JIRA, I had in mind a chapter such as https://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets. BTW, this chapter, `Manually Specifying Options`, looks like it describes how to specify options.
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19429
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19429#discussion_r143935085
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,25 @@ source type can be converted into other types using this syntax.
</div>
</div>
+To load a csv file you can use:
--- End diff --
ditto
---
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19429
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82630/
Test PASSed.
---