Posted to reviews@spark.apache.org by jomach <gi...@git.apache.org> on 2017/10/04 13:14:16 UTC

[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

GitHub user jomach opened a pull request:

    https://github.com/apache/spark/pull/19429

    [SPARK-20055] [Docs] Added documentation for loading csv files into DataFrames

     
    
    ## What changes were proposed in this pull request?
    
     Added documentation for loading CSV files into DataFrames
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jomach/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19429.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19429
    
----
commit f5941bf196a36afe8715d713fcaaf3f1a136d9e8
Author: Jorge Machado <jo...@hotmail.com>
Date:   2017-10-04T13:09:16Z

    SPARK-20055 Documentation
     -Added documentation for loading csv files into Dataframes

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    @jomach is a new contributor to Apache Spark. It might be hard for him to address the above comments. Please submit a separate PR for addressing it. Will review it. Thanks!


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    **[Test build #82628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82628/testReport)** for PR 19429 at commit [`cd69fa2`](https://github.com/apache/spark/commit/cd69fa240d453a7b8344796349a2bf03a20ffbfc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143287505
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
    @@ -115,7 +115,20 @@ private static void runBasicDataSourceExample(SparkSession spark) {
         Dataset<Row> peopleDF =
           spark.read().format("json").load("examples/src/main/resources/people.json");
         peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet");
    -    // $example off:manual_load_options$
    +    // $example on:manual_load_options_csv$
    --- End diff --
    
    You still need to keep 
    > // $example off:manual_load_options$


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143933800
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
    @@ -116,6 +116,13 @@ private static void runBasicDataSourceExample(SparkSession spark) {
           spark.read().format("json").load("examples/src/main/resources/people.json");
         peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet");
         // $example off:manual_load_options$
    +    // $example on:manual_load_options_csv$
    +    Dataset<Row> peopleDFCsv = spark.read().format("csv")
    +	  .option("sep", ";")
    --- End diff --
    
    ditto


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143288308
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -479,6 +481,47 @@ source type can be converted into other types using this syntax.
     </div>
     </div>
     
    +To load a csv file you can use:
    +
    +<div class="codetabs">
    +<div data-lang="scala"  markdown="1">
    +{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
    +</div>
    +
    +<div data-lang="java"  markdown="1">
    +{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
    +</div>
    +
    +<div data-lang="python"  markdown="1">
    +{% include_example manual_load_options_csv python/sql/datasource.py %}
    +</div>
    +
    +<div data-lang="r"  markdown="1">
    +{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
    +</div>
    +</div>
    +
    +To load a csv file you can use:
    --- End diff --
    
    This is also a duplicate. 


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    **[Test build #82629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82629/testReport)** for PR 19429 at commit [`68799ed`](https://github.com/apache/spark/commit/68799ede999ec1874c80d242441032cd29a2f695).


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19429


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143933594
  
    --- Diff: examples/src/main/r/RSparkSQLExample.R ---
    @@ -112,6 +112,11 @@ namesAndAges <- select(df, "name", "age")
     write.df(namesAndAges, "namesAndAges.parquet", "parquet")
     # $example off:manual_load_options$
     
    +# $example on:manual_load_options_csv$
    --- End diff --
    
    I'd add a newline above here to stay consistent with the rest of this file.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    Thanks! Merged to master.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    Can one of the admins verify this patch?


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    I intended to explain `multiLine`, `inferSchema` and `header`, which are arguably the most commonly used options, rather than just show the examples. The JSON section explains `multiLine` and each line of its examples with detailed comments.
    
    Another point is that I'd like users to see the options (without duplication) rather than having to check the API documentation, as I am quite sure newbies often misunderstand this. For example, from time to time I see newbies setting `inferSchema` to `true` for non-CSV data sources, or setting `com.databricks.spark.csv` instead of `csv`.
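
A minimal PySpark-style sketch of the settings named in this comment, assuming `spark` is an existing SparkSession. `CSV_OPTIONS` and `load_people_csv` are hypothetical names used only for illustration; the option values mirror the diffs quoted in this thread. It uses the short built-in source name `csv` rather than `com.databricks.spark.csv`.

```python
# Options discussed in this thread; values are strings, as in the quoted diffs.
CSV_OPTIONS = {
    "sep": ";",             # the sample people.csv is semicolon-delimited
    "header": "true",       # treat the first line as column names
    "inferSchema": "true",  # extra pass over the data to guess column types
}


def load_people_csv(spark, path="examples/src/main/resources/people.csv"):
    """Load the sample file with the built-in `csv` source.

    `spark` is assumed to be an existing SparkSession (hypothetical helper).
    """
    return spark.read.format("csv").options(**CSV_OPTIONS).load(path)
```

Because `inferSchema` needs an additional pass over the data, supplying an explicit schema may be preferable for larger inputs.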


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by jomach <gi...@git.apache.org>.

Github user jomach commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    @gatorsmile PR comments fixed. Sorry, but this is my first time.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    I wasn't even quite sure when I opened the JIRA. That's why I asked one of the PMC members who might have better insight. I am okay with going ahead as a small improvement in the docs if any committer likes it (though I don't support it), but please leave the JIRA open. I think this PR does not fully solve the issue.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by jomach <gi...@git.apache.org>.

Github user jomach commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    @gatorsmile  I addressed your comments. I still cannot run the Jekyll build...
    `SKIP_API=1 jekyll build --incremental
    Configuration file: /Users/jorge/Downloads/spark/docs/_config.yml
           Deprecation: The 'gems' configuration option has been renamed to 'plugins'. Please update your config file accordingly.
                Source: /Users/jorge/Downloads/spark/docs
           Destination: /Users/jorge/Downloads/spark/docs/_site
     Incremental build: enabled
          Generating... 
      Liquid Exception: invalid byte sequence in US-ASCII in _layouts/redirect.html
    `


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82628/
    Test PASSed.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    +1 for more detailed documentation (we should steer away from `inferSchema`)


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143933737
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala ---
    @@ -49,6 +49,14 @@ object SQLDataSourceExample {
         val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json")
         peopleDF.select("name", "age").write.format("parquet").save("namesAndAges.parquet")
         // $example off:manual_load_options$
    +    // $example on:manual_load_options_csv$
    +    val peopleDFCsv = spark.read.format("csv")
    +	  .option("sep", ";")
    --- End diff --
    
    Use two-space indentation here (no tabs, of course).
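
One way to catch the tab-indented continuation line flagged here before pushing is a small stdlib check. This is only a sketch; `tab_indented_lines` is a hypothetical helper, not part of Spark's style tooling.

```python
def tab_indented_lines(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            hits.append(number)
    return hits


# The two lines from the quoted diff; the second starts with a tab.
snippet = 'val peopleDFCsv = spark.read.format("csv")\n\t  .option("sep", ";")\n'
print(tab_indented_lines(snippet))
```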


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143287943
  
    --- Diff: examples/src/main/resources/people.csv ---
    @@ -0,0 +1,3 @@
    +name;age;job
    +Jorge;30;Developer
    +Bob;32;Developer
    --- End diff --
    
    Add an empty line.
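
The sample file is semicolon-delimited with a header row, which is what the `sep` and `header` options in the quoted snippets refer to. A minimal sketch using Python's standard `csv` module as a stand-in for Spark's reader (it does not infer types, so `age` stays a string); `read_rows` is a hypothetical helper.

```python
import csv
import io

# Contents of the people.csv added in the diff above.
PEOPLE_CSV = "name;age;job\nJorge;30;Developer\nBob;32;Developer\n"


def read_rows(text, sep):
    """Parse delimited text into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=sep))


rows = read_rows(PEOPLE_CSV, ";")   # three columns: name, age, job
wrong = read_rows(PEOPLE_CSV, ":")  # one column: the whole line
print(rows[0])
print(wrong[0])
```

With the wrong separator every line collapses into a single column, so a snippet that passes `sep=":"` for this file (as the Python diff quoted elsewhere in this thread does) would presumably not produce the intended `name`, `age` and `job` columns.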


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by jomach <gi...@git.apache.org>.

Github user jomach commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r144321507
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -479,6 +481,26 @@ source type can be converted into other types using this syntax.
     </div>
     </div>
     
    +To load a CSV file you can use:
    +
    +<div class="codetabs">
    +<div data-lang="scala"  markdown="1">
    +{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
    +</div>
    +
    +<div data-lang="java"  markdown="1">
    +{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
    +</div>
    +
    +<div data-lang="python"  markdown="1">
    +{% include_example manual_load_options_csv python/sql/datasource.py %}
    +</div>
    +
    +<div data-lang="r"  markdown="1">
    +{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
    +
    +</div>
    +</div>
     ### Run SQL on files directly
    --- End diff --
    
    @HyukjinKwon should I add a newline between lines 503 and 504?
    For example:
    ```
    {% include_example generic_load_save_functions r/RSparkSQLExample.R %}
    
    </div>
    </div>
    
    ### Manually Specifying Options
    ```


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    ok to test


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    **[Test build #82628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82628/testReport)** for PR 19429 at commit [`cd69fa2`](https://github.com/apache/spark/commit/cd69fa240d453a7b8344796349a2bf03a20ffbfc).


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r144201090
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -479,6 +481,26 @@ source type can be converted into other types using this syntax.
     </div>
     </div>
     
    +To load a CSV file you can use:
    +
    +<div class="codetabs">
    +<div data-lang="scala"  markdown="1">
    +{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
    +</div>
    +
    +<div data-lang="java"  markdown="1">
    +{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
    +</div>
    +
    +<div data-lang="python"  markdown="1">
    +{% include_example manual_load_options_csv python/sql/datasource.py %}
    +</div>
    +
    +<div data-lang="r"  markdown="1">
    +{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
    +
    +</div>
    +</div>
     ### Run SQL on files directly
    --- End diff --
    
    Yup, that's okay. BTW, what I initially meant in https://github.com/apache/spark/pull/19429#discussion_r143932389 was a newline between `</div>` and `### Run ..` (not between `...ample.R %}` and `</div>`). This breaks rendering:
    
    <img src="https://user-images.githubusercontent.com/6477701/31481516-cd9ddb80-af5e-11e7-970b-d2c279f025d4.png" width="200" />
    
    
    Let's not forget to fix this up before the release if the follow-up can't be made beforehand.
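
The rendering problem described here, a heading placed directly after a closing `</div>` with no blank line in between, can be detected mechanically. A minimal sketch; `headings_missing_blank_line` is a hypothetical helper and only covers this one pattern.

```python
def headings_missing_blank_line(markdown):
    """Return 1-based line numbers of headings that directly follow `</div>`."""
    lines = markdown.splitlines()
    return [
        index + 1
        for index in range(1, len(lines))
        if lines[index].startswith("#") and lines[index - 1].strip() == "</div>"
    ]


broken = "</div>\n</div>\n### Run SQL on files directly\n"
fixed = "</div>\n</div>\n\n### Run SQL on files directly\n"
print(headings_missing_blank_line(broken))
print(headings_missing_blank_line(fixed))
```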



---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by jomach <gi...@git.apache.org>.

Github user jomach commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    @felixcheung  Sorry for that. It should be there now. Can you test? Thanks.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143932389
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -479,6 +481,25 @@ source type can be converted into other types using this syntax.
     </div>
     </div>
     
    +To load a csv file you can use:
    +
    +<div class="codetabs">
    +<div data-lang="scala"  markdown="1">
    +{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
    +</div>
    +
    +<div data-lang="java"  markdown="1">
    +{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
    +</div>
    +
    +<div data-lang="python"  markdown="1">
    +{% include_example manual_load_options_csv python/sql/datasource.py %}
    +</div>
    +
    +<div data-lang="r"  markdown="1">
    +{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
    +</div>
    +</div>
    --- End diff --
    
    Let's add another newline here. It breaks rendering.


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r144330025
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -479,6 +481,26 @@ source type can be converted into other types using this syntax.
     </div>
     </div>
     
    +To load a CSV file you can use:
    +
    +<div class="codetabs">
    +<div data-lang="scala"  markdown="1">
    +{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
    +</div>
    +
    +<div data-lang="java"  markdown="1">
    +{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
    +</div>
    +
    +<div data-lang="python"  markdown="1">
    +{% include_example manual_load_options_csv python/sql/datasource.py %}
    +</div>
    +
    +<div data-lang="r"  markdown="1">
    +{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
    +
    +</div>
    +</div>
     ### Run SQL on files directly
    --- End diff --
    
    Yup, a newline between 503 and 504.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    **[Test build #82629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82629/testReport)** for PR 19429 at commit [`68799ed`](https://github.com/apache/spark/commit/68799ede999ec1874c80d242441032cd29a2f695).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82629/
    Test FAILed.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    **[Test build #82630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82630/testReport)** for PR 19429 at commit [`7ff1d84`](https://github.com/apache/spark/commit/7ff1d84779acc50ab3c63d9bc0651ac53193f555).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    **[Test build #82630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82630/testReport)** for PR 19429 at commit [`7ff1d84`](https://github.com/apache/spark/commit/7ff1d84779acc50ab3c63d9bc0651ac53193f555).


---


[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by jomach <gi...@git.apache.org>.

Github user jomach commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    @gatorsmile PR comments fixed. The problem with the current docs is that people who start with Spark usually don't start with JSON files but with CSV files to "see" something...


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143932676
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -461,6 +461,8 @@ name (i.e., `org.apache.spark.sql.parquet`), but for built-in sources you can al
     names (`json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, `text`). DataFrames loaded from any data
     source type can be converted into other types using this syntax.
     
    +To load a json file you can use:
    --- End diff --
    
    I'd say `JSON` instead of `json`.


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143929178
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala ---
    @@ -49,6 +49,14 @@ object SQLDataSourceExample {
         val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json")
         peopleDF.select("name", "age").write.format("parquet").save("namesAndAges.parquet")
         // $example off:manual_load_options$
    +    // $example on:manual_load_options_csv$
    +    val peopleDFCsv = spark.read.format("csv")
    +         .option("sep", ";")
    +         .option("inferSchema", "true")
    +         .option("header", "true")
    +         .load("examples/src/main/resources/people.csv")
    --- End diff --
    
    Could you change the indents of line 54-57 to 2 spaces?


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143287807
  
    --- Diff: examples/src/main/python/sql/datasource.py ---
    @@ -53,6 +53,11 @@ def basic_datasource_example(spark):
         df.select("name", "age").write.save("namesAndAges.parquet", format="parquet")
         # $example off:manual_load_options$
     
    +    # $example on:manual_load_options_csv$
    +    df = spark.read.load("examples/src/main/resources/people.csv",
    +                         format="csv", sep=":", inferSchema="true", header="true")
    +    # $example off:manual_load_options_csv
    --- End diff --
    
    This needs to be corrected to 
    > # $example off:manual_load_options_csv$
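
The marker problem pointed out here (an `on` marker whose `off` counterpart lacks the trailing `$`) is easy to miss by eye. A minimal sketch of a checker, assuming the `$example on:label$` / `$example off:label$` form seen in the diffs in this thread; `unclosed_labels` is a hypothetical helper and not part of the docs build.

```python
import re

# Matches well-formed markers only, e.g. "$example on:manual_load_options_csv$".
MARKER = re.compile(r"\$example (on|off):(\w+)\$")


def unclosed_labels(source):
    """Return labels opened with `on` that never get a well-formed `off`."""
    open_labels = []
    for line in source.splitlines():
        match = MARKER.search(line)
        if match is None:
            continue
        state, label = match.groups()
        if state == "on":
            open_labels.append(label)
        elif label in open_labels:
            open_labels.remove(label)
    return open_labels


# The Python lines from the quoted diff; the closing marker lacks its final "$".
snippet = """\
# $example on:manual_load_options_csv$
df = spark.read.load("examples/src/main/resources/people.csv",
                     format="csv", sep=":", inferSchema="true", header="true")
# $example off:manual_load_options_csv
"""
print(unclosed_labels(snippet))
```

The same check would also flag the Java diff in which the `off:manual_load_options` line was removed while a new `on` marker was added.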


---


[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143288202
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
    @@ -115,7 +115,20 @@ private static void runBasicDataSourceExample(SparkSession spark) {
         Dataset<Row> peopleDF =
           spark.read().format("json").load("examples/src/main/resources/people.json");
         peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet");
    -    // $example off:manual_load_options$
    +    // $example on:manual_load_options_csv$
    +    Dataset<Row> peopleDFCsv = spark.read().format("csv")
    +              .option("sep", ";")
    +              .option("inferSchema", "true")
    +              .option("header", "true")
    +              .load("examples/src/main/resources/people.csv");
    +    // $example off:manual_load_options_csv$
    +    // $example on:manual_load_options_csv$
    +    Dataset<Row> peopleDFCsv = spark.read().format("csv")
    +              .option("sep", ";")
    +              .option("inferSchema", "true")
    +              .option("header", "true")
    +              .load("examples/src/main/resources/people.csv");
    +    // $example off:manual_load_options_csv$
    --- End diff --
    
    Lines 125-131 are a duplicate.
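
A rough sketch of how an accidentally repeated example block could be detected. It assumes each block is delimited by `$example on:<label>$` and `$example off:<label>$` lines as in the diff above, and it flags only blocks whose label and body are both identical; `duplicated_blocks` is a hypothetical helper, not something the project ships.

```python
import re
from collections import Counter

# Captures the label and the text between a matching on/off marker pair.
BLOCK = re.compile(
    r"\$example on:(\w+)\$\n(.*?)\n[^\n]*\$example off:\1\$",
    re.DOTALL,
)


def duplicated_blocks(source):
    """Return the sorted labels of blocks that appear verbatim more than once."""
    counts = Counter(BLOCK.findall(source))
    return sorted({label for (label, _body), n in counts.items() if n > 1})


# One example block from the Java diff, repeated twice as in the submitted patch.
block = '''\
// $example on:manual_load_options_csv$
Dataset<Row> peopleDFCsv = spark.read().format("csv")
  .option("sep", ";")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("examples/src/main/resources/people.csv");
// $example off:manual_load_options_csv$
'''
java_snippet = block + block

print(duplicated_blocks(java_snippet))
```

Comparing bodies rather than labels alone is a deliberate choice here: a file may reuse a label for separate segments, so only verbatim repeats are treated as suspicious.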


---

[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143929114
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
    @@ -116,6 +116,13 @@ private static void runBasicDataSourceExample(SparkSession spark) {
           spark.read().format("json").load("examples/src/main/resources/people.json");
         peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet");
         // $example off:manual_load_options$
    +    // $example on:manual_load_options_csv$
    +    Dataset<Row> peopleDFCsv = spark.read().format("csv")
    +              .option("sep", ";")
    +              .option("inferSchema", "true")
    +              .option("header", "true")
    --- End diff --
    
    Could you change the indentation of lines 121-123 to 2 spaces?


---

[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    When I opened the JIRA, I had in mind a separate chapter such as https://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets. This chapter, `Manually Specifying Options`, looks like it describes how to specify options, BTW.


---

[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    Merged build finished. Test PASSed.


---

[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19429#discussion_r143935085
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -479,6 +481,25 @@ source type can be converted into other types using this syntax.
     </div>
     </div>
     
    +To load a csv file you can use:
    --- End diff --
    
    ditto


---

[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82630/


---
