Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2018/10/22 04:15:50 UTC
[GitHub] spark pull request #22791: [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Exam...
GitHub user dongjoon-hyun opened a pull request:
https://github.com/apache/spark/pull/22791
[SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Example
## What changes were proposed in this pull request?
This PR fixes the following SparkR example, which fails in Spark 2.3.0 through 2.4.0:
```r
> df <- read.df("examples/src/main/resources/people.csv", "csv")
> namesAndAges <- select(df, "name", "age")
...
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '`name`' given input columns: [_c0];;
'Project ['name, 'age]
+- AnalysisBarrier
+- Relation[_c0#97] csv
```
- https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-docs/_site/sql-programming-guide.html#manually-specifying-options
- http://spark.apache.org/docs/2.3.2/sql-programming-guide.html#manually-specifying-options
- http://spark.apache.org/docs/2.3.1/sql-programming-guide.html#manually-specifying-options
- http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options
## How was this patch tested?
Manually tested in SparkR. (Note that `RSparkSQLExample.R` fails at the last JDBC example.)
```r
> df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
> namesAndAges <- select(df, "name", "age")
```
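To illustrate why the extra options matter, here is a minimal sketch in base R. It is not SparkR: `read.csv` stands in for `read.df`, and no Spark session is involved. The inline `people_csv` string is a hypothetical sample, assumed to follow the layout of `people.csv` (a header row and semicolon-separated fields).

```r
# Hypothetical inline sample, assumed to mirror the layout of
# examples/src/main/resources/people.csv: header row, ';' as separator.
people_csv <- "name;age;job
Jorge;30;Developer
Bob;32;Developer"

# Default-style read: comma separator, no header. Each line becomes a single
# field, much like the lone `_c0` column in the Spark error above, so there
# is no `name` or `age` column to select.
naive <- read.csv(text = people_csv, header = FALSE)

# Read with options analogous to the fix: ';' separator, first row as header.
# read.csv converts column types automatically, roughly what inferSchema
# asks Spark to do.
parsed <- read.csv(text = people_csv, sep = ";", header = TRUE)

# Selecting the two columns now works.
names_and_ages <- parsed[, c("name", "age")]
```

With the default separator, `naive` has a single column and three rows (the header line is treated as data); `parsed` has the `name`, `age`, and `job` columns, with `age` read as numeric.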
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dongjoon-hyun/spark SPARK-25795
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22791.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22791
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22791: [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Exam...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22791#discussion_r227493674
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -114,7 +114,7 @@ write.df(namesAndAges, "namesAndAges.parquet", "parquet")
# $example on:manual_load_options_csv$
-df <- read.df("examples/src/main/resources/people.csv", "csv")
+df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
--- End diff --
If you don't mind, I included that [here](https://github.com/apache/spark/pull/22801/files#diff-eeffb959b904ebb5c864bc3dafe6437dR117)
---
[GitHub] spark pull request #22791: [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Exam...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22791#discussion_r227014553
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -114,7 +114,7 @@ write.df(namesAndAges, "namesAndAges.parquet", "parquet")
# $example on:manual_load_options_csv$
-df <- read.df("examples/src/main/resources/people.csv", "csv")
+df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
--- End diff --
Hi, @felixcheung.
Could you review this?
---
[GitHub] spark pull request #22791: [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Exam...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22791
---
[GitHub] spark issue #22791: [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Example
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22791
**[Test build #97793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97793/testReport)** for PR 22791 at commit [`f160711`](https://github.com/apache/spark/commit/f160711e57871d5865e842dbec1d1cf70e688659).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22791: [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Exam...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/22791#discussion_r227465232
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -114,7 +114,7 @@ write.df(namesAndAges, "namesAndAges.parquet", "parquet")
# $example on:manual_load_options_csv$
-df <- read.df("examples/src/main/resources/people.csv", "csv")
+df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
--- End diff --
In R style we typically put spaces around `=` in named arguments, i.e. https://github.com/apache/spark/pull/22791/files#diff-eeffb959b904ebb5c864bc3dafe6437dR168
`, sep = ";", inferSchema = TRUE, header = TRUE`
and please don't use `T`; write `TRUE` for readability.
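Beyond readability, a commonly cited reason for spelling out `TRUE` is that in base R `T` and `F` are ordinary variables that merely default to `TRUE` and `FALSE`, whereas `TRUE` is a reserved word. The sketch below demonstrates this in plain R; the names `masked_header` and `reserved_ok` are hypothetical and chosen only for illustration.

```r
# T is an ordinary binding, so any earlier code in the session can rebind it.
T <- FALSE
masked_header <- T  # now silently FALSE; header = T would no longer mean TRUE

# TRUE is a reserved word: assigning to it raises an error.
reserved_ok <- tryCatch({
  eval(parse(text = "TRUE <- FALSE"))
  FALSE
}, error = function(e) TRUE)

# Remove the masking binding so T resolves to base::T (TRUE) again.
rm(T)
```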
---
[GitHub] spark pull request #22791: [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Exam...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22791#discussion_r227041945
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -114,7 +114,7 @@ write.df(namesAndAges, "namesAndAges.parquet", "parquet")
# $example on:manual_load_options_csv$
-df <- read.df("examples/src/main/resources/people.csv", "csv")
+df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
--- End diff --
Also pinging @jomach and @gatorsmile, because this example was added by the following PR in Spark 2.3:
- https://github.com/apache/spark/pull/19429/files#diff-eeffb959b904ebb5c864bc3dafe6437dR117
BTW, [SPARK-20055](https://issues.apache.org/jira/browse/SPARK-20055) is still open.
---
[GitHub] spark issue #22791: [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Example
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22791
**[Test build #97811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97811/testReport)** for PR 22791 at commit [`f160711`](https://github.com/apache/spark/commit/f160711e57871d5865e842dbec1d1cf70e688659).
* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
---
[GitHub] spark pull request #22791: [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Exam...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22791#discussion_r227493317
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -114,7 +114,7 @@ write.df(namesAndAges, "namesAndAges.parquet", "parquet")
# $example on:manual_load_options_csv$
-df <- read.df("examples/src/main/resources/people.csv", "csv")
+df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
--- End diff --
Thank you, @felixcheung.
---
[GitHub] spark issue #22791: [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Example
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22791
Thank you for reviewing and merging, @srowen.
Merged to `master/branch-2.4/branch-2.3`.
---