You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/30 11:48:50 UTC

[GitHub] [arrow] eitsupi opened a new pull request, #13038: MINOR: [R][DOCS] encoding setting of `read_csv_arrow`

eitsupi opened a new pull request, #13038:
URL: https://github.com/apache/arrow/pull/13038

   In the documentation, the `encoding` argument was of `CsvConvertOptions`, but it is correctly of `CsvReadOptions`.
   And, for the beginners, the method of setting `encoding` from `read_csv_arrow()` function was not obvious, so I will add it to the example.
   
   I also corrected some formatting for automatic linking in the doc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865064555


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   Thanks for this PR!  I'm in favour of making things easier for users, and these doc updates are definitely important, but I'm a little opposed to including examples using the `CsvReadOptions$create()` type syntax.  
   
   I wonder if a solution would be to merge this as-is, but open up a JIRA for a follow-up PR which allows users to pass a named list into `read_options`, on which we call `CSVReadOptions$create()`, so this example could be rewritten as:
   
   `df <- read_csv_arrow(tf, read_options = list(encoding = "utf8"))`.
   
   @jonkeane - would be good to get your thoughts here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eitsupi commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
eitsupi commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865526778


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   I have removed these lines.
   Even if we delete the example here, fortunately the pyarrow 8.0.0 documentation has examples of how to use it, so we can look there.
   
   Looking forward to seeing the API refined.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865747450


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   Apologies, @eitsupi , I had previously tagged the wrong person in my original comment above! 
   
   Good point that it is documented there, though I wonder how many R users would think to use the pyarrow docs.  I think that your original point still stands that it needs documenting.  I've opened this JIRA ticket to make the relevant changes and then subsequently document the encoding argument: https://issues.apache.org/jira/browse/ARROW-16480 .  
   
   Thanks for flagging this up - I think passing through arguments as a list is going to make this much nicer for end-users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eitsupi commented on pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
eitsupi commented on PR #13038:
URL: https://github.com/apache/arrow/pull/13038#issuecomment-1118418143

   Thank you for merging this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865747450


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   Apologies, @eitsupi , I tagged the wrong person in my original comment above! 
   
   Good point that it is documented there, though I wonder how many R users would think to use the pyarrow docs.  I think that your original point still stands that it needs documenting.  I've opened this JIRA ticket to make the relevant changes: https://issues.apache.org/jira/browse/ARROW-16480 .  
   
   Thanks for flagging this up - I think passing through arguments as a list is going to make this much nicer for end-users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eitsupi commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
eitsupi commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865750800


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   Are you saying you do not want to merge this PR?
   If so, I would close this in favor of ARROW-16480.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic closed pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
thisisnic closed pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`
URL: https://github.com/apache/arrow/pull/13038


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eitsupi commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
eitsupi commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865526778


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   I have removed these lines.
   Even if you delete the example here, fortunately the pyarrow 8.0.0 documentation has examples of how to use it, so we can look there.
   
   Looking forward to seeing the API refined.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13038:
URL: https://github.com/apache/arrow/pull/13038#issuecomment-1120496906

   Benchmark runs are scheduled for baseline = 9232ed67c3a8e746e0c7fed8c06f2548adc5814e and contender = 9e9b2fec3ce9e8f7355e03a84f9eb8a0f745c683. 9e9b2fec3ce9e8f7355e03a84f9eb8a0f745c683 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/749a2db47b614496ae706a6dbdf6aa94...fba32048e1c641a5b6093173799feffe/)
   [Finished :arrow_down:0.78% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/5013fa52b07b483fb09e12d4a90574b5...747ce1fcfb0f4fd98ea608f3b7af15b0/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d2424168643541e6bba05fa11db51228...f6b1d5f7159a4f4689fc73fa1ea660bc/)
   [Finished :arrow_down:0.16% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6a7f56d4eab348f4894518e1c5b0adb5...7eeeedd0e7c04537aa4f7976cec75a92/)
   Buildkite builds:
   [Finished] [`9e9b2fec` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/702)
   [Finished] [`9e9b2fec` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/699)
   [Finished] [`9e9b2fec` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/688)
   [Finished] [`9e9b2fec` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/704)
   [Finished] [`9232ed67` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/701)
   [Finished] [`9232ed67` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/698)
   [Finished] [`9232ed67` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/687)
   [Finished] [`9232ed67` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/703)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jonkeane commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
jonkeane commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865107403


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   I definitely think that `CsvReadOptions$create()` is not an ideal API for end users here. I'm very pro opening up a ticket to make that easier (even if it's "just" a mapping of a list to args like Nic is suggesting)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eitsupi commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
eitsupi commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865526778


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   I have removed these lines.
   Even if we delete the example here, fortunately the pyarrow 8.0.0 documentation has examples of how to use it, so we can look there.
   https://arrow.apache.org/docs/dev/python/generated/pyarrow.csv.ReadOptions.html#pyarrow.csv.ReadOptions
   
   Looking forward to seeing the API refined.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865747450


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   Apologies, @eitsupi , I had previously tagged the wrong person in my original comment above! 
   
   Good point that it is documented there, though I wonder how many R users would think to use the pyarrow docs.  I think that your original point still stands that it needs documenting.  I've opened this JIRA ticket to make the relevant changes: https://issues.apache.org/jira/browse/ARROW-16480 .  
   
   Thanks for flagging this up - I think passing through arguments as a list is going to make this much nicer for end-users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865064555


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   Thanks for this PR @pitrou!  I'm in favour of making things easier for users, and these doc updates are definitely important, but I'm a little opposed to including examples using the `CsvReadOptions$create()` type syntax.  
   
   I wonder if a solution would be to merge this as-is, but in a follow-up PR, allow users to pass a named list into `read_options`, on which we call `CSVReadOptions$create()`, so this example could be rewritten as:
   
   `df <- read_csv_arrow(tf, read_options = list(encoding = "utf8"))`.
   
   @jonkeane - would be good to get your thoughts here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eitsupi commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
eitsupi commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865526778


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   I have removed these lines.
   Looking forward to seeing the API refined.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13038: MINOR: [R][Doc] encoding setting of `read_csv_arrow`

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13038:
URL: https://github.com/apache/arrow/pull/13038#discussion_r865064555


##########
r/R/csv.R:
##########
@@ -135,6 +135,8 @@
 #' write.csv(mtcars, file = tf)
 #' df <- read_csv_arrow(tf)
 #' dim(df)
+#' # Can specify the encoding of the file
+#' df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))

Review Comment:
   Thanks for this PR @pitrou!  I'm in favour of making things easier for users, and these doc updates are definitely important, but I'm a little opposed to including examples using the `CsvReadOptions$create()` type syntax.  
   
   I wonder if a solution would be to merge this as-is, but open up a JIRA for a follow-up PR which allows users to pass a named list into `read_options`, on which we call `CSVReadOptions$create()`, so this example could be rewritten as:
   
   `df <- read_csv_arrow(tf, read_options = list(encoding = "utf8"))`.
   
   @jonkeane - would be good to get your thoughts here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org