Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/20 05:58:10 UTC

[GitHub] [arrow] eitsupi opened a new pull request #12474: ARROW-15599: [R] can't explicitly convert a column as a sub-seconds timestamp from CSV (or other delimited) file

eitsupi opened a new pull request #12474:
URL: https://github.com/apache/arrow/pull/12474


   The "T" option that can be set for `col_types` is currently mapped to `timestamp[s]`, so it does not support sub-second timestamps.
   This PR changes the mapping to `timestamp[ns]`, the finest available unit, so that sub-second times can be read.
   It also adds examples of specifying column types to the reference documentation.
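
   A minimal sketch of the intended behaviour, assuming an arrow version that includes this change (the temporary file, the sample values, and the column name `time` are made up for illustration). It reads the same sub-second values with the compact "T" specification and with an explicit schema passed to `col_types`, which does not depend on the "T" mapping:

   ``` r
   library(arrow)

   # Write a small CSV with millisecond-precision timestamps (illustrative data).
   tf <- tempfile(fileext = ".csv")
   writeLines(c("time", "2018-10-07 19:04:05.000", "2018-10-07 19:04:05.001"), tf)

   # Compact specification: "T" is assumed to map to timestamp(unit = "ns").
   df_compact <- read_csv_arrow(tf, col_types = "T", col_names = "time", skip = 1)

   # Explicit schema: request nanosecond resolution directly.
   df_schema <- read_csv_arrow(tf, col_types = schema(time = timestamp(unit = "ns")))

   unlink(tf)

   # Gap between the two rows, in seconds; this should be about one millisecond.
   gap_secs <- as.numeric(diff(df_schema$time), units = "secs")
   ```

   The explicit schema form may be the safer choice when the code has to run against arrow versions where "T" still maps to second resolution.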


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12474: ARROW-15599: [R] can't explicitly convert a column as a sub-seconds timestamp from CSV (or other delimited) file

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#issuecomment-1046168893









[GitHub] [arrow] thisisnic commented on a change in pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#discussion_r815039157



##########
File path: r/R/csv.R
##########
@@ -61,8 +61,8 @@
 #' * "l": `bool()`
 #' * "f": `dictionary()`
 #' * "D": `date32()`
-#' * "T": `timestamp()`
-#' * "t": `time32()`
+#' * "T": `timestamp(unit = "ns")`
+#' * "t": `time32(unit = "ms")`

Review comment:
       ```suggestion
   #' * "t": `time32()`
   ```
   Let's leave this as it is, since this is the default value.

##########
File path: r/R/csv.R
##########
@@ -599,8 +605,8 @@ readr_to_csv_convert_options <- function(na,
         "l" = bool(),
         "f" = dictionary(),
         "D" = date32(),
-        "T" = timestamp(),
-        "t" = time32(),
+        "T" = timestamp(unit = "ns"),
+        "t" = time32(unit = "ms"),

Review comment:
       ```suggestion
           "t" = time32(),
   ```
   Same here







[GitHub] [arrow] jonkeane closed pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
jonkeane closed pull request #12474:
URL: https://github.com/apache/arrow/pull/12474


   





[GitHub] [arrow] eitsupi commented on pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
eitsupi commented on pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#issuecomment-1056151182


   Sorry, I forgot to update the Rd file, so I have pushed it now.





[GitHub] [arrow] ursabot edited a comment on pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#issuecomment-1056973163


   Benchmark runs are scheduled for baseline = 165ac992abeef68e0a36bf4b03fffa018c783554 and contender = 4aaf1f2e7e741fae5a8b02a514dcc891083d57b7. 4aaf1f2e7e741fae5a8b02a514dcc891083d57b7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4262382650d04958835c9fd2a616d760...0fa3d5cdb2984d0784c4461153f9e519/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/831a070bf64f4cd790d27cfa65c06d86...6226535b2b454d31a3a07f594f1d386c/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c2213d4871ae4e688f245b7931ce90fa...25c6ba066a4142548735f0cfb18db54a/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c8b6618790ad4e17bcdabf28a73a2d4c...5104a3cc178648f79adbb16daae87d16/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] ursabot edited a comment on pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#issuecomment-1056973163


   Benchmark runs are scheduled for baseline = 165ac992abeef68e0a36bf4b03fffa018c783554 and contender = 4aaf1f2e7e741fae5a8b02a514dcc891083d57b7. 4aaf1f2e7e741fae5a8b02a514dcc891083d57b7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4262382650d04958835c9fd2a616d760...0fa3d5cdb2984d0784c4461153f9e519/)
   [Finished :arrow_down:0.38% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/831a070bf64f4cd790d27cfa65c06d86...6226535b2b454d31a3a07f594f1d386c/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c2213d4871ae4e688f245b7931ce90fa...25c6ba066a4142548735f0cfb18db54a/)
   [Finished :arrow_down:1.66% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c8b6618790ad4e17bcdabf28a73a2d4c...5104a3cc178648f79adbb16daae87d16/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] ursabot edited a comment on pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#issuecomment-1056973163


   Benchmark runs are scheduled for baseline = 165ac992abeef68e0a36bf4b03fffa018c783554 and contender = 4aaf1f2e7e741fae5a8b02a514dcc891083d57b7. 4aaf1f2e7e741fae5a8b02a514dcc891083d57b7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4262382650d04958835c9fd2a616d760...0fa3d5cdb2984d0784c4461153f9e519/)
   [Finished :arrow_down:0.38% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/831a070bf64f4cd790d27cfa65c06d86...6226535b2b454d31a3a07f594f1d386c/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c2213d4871ae4e688f245b7931ce90fa...25c6ba066a4142548735f0cfb18db54a/)
   [Finished :arrow_down:1.66% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c8b6618790ad4e17bcdabf28a73a2d4c...5104a3cc178648f79adbb16daae87d16/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] eitsupi commented on a change in pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
eitsupi commented on a change in pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#discussion_r815221124



##########
File path: r/R/csv.R
##########
@@ -61,8 +61,8 @@
 #' * "l": `bool()`
 #' * "f": `dictionary()`
 #' * "D": `date32()`
-#' * "T": `timestamp()`
-#' * "t": `time32()`
+#' * "T": `timestamp(unit = "ns")`
+#' * "t": `time32(unit = "ms")`

Review comment:
       This is the default value in the current implementation. However, I think it should be stated explicitly, because the default is not obvious to users looking at the help page for this function.
   
   ``` r
   arrow::time32() == arrow::time32(unit = "ms")
   #> [1] TRUE
   arrow::time32() == arrow::time32(unit = "s")
   #> [1] FALSE
   ```
   
   <sup>Created on 2022-02-25 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
   
   Even if this change does not belong in this pull request, it should still be made.







[GitHub] [arrow] eitsupi commented on a change in pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
eitsupi commented on a change in pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#discussion_r815221124



##########
File path: r/R/csv.R
##########
@@ -61,8 +61,8 @@
 #' * "l": `bool()`
 #' * "f": `dictionary()`
 #' * "D": `date32()`
-#' * "T": `timestamp()`
-#' * "t": `time32()`
+#' * "T": `timestamp(unit = "ns")`
+#' * "t": `time32(unit = "ms")`

Review comment:
       This is the default value in the current implementation. However, I think it should be stated explicitly, because the default is not obvious to users looking at the help page for this function (`read_delim_arrow`).
   
   ``` r
   arrow::time32() == arrow::time32(unit = "ms")
   #> [1] TRUE
   arrow::time32() == arrow::time32(unit = "s")
   #> [1] FALSE
   ```
   
   <sup>Created on 2022-02-25 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
   
   Even if this change does not belong in this pull request, I think it should still be made.







[GitHub] [arrow] jonkeane commented on a change in pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#discussion_r817229371



##########
File path: r/tests/testthat/test-csv.R
##########
@@ -541,4 +541,16 @@ test_that("write_csv_arrow can write from RecordBatchReader objects", {
   tbl_in <- read_csv_arrow(csv_file)
   expect_named(tbl_in, c("dbl", "lgl", "false", "chr"))
   expect_equal(nrow(tbl_in), 3)
+
 })
+
+test_that("read_csv_arrow() can read sub-second timestamps with col_types T setting (ARROW-15599)", {
+  tbl <- tibble::tibble(time = c("2018-10-07 19:04:05.000", "2018-10-07 19:04:05.001"))
+  tf <- tempfile()
+  on.exit(unlink(tf))
+  write.csv(tbl, tf, row.names = FALSE)
+
+  df <- read_csv_arrow(tf, col_types = "T", col_names = "time", skip = 1)
+  expected <- as.POSIXct(tbl$time, tz = "UTC")
+  expect_equal(df$time, expected, ignore_attr = "tzone")
+})

Review comment:
       ```suggestion
   })
   
   ```

##########
File path: r/tests/testthat/test-csv.R
##########
@@ -541,4 +541,16 @@ test_that("write_csv_arrow can write from RecordBatchReader objects", {
   tbl_in <- read_csv_arrow(csv_file)
   expect_named(tbl_in, c("dbl", "lgl", "false", "chr"))
   expect_equal(nrow(tbl_in), 3)
+

Review comment:
       ```suggestion
   ```







[GitHub] [arrow] ursabot commented on pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#issuecomment-1056973163


   Benchmark runs are scheduled for baseline = 165ac992abeef68e0a36bf4b03fffa018c783554 and contender = 4aaf1f2e7e741fae5a8b02a514dcc891083d57b7. 4aaf1f2e7e741fae5a8b02a514dcc891083d57b7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4262382650d04958835c9fd2a616d760...0fa3d5cdb2984d0784c4461153f9e519/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/831a070bf64f4cd790d27cfa65c06d86...6226535b2b454d31a3a07f594f1d386c/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c2213d4871ae4e688f245b7931ce90fa...25c6ba066a4142548735f0cfb18db54a/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c8b6618790ad4e17bcdabf28a73a2d4c...5104a3cc178648f79adbb16daae87d16/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] thisisnic commented on a change in pull request #12474: ARROW-15599: [R] Convert a column as a sub-second timestamp from CSV file with the `T` col type option

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #12474:
URL: https://github.com/apache/arrow/pull/12474#discussion_r816168255



##########
File path: r/R/csv.R
##########
@@ -61,8 +61,8 @@
 #' * "l": `bool()`
 #' * "f": `dictionary()`
 #' * "D": `date32()`
-#' * "T": `timestamp()`
-#' * "t": `time32()`
+#' * "T": `timestamp(unit = "ns")`
+#' * "t": `time32(unit = "ms")`

Review comment:
       OK, it's fine to update the documentation, but add a comment after it pointing out that this is the default value, so there's no ambiguity. And let's leave the code as it is (without the default option explicitly stated).
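
   A small sketch of how the defaults discussed here could be checked from R. It assumes the `==` comparison on arrow data types used elsewhere in this thread and the `$ToString()` method on data type objects; the variable names are illustrative only:

   ``` r
   library(arrow)

   # Compare the no-argument constructors against explicit units.
   default_ts_is_seconds <- timestamp() == timestamp(unit = "s")
   default_time32_is_ms <- time32() == time32(unit = "ms")

   # String form of the type that "T" maps to after this change.
   ts_ns_label <- timestamp(unit = "ns")$ToString()
   ```

   If both comparisons are `TRUE`, then writing `time32()` in the code and `time32(unit = "ms")` in the documentation describe the same type, whereas `timestamp()` and `timestamp(unit = "ns")` do not.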



