You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/14 17:37:34 UTC
[GitHub] [arrow] jonkeane opened a new pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
jonkeane opened a new pull request #9205:
URL: https://github.com/apache/arrow/pull/9205
Don't assert that the large strings are the same (since they might vary acorss R versions if those versions change the way the seed works e.g. 3.6.0)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#issuecomment-760464630
https://issues.apache.org/jira/browse/ARROW-11253
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jonkeane commented on pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#issuecomment-760350728
@github-actions crossbow submit test-r-versions
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557577729
##########
File path: r/tests/testthat/test-backwards-compatibility.R
##########
@@ -50,7 +50,15 @@ test_that("reading a known Parquet file to dataframe with 3.0.0", {
pq_file <- test_path("golden-files/data-arrow-extra-meta_3.0.0.parquet")
df <- read_parquet(pq_file)
- # this is equivalent to `expect_identical()`
+
+ # test the shape (but not the contents of) the column-level attributes for the
+ # large column since those are random and seed changes will make them different
Review comment:
Why are they random?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
nealrichardson closed pull request #9205:
URL: https://github.com/apache/arrow/pull/9205
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jonkeane commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557722887
##########
File path: r/tests/testthat/helper-data.R
##########
@@ -68,9 +68,13 @@ make_big_string <- function() {
rep(purrr::map_chr(2047:2050, ~paste(sample(letters, ., replace = TRUE), collapse = "")), 2^18)
}
-make_string_of_size <- function(size = 1) {
+make_random_string_of_size <- function(size = 1) {
purrr::map_chr(1000*size, ~paste(sample(letters, ., replace = TRUE), collapse = ""))
}
+make_string_of_size <- function(size = 1) {
+ purrr::map_chr(1000*size, ~paste(rep(letters, length = .), collapse = ""))
Review comment:
I was lazily copying and pasting, but I've updated it
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] emkornfield commented on pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#issuecomment-905929679
Is there a technical blocker for keeping binary files in one of the submodule testdata repos? I don't think other languages keep any data locally in this repo.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ianmcook commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557722447
##########
File path: r/tests/testthat/helper-data.R
##########
@@ -68,9 +68,13 @@ make_big_string <- function() {
rep(purrr::map_chr(2047:2050, ~paste(sample(letters, ., replace = TRUE), collapse = "")), 2^18)
}
-make_string_of_size <- function(size = 1) {
+make_random_string_of_size <- function(size = 1) {
purrr::map_chr(1000*size, ~paste(sample(letters, ., replace = TRUE), collapse = ""))
}
+make_string_of_size <- function(size = 1) {
+ purrr::map_chr(1000*size, ~paste(rep(letters, length = .), collapse = ""))
Review comment:
Oh, I see—because size can be a vector with length > 1
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ianmcook commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557722447
##########
File path: r/tests/testthat/helper-data.R
##########
@@ -68,9 +68,13 @@ make_big_string <- function() {
rep(purrr::map_chr(2047:2050, ~paste(sample(letters, ., replace = TRUE), collapse = "")), 2^18)
}
-make_string_of_size <- function(size = 1) {
+make_random_string_of_size <- function(size = 1) {
purrr::map_chr(1000*size, ~paste(sample(letters, ., replace = TRUE), collapse = ""))
}
+make_string_of_size <- function(size = 1) {
+ purrr::map_chr(1000*size, ~paste(rep(letters, length = .), collapse = ""))
Review comment:
Oh, I see—because size can be a vector with length > 1?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#issuecomment-760358376
Ah. I think I used sample-with-replacement there just for recycling. Maybe `rep(letters, length = .)` instead?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#issuecomment-760470022
Revision: db6c5e8a795ff221fece3f8846f3256ec1517b6e
Submitted crossbow builds: [ursa-labs/crossbow @ actions-892](https://github.com/ursa-labs/crossbow/branches/all?query=actions-892)
|Task|Status|
|----|------|
|test-r-versions|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-892-github-test-r-versions)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-892-github-test-r-versions)|
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ianmcook commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557718717
##########
File path: r/tests/testthat/helper-data.R
##########
@@ -68,9 +68,13 @@ make_big_string <- function() {
rep(purrr::map_chr(2047:2050, ~paste(sample(letters, ., replace = TRUE), collapse = "")), 2^18)
}
-make_string_of_size <- function(size = 1) {
+make_random_string_of_size <- function(size = 1) {
purrr::map_chr(1000*size, ~paste(sample(letters, ., replace = TRUE), collapse = ""))
}
+make_string_of_size <- function(size = 1) {
+ purrr::map_chr(1000*size, ~paste(rep(letters, length = .), collapse = ""))
Review comment:
It's not immediately clear to me why this can't just be:
```r
paste(rep(letters, length = 1000*size), collapse = "")
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jonkeane commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible
Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557579022
##########
File path: r/tests/testthat/test-backwards-compatibility.R
##########
@@ -50,7 +50,15 @@ test_that("reading a known Parquet file to dataframe with 3.0.0", {
pq_file <- test_path("golden-files/data-arrow-extra-meta_3.0.0.parquet")
df <- read_parquet(pq_file)
- # this is equivalent to `expect_identical()`
+
+ # test the shape (but not the contents of) the column-level attributes for the
+ # large column since those are random and seed changes will make them different
Review comment:
They don't necessarily have to be, but I was following the pattern in [`make_big_string()`](https://github.com/apache/arrow/blob/master/r/tests/testthat/helper-data.R#L66-L69)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org