You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/14 17:37:34 UTC

[GitHub] [arrow] jonkeane opened a new pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

jonkeane opened a new pull request #9205:
URL: https://github.com/apache/arrow/pull/9205


   Don't assert that the large strings are the same (since they might vary acorss R versions if those versions change the way the seed works e.g. 3.6.0)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#issuecomment-760464630


   https://issues.apache.org/jira/browse/ARROW-11253


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#issuecomment-760350728


   @github-actions crossbow submit test-r-versions


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557577729



##########
File path: r/tests/testthat/test-backwards-compatibility.R
##########
@@ -50,7 +50,15 @@ test_that("reading a known Parquet file to dataframe with 3.0.0", {
   pq_file <- test_path("golden-files/data-arrow-extra-meta_3.0.0.parquet")
 
   df <- read_parquet(pq_file)
-  # this is equivalent to `expect_identical()`
+
+  # test the shape (but not the contents of) the column-level attributes for the
+  # large column since those are random and seed changes will make them different

Review comment:
       Why are they random?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson closed pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
nealrichardson closed pull request #9205:
URL: https://github.com/apache/arrow/pull/9205


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557722887



##########
File path: r/tests/testthat/helper-data.R
##########
@@ -68,9 +68,13 @@ make_big_string <- function() {
   rep(purrr::map_chr(2047:2050, ~paste(sample(letters, ., replace = TRUE), collapse = "")), 2^18)
 }
 
-make_string_of_size <- function(size = 1) {
+make_random_string_of_size <- function(size = 1) {
   purrr::map_chr(1000*size, ~paste(sample(letters, ., replace = TRUE), collapse = ""))
 }
 
+make_string_of_size <- function(size = 1) {
+  purrr::map_chr(1000*size, ~paste(rep(letters, length = .), collapse = ""))

Review comment:
       I was lazily copying and pasting, but I've updated it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#issuecomment-905929679


   Is there a technical blocker for keeping binary files in one of the submodule testdata repos?  I don't think other languages keep any data locally in this repo.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557722447



##########
File path: r/tests/testthat/helper-data.R
##########
@@ -68,9 +68,13 @@ make_big_string <- function() {
   rep(purrr::map_chr(2047:2050, ~paste(sample(letters, ., replace = TRUE), collapse = "")), 2^18)
 }
 
-make_string_of_size <- function(size = 1) {
+make_random_string_of_size <- function(size = 1) {
   purrr::map_chr(1000*size, ~paste(sample(letters, ., replace = TRUE), collapse = ""))
 }
 
+make_string_of_size <- function(size = 1) {
+  purrr::map_chr(1000*size, ~paste(rep(letters, length = .), collapse = ""))

Review comment:
       Oh, I see—because size can be a vector with length > 1




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557722447



##########
File path: r/tests/testthat/helper-data.R
##########
@@ -68,9 +68,13 @@ make_big_string <- function() {
   rep(purrr::map_chr(2047:2050, ~paste(sample(letters, ., replace = TRUE), collapse = "")), 2^18)
 }
 
-make_string_of_size <- function(size = 1) {
+make_random_string_of_size <- function(size = 1) {
   purrr::map_chr(1000*size, ~paste(sample(letters, ., replace = TRUE), collapse = ""))
 }
 
+make_string_of_size <- function(size = 1) {
+  purrr::map_chr(1000*size, ~paste(rep(letters, length = .), collapse = ""))

Review comment:
       Oh, I see—because size can be a vector with length > 1?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#issuecomment-760358376


   Ah. I think I used sample-with-replacement there just for recycling. Maybe `rep(letters, length = .)` instead?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#issuecomment-760470022


   Revision: db6c5e8a795ff221fece3f8846f3256ec1517b6e
   
   Submitted crossbow builds: [ursa-labs/crossbow @ actions-892](https://github.com/ursa-labs/crossbow/branches/all?query=actions-892)
   
   |Task|Status|
   |----|------|
   |test-r-versions|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-892-github-test-r-versions)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-892-github-test-r-versions)|


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557718717



##########
File path: r/tests/testthat/helper-data.R
##########
@@ -68,9 +68,13 @@ make_big_string <- function() {
   rep(purrr::map_chr(2047:2050, ~paste(sample(letters, ., replace = TRUE), collapse = "")), 2^18)
 }
 
-make_string_of_size <- function(size = 1) {
+make_random_string_of_size <- function(size = 1) {
   purrr::map_chr(1000*size, ~paste(sample(letters, ., replace = TRUE), collapse = ""))
 }
 
+make_string_of_size <- function(size = 1) {
+  purrr::map_chr(1000*size, ~paste(rep(letters, length = .), collapse = ""))

Review comment:
       It's not immediately clear to me why this can't just be:
   ```r
   paste(rep(letters, length = 1000*size), collapse = "")
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #9205: ARROW-11253 [R]: Make sure that large metadata tests are reproducible

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #9205:
URL: https://github.com/apache/arrow/pull/9205#discussion_r557579022



##########
File path: r/tests/testthat/test-backwards-compatibility.R
##########
@@ -50,7 +50,15 @@ test_that("reading a known Parquet file to dataframe with 3.0.0", {
   pq_file <- test_path("golden-files/data-arrow-extra-meta_3.0.0.parquet")
 
   df <- read_parquet(pq_file)
-  # this is equivalent to `expect_identical()`
+
+  # test the shape (but not the contents of) the column-level attributes for the
+  # large column since those are random and seed changes will make them different

Review comment:
       They don't necessarily have to be, but I was following the pattern in [`make_big_string()`](https://github.com/apache/arrow/blob/master/r/tests/testthat/helper-data.R#L66-L69)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org