Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/09 23:47:49 UTC

[GitHub] [arrow] nealrichardson commented on a diff in pull request #13082: ARROW-16489: [R] wrong encoding causes parsing error

nealrichardson commented on code in PR #13082:
URL: https://github.com/apache/arrow/pull/13082#discussion_r868603063


##########
r/tests/testthat/test-utf.R:
##########
@@ -17,8 +17,63 @@
 
 
 test_that("We handle non-UTF strings", {
-  # Move the code with non-UTF strings to a separate file so that we don't
-  # get a parse error on *cough* certain platforms
-  skip_on_cran()
-  source("latin1.R", encoding = "latin1")
+  x <- iconv("Veitingastaðir", to = "latin1")
+  df <- tibble::tibble(
+    chr = x,
+    fct = as.factor(x)
+  )
+  names(df) <- iconv(paste(x, names(df), sep = "_"), to = "latin1")
+  df_struct <- tibble::tibble(a = df)
+
+  raw_schema <- list(utf8(), dictionary(int8(), utf8()))
+  names(raw_schema) <- names(df)
+
+  # Confirm setup
+  expect_identical(Encoding(x), "latin1")
+  expect_identical(Encoding(names(df)), c("latin1", "latin1"))
+  expect_identical(Encoding(df[[1]]), "latin1")
+  expect_identical(Encoding(levels(df[[2]])), "latin1")
+
+  # Array
+  expect_identical(as.vector(Array$create(x)), x)
+  # struct
+  expect_identical(as.vector(Array$create(df)), df)
+
+  # ChunkedArray
+  expect_identical(as.vector(ChunkedArray$create(x)), x)
+  # struct
+  expect_identical(as.vector(ChunkedArray$create(df)), df)
+
+  # Table (including field name)
+  expect_identical(as.data.frame(Table$create(df)), df)
+  expect_identical(as.data.frame(Table$create(df_struct)), df_struct)
+
+  # RecordBatch
+  expect_identical(as.data.frame(record_batch(df)), df)
+  expect_identical(as.data.frame(record_batch(df_struct)), df_struct)
+
+  # Schema field name
+  df_schema <- do.call(schema, raw_schema)
+  expect_identical(names(df_schema), names(df))
+
+  df_struct_schema <- schema(a = do.call(struct, raw_schema))
+  # StructType doesn't expose names (in C++)
+  # expect_identical(names(df_struct_schema$a), names(df))

Review Comment:
   Looks like you need to resolve this lint check.


