Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/01 16:16:41 UTC

[GitHub] [arrow] nealrichardson commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

nealrichardson commented on pull request #6985:
URL: https://github.com/apache/arrow/pull/6985#issuecomment-622453709


   @emkornfield yeah everything is run twice, 32-bit and 64-bit, because Windows. 
   
   Usually when there's a failure with no clear error, it means that the process segfaulted. Based on the code that's running (https://github.com/apache/arrow/blob/master/r/R/parquet.R#L442-L449) and the output we do see before the crash, my guess is that ReadTable is crashing. What's odd is that that method is called several times in r/tests/testthat/test-parquet.R, even with the same test file, and those calls don't crash.
   
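   One difference I notice is in how the file connection is opened: the example passes a file path to ParquetFileReader$create, so the file is never explicitly closed. I'm not sure how your patch would make that suddenly matter, though. But you could try applying this diff and see if the failure goes away: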
   
   ```
   diff --git a/r/R/parquet.R b/r/R/parquet.R
   index 8d363e4cb..1415a06da 100644
   --- a/r/R/parquet.R
   +++ b/r/R/parquet.R
   @@ -439,7 +439,7 @@ ParquetFileWriter$create <- function(
    #' @export
    #' @examples
    #' \donttest{
   -#' f <- system.file("v0.7.1.parquet", package="arrow")
   +#' f <- ReadableFile$create(system.file("v0.7.1.parquet", package="arrow"))
    #' pq <- ParquetFileReader$create(f)
    #' pq$GetSchema()
    #' if (codec_is_available("snappy")) {
   @@ -447,6 +447,7 @@ ParquetFileWriter$create <- function(
    #'   tab <- pq$ReadTable(starts_with("c"))
    #'   tab$schema
    #' }
   +#' f$close()
    #' }
    #' @include arrow-package.R
    ParquetFileReader <- R6Class("ParquetFileReader",
   diff --git a/r/man/ParquetFileReader.Rd b/r/man/ParquetFileReader.Rd
   index 1e9f78f4d..3e052ae20 100644
   --- a/r/man/ParquetFileReader.Rd
   +++ b/r/man/ParquetFileReader.Rd
   @@ -33,7 +33,7 @@ with columns filtered by a character vector of column names or a
    
    \examples{
    \donttest{
   -f <- system.file("v0.7.1.parquet", package="arrow")
   +f <- ReadableFile$create(system.file("v0.7.1.parquet", package="arrow"))
    pq <- ParquetFileReader$create(f)
    pq$GetSchema()
    if (codec_is_available("snappy")) {
   @@ -41,5 +41,6 @@ if (codec_is_available("snappy")) {
      tab <- pq$ReadTable(starts_with("c"))
      tab$schema
    }
   +f$close()
    }
    }
   ```

