Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/27 06:51:22 UTC

[GitHub] [arrow] JasperSch opened a new issue #10809: Errors/warnings when saving open_dataset() to variable in R

JasperSch opened a new issue #10809:
URL: https://github.com/apache/arrow/issues/10809


   Hello,
   
   When I save the result of `open_dataset()` to a variable, I get various sporadic warnings and error messages in the code that runs afterwards. They stop once I remove the variable.
   
   The warnings/errors vary, but here are some examples:
   ```
   Error: Invalid <Schema>, external pointer to null
   
   Warning message:
   Number of rows unknown; returning NA 
   
   Error: IOError: Failed to open local file 'xxxx'. Detail: [Windows error 3] The system cannot find the path specified.
   
   ```
   
   Tested on the following:
   Ubuntu 20.04.2 LTS - R 3.6.3
   Windows 10 - R 4.1.0.
   
   Using arrow_4.0.1.
   
   MWE:
   ```
   library(arrow)
   library(dplyr)
   
   dir <- tempdir()
   
   df <- mtcars
   
   arrow::write_dataset(df, dir)
   
   # Directly using the object returned by open_dataset gives no messages
   result <- arrow::open_dataset(dir) %>%
       collect()
   1+1
   
   # Leaving dataset open causes messages
   result <- arrow::open_dataset(dir)
   
   1+1
   1+2
   1+3
   
   remove(result)
   # No messages anymore
   
   unlink(dir, recursive = TRUE)
   ```
   
   Thanks in advance for your feedback and for maintaining such a great package.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on issue #10809: Errors/warnings when saving open_dataset() to variable in R

Posted by GitBox <gi...@apache.org>.
thisisnic commented on issue #10809:
URL: https://github.com/apache/arrow/issues/10809#issuecomment-888188222


   One more thing to note, @JasperSch, though I'm not sure whether it's relevant to your particular case: setting up the temporary directory with `tempdir()`, as in the example above, actually gives you a session-specific temporary directory that may already contain other items, and the call to `open_dataset()` will raise an error if those other files are there.
   
   If instead, you create a temporary directory like so:
   ```
   tf <- tempfile()
   dir.create(tf)
   ```
   you'll then end up with a completely empty temporary directory. It may not be relevant here, but I'm highlighting it just in case.
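   A minimal base-R sketch of the difference, assuming nothing beyond base R (no arrow calls; the variable names are illustrative only). It shows that `tempdir()` returns the same session-wide directory every time, while `tempfile()` plus `dir.create()` yields a fresh, empty directory:

   ```r
   # tempdir() returns the same per-session directory on every call,
   # so anything written there earlier in the session is still present.
   session_dir <- tempdir()
   stopifnot(identical(session_dir, tempdir()))

   # tempfile() only generates a new, unused path; nothing is created yet.
   tf <- tempfile()
   stopifnot(!dir.exists(tf))

   # dir.create() turns that path into a fresh, empty directory
   # that could then be passed to write_dataset() / open_dataset().
   dir.create(tf)
   stopifnot(dir.exists(tf), length(list.files(tf)) == 0)

   # Cleaning up removes only this directory, not the session tempdir().
   unlink(tf, recursive = TRUE)
   stopifnot(!dir.exists(tf), dir.exists(session_dir))
   ```

   A side benefit of this approach is that cleanup with `unlink(tf, recursive = TRUE)` is scoped to the new directory, whereas calling `unlink()` on `tempdir()` itself deletes the whole session temporary directory.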





[GitHub] [arrow] thisisnic commented on issue #10809: Errors/warnings when saving open_dataset() to variable in R

Posted by GitBox <gi...@apache.org>.
thisisnic commented on issue #10809:
URL: https://github.com/apache/arrow/issues/10809#issuecomment-887329526


   Hi @JasperSch, thanks for reporting this and for providing a minimal working example!
   
   Just wanting to check: in the actual code you run, are you using a temporary directory? And if not, does the directory you're using contain only Parquet files, or anything else as well?
   





[GitHub] [arrow] JasperSch commented on issue #10809: Errors/warnings when saving open_dataset() to variable in R

Posted by GitBox <gi...@apache.org>.
JasperSch commented on issue #10809:
URL: https://github.com/apache/arrow/issues/10809#issuecomment-888056142


   Hi @thisisnic, it is indeed a completely empty temporary directory.
   
   In the meantime, I found out that running the code directly from the terminal seems to work.
   The problem is very likely caused by some of the console settings in my IDE. I've notified the developers about this.
   
   I'll close the issue here, since I don't think there is anything to fix on your side.
   In case I find out something useful, I'll drop an extra comment here later on.
   
   Thank you for your time.
   
   





[GitHub] [arrow] JasperSch commented on issue #10809: Errors/warnings when saving open_dataset() to variable in R

Posted by GitBox <gi...@apache.org>.
JasperSch commented on issue #10809:
URL: https://github.com/apache/arrow/issues/10809#issuecomment-888204370


   Thanks @thisisnic. I was aware of this and can confirm the problem remains in an empty directory. But it's indeed a pitfall in the MWE. I edited it to include `unlink` to clear out the folder first.





[GitHub] [arrow] JasperSch closed issue #10809: Errors/warnings when saving open_dataset() to variable in R

Posted by GitBox <gi...@apache.org>.
JasperSch closed issue #10809:
URL: https://github.com/apache/arrow/issues/10809


   

