Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/28 15:43:10 UTC

[GitHub] [arrow] romainfrancois commented on pull request #8533: ARROW-10080: [R] Call gc() and try again in MemoryPool

romainfrancois commented on pull request #8533:
URL: https://github.com/apache/arrow/pull/8533#issuecomment-718021313


   I also had, in a branch that builds on top of #8256, a way to invalidate objects early when we know they won't be used anymore. For example, in this function:
   
   ```r
   collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) {
     x <- ensure_group_vars(x)
     # Pull only the selected rows and cols into R
     if (query_on_dataset(x)) {
       # See dataset.R for Dataset and Scanner(Builder) classes
       tab <- Scanner$create(x)$ToTable()
     } else {
       # This is a Table/RecordBatch. See record-batch.R for the [ method
       tab <- x$.data[x$filtered_rows, x$selected_columns, keep_na = FALSE]
     }
     if (as_data_frame) {
       df <- as.data.frame(tab)
       tab$invalidate()
       restore_dplyr_features(df, x)
     } else {
       restore_dplyr_features(tab, x)
     }
   }
   ```
   
   Inside the `if (as_data_frame)` branch, as soon as `tab` is converted to a `data.frame` we no longer need or use `tab`. Calling `$invalidate()` on it runs the destructor of the shared pointer held by the external pointer that lives in `tab`, instead of leaving that to the garbage collector.
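   
   As a rough illustration of the idea only: the sketch below is plain R, not the arrow implementation, and `make_handle`, `buffer`, and `is_valid` are hypothetical names. The handle owns a resource and can give it up on request rather than when it is eventually collected:
   
   ```r
   # Hypothetical stand-in for an object that wraps an external resource.
   # In arrow the resource would be a C++ shared pointer behind an external
   # pointer; here it is just a numeric vector stored in an environment.
   make_handle <- function(n) {
     self <- new.env()
     self$buffer <- numeric(n)
     self$is_valid <- function() {
       exists("buffer", envir = self, inherits = FALSE)
     }
     self$invalidate <- function() {
       # Drop the handle's reference to the resource right away;
       # calling this a second time is a no-op.
       if (self$is_valid()) rm("buffer", envir = self)
       invisible(NULL)
     }
     self
   }
   
   tab <- make_handle(1e6)
   df <- data.frame(x = tab$buffer)  # build the R-side result first
   tab$invalidate()                  # `tab` must not be used after this point
   ```
   
   After `invalidate()`, `tab$is_valid()` returns `FALSE` while `df` is unaffected, which mirrors the ordering in `collect()` above: convert first, then invalidate.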
   
   Is this still worth having? If so, should I push this to #8256? cc @nealrichardson

