Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/28 15:44:38 UTC

[GitHub] [arrow] romainfrancois edited a comment on pull request #8533: ARROW-10080: [R] Call gc() and try again in MemoryPool

romainfrancois edited a comment on pull request #8533:
URL: https://github.com/apache/arrow/pull/8533#issuecomment-718021313


   I also had, in a branch that builds on top of #8256, ways to prematurely invalidate objects when we know they won't be used anymore. For example, in this function: 
   
   ```r
   collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) {
     x <- ensure_group_vars(x)
     # Pull only the selected rows and cols into R
     if (query_on_dataset(x)) {
       # See dataset.R for Dataset and Scanner(Builder) classes
       tab <- Scanner$create(x)$ToTable()
     } else {
       # This is a Table/RecordBatch. See record-batch.R for the [ method
       tab <- x$.data[x$filtered_rows, x$selected_columns, keep_na = FALSE]
     }
     if (as_data_frame) {
       df <- as.data.frame(tab)
       tab$invalidate() # HERE <<<<<<------------- release the Table eagerly
       restore_dplyr_features(df, x)
     } else {
       restore_dplyr_features(tab, x)
     }
   }
   ```
   
   Inside the `if (as_data_frame)` branch, as soon as `tab` has been converted to a `data.frame` we no longer need or use `tab`. Calling `$invalidate()` on it runs the destructor of the shared pointer held by the external pointer that lives in `tab`, so the memory is freed right away instead of later, when the garbage collector runs.
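   
   As a minimal illustration of the eager-release idea (this is not arrow's implementation; `Resource` and its fields are made up for the sketch), an `invalidate()` method can simply drop the wrapper's reference to whatever it owns, so the wrapper stops pinning that memory as soon as we know it will not be used again:
   
   ```r
   library(R6)
   
   # Minimal sketch of the eager-release pattern, not arrow's actual classes.
   # In arrow the wrapped resource is a C++ object behind an external pointer,
   # so resetting the shared pointer frees the memory immediately; here we just
   # drop a plain R vector, which becomes collectible at the next gc.
   Resource <- R6Class("Resource",
     public = list(
       data = NULL,
       initialize = function(n) {
         self$data <- numeric(n)   # stands in for a large allocation
       },
       invalidate = function() {
         # Drop the only reference the wrapper holds; the object must not
         # be used after this point.
         self$data <- NULL
         invisible(self)
       }
     )
   )
   
   res <- Resource$new(1e7)
   print(object.size(res$data))   # ~80 MB referenced through the wrapper
   res$invalidate()
   print(object.size(res$data))   # the wrapper no longer pins that memory
   ```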
   
   Is this still worth having? If so, should I push it to #8256? cc @nealrichardson 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org