Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/28 15:44:38 UTC
[GitHub] [arrow] romainfrancois edited a comment on pull request #8533: ARROW-10080: [R] Call gc() and try again in MemoryPool
romainfrancois edited a comment on pull request #8533:
URL: https://github.com/apache/arrow/pull/8533#issuecomment-718021313
I also had, in a branch that builds on top of #8256, a way to prematurely invalidate objects when we know they won't be used anymore. For example, in this function:
```r
collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) {
  x <- ensure_group_vars(x)

  # Pull only the selected rows and cols into R
  if (query_on_dataset(x)) {
    # See dataset.R for Dataset and Scanner(Builder) classes
    tab <- Scanner$create(x)$ToTable()
  } else {
    # This is a Table/RecordBatch. See record-batch.R for the [ method
    tab <- x$.data[x$filtered_rows, x$selected_columns, keep_na = FALSE]
  }

  if (as_data_frame) {
    df <- as.data.frame(tab)
    tab$invalidate() # HERE <<<<<<-------------
    restore_dplyr_features(df, x)
  } else {
    restore_dplyr_features(tab, x)
  }
}
```
Inside the `if (as_data_frame)` branch, as soon as `tab` has been converted to a `data.frame` we no longer need or use `tab`. Calling `$invalidate()` on it destroys the shared pointer held by the external pointer that lives in `tab`, so the memory is freed right away instead of later, when the garbage collector runs.
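To make the idea concrete, here is a minimal sketch of the pattern, not the arrow package's actual implementation; `Handle`, its `ptr` field, and its methods are hypothetical. The point is that `invalidate()` drops the underlying reference immediately, so the resource can be reclaimed now rather than whenever the garbage collector happens to finalize the object:

```r
# Hypothetical sketch (not arrow's real class): an object that can drop
# its underlying resource on demand instead of waiting for the gc.
Handle <- setRefClass("Handle",
  fields = list(ptr = "ANY"),
  methods = list(
    invalidate = function() {
      # Dropping the last reference releases the resource right now;
      # in arrow's case this would destroy the shared_ptr held by the
      # external pointer.
      ptr <<- NULL
    },
    is_valid = function() {
      !is.null(ptr)
    }
  )
)

h <- Handle$new(ptr = new.env())
h$is_valid()   # TRUE: the resource is still held
h$invalidate() # release it eagerly
h$is_valid()   # FALSE: any later use can be caught and rejected
```

An invalidated handle can then raise a clear error if it is used again, which is safer than leaving a dangling object around until collection.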
Is this still worth having? And if so, should I push it to #8256? cc @nealrichardson
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org