Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/26 08:21:13 UTC

[GitHub] [iceberg] liupan664021 opened a new issue #3179: ParallelIterable may cause memory pressure for presto coordinator

liupan664021 opened a new issue #3179:
URL: https://github.com/apache/iceberg/issues/3179


   - No limit for split queue
   The Presto Iceberg connector uses `BaseTableScan#planTasks()` to generate splits, which are planned in parallel by a `ParallelIterable`. The basic logic of `ParallelIterable` is: when `hasNext()` is called, planning tasks (one per manifest) are submitted to a thread pool to plan data files from each manifest, and the generated `FileScanTask`s are stored in a queue and then consumed by `next()`. However, the queue has no size limit or target size. That is to say, once a task is submitted, all data files of its manifest are added to the queue continuously until the task finishes. So in extreme cases, if a manifest contains a huge number of data files and the consumption speed is slow, the queue grows quickly and puts memory pressure on the Presto coordinator, especially when each `FileScanTask` is large, e.g. when planning a table with an extremely complicated schema.
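   To make the producer side concrete, here is a minimal standalone sketch (not Iceberg's actual classes; the queue type and the manifest size are illustrative assumptions) of how each planning task drains an entire manifest into a shared unbounded queue regardless of how fast the consumer drains it:
    ```java
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Sketch of the producer side of ParallelIterable: one planning task
    // enqueues every element of its manifest with no size limit.
    public class UnboundedQueueSketch {
        public static void main(String[] args) {
            Queue<String> queue = new ConcurrentLinkedQueue<>();
            int filesInManifest = 100_000; // hypothetical large manifest

            // producer: runs to completion no matter what the consumer does
            for (int i = 0; i < filesInManifest; i++) {
                queue.add("file-scan-task-" + i);
            }

            // if the consumer has taken nothing yet, every task is
            // resident in the coordinator heap at once
            System.out.println(queue.size()); // prints 100000
        }
    }
    ```
   With real `FileScanTask` objects carrying a wide schema, each element can be far larger than a short string, which is where the heap pressure comes from.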
   
   - Split queue is not cleared after iterator closed
   Another problem is that when the `ParallelIterable` is closed, the queue is not cleared, as the code below shows.
   Since Presto keeps finished queries in history for a while, the `FileScanTask`s remaining in the queue are referenced by the scheduler, which in turn is referenced by a Presto query, so they cannot be collected by GC. This may cause a coordinator OOM in extreme cases.
   
   In our test, we set up a Presto coordinator (also acting as a worker node) with `-Xmx40g`, and ran a query like `select col from table limit 1` where the `table` has a very complicated schema. After this query was submitted 8-10 times in a row, the coordinator crashed with an OOM error. In the heap dump we found that the `FileScanTask`s in the `ParallelIterator#queue` reached 2.5GB per query and quickly filled the JVM heap.
   
   Of course, we can explicitly set the iterator reference of `combinedScanIterable` to `null` so that GC can collect the splits after the query completes, but I think it is reasonable to clear the queue when closing the `ParallelIterator`, since the iterator cannot be accessed anymore after it is closed.
    ```java
    @Override
    public void close() {
      // cancel background tasks
      for (int i = 0; i < taskFutures.length; i += 1) {
        if (taskFutures[i] != null && !taskFutures[i].isDone()) {
          taskFutures[i].cancel(true);
        }
      }
      this.closed = true;
    }
    ```
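   A possible shape of the fix is to drop the buffered elements in `close()`. The sketch below is self-contained and only mirrors the names in the snippet above (`queue`, `taskFutures`, `closed`); it is not Iceberg's actual code:
    ```java
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Future;

    // Sketch of close() that also clears the queue, so buffered elements
    // become unreachable once the iterator is closed.
    public class CloseClearsQueueSketch {
        private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        private final Future<?>[] taskFutures = new Future<?>[0];
        private volatile boolean closed = false;

        public void close() {
            // cancel background tasks (as in the original snippet)
            for (int i = 0; i < taskFutures.length; i += 1) {
                if (taskFutures[i] != null && !taskFutures[i].isDone()) {
                    taskFutures[i].cancel(true);
                }
            }
            queue.clear(); // proposed addition: drop buffered tasks
            this.closed = true;
        }

        public static void main(String[] args) {
            CloseClearsQueueSketch it = new CloseClearsQueueSketch();
            it.queue.add("task-1");
            it.queue.add("task-2");
            it.close();
            System.out.println(it.queue.size()); // prints 0
        }
    }
    ```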
   
   Maybe we can improve `ParallelIterable`, e.g. add a size limit for the queue and clear it after close.
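   The "size limit" part could be as simple as a bounded queue, which gives natural backpressure: producers stop (or block) when the consumer lags. A minimal sketch using the JDK's `ArrayBlockingQueue` (the capacity of 4 is an arbitrary illustrative value, not a suggested default):
    ```java
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Sketch of a bounded split queue: offer() rejects new elements when
    // the capacity is reached, so the producer cannot outrun the consumer
    // without bound. (put() could be used instead to block the producer.)
    public class BoundedQueueSketch {
        public static void main(String[] args) {
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
            int accepted = 0;
            for (int i = 0; i < 10; i++) {
                if (queue.offer("task-" + i)) { // non-blocking; fails when full
                    accepted++;
                }
            }
            // only the first 4 fit; the rest must wait until next() drains some
            System.out.println(accepted); // prints 4
        }
    }
    ```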


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


