Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/11/09 10:17:00 UTC
[jira] [Created] (ARROW-18293) [C++] Proxy memory pool crashes with Dataset scanning
Joris Van den Bossche created ARROW-18293:
---------------------------------------------
Summary: [C++] Proxy memory pool crashes with Dataset scanning
Key: ARROW-18293
URL: https://issues.apache.org/jira/browse/ARROW-18293
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Joris Van den Bossche
Discovered while trying to use the proxy memory pool for testing ARROW-18164.
See https://github.com/apache/arrow/pull/14516#discussion_r1005433867
This test segfaults (using the fixture in {{test_dataset.py}}):
{code:python}
@pytest.mark.parquet
def test_scanner_proxy_memory_pool(dataset):
    proxy_pool = pa.proxy_memory_pool(pa.default_memory_pool())
    _ = dataset.to_table(memory_pool=proxy_pool)
{code}
Response of [~westonpace]:
{quote}My guess is that the problem is that the scanner erroneously returns before all work is completely finished. Changing the thread pool or the memory pool too quickly after a scan can lead to this kind of error. The new scanner was created specifically to avoid this problem but it isn't the default yet (still working through some follow-up PRs to make sure we have the same functionality).{quote}
So once the new scanner becomes the default, we can check whether this is fixed.
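
To make the failure mode guessed at in the quoted response more concrete, here is a toy sketch in pure Python. It is not Arrow code and does not use the pyarrow API: {{ToyPool}}, {{toy_scan}} and {{run}} are hypothetical names invented for illustration, and the assumption (taken from the guess above, not confirmed) is that a scan can return while a background task still holds a reference to the caller's memory pool. In C++ touching a released pool would be a use-after-free and could segfault; the toy version raises an error instead so that the ordering problem is visible.

```python
import threading


class ToyPool:
    """Hypothetical stand-in for a memory pool; not the pyarrow API."""

    def __init__(self):
        self.alive = True
        self.bytes_allocated = 0

    def allocate(self, nbytes):
        if not self.alive:
            # In C++ this would be a use-after-free, i.e. a likely segfault.
            raise RuntimeError("allocate() on a released pool")
        self.bytes_allocated += nbytes

    def release(self):
        self.alive = False


def toy_scan(pool, wait_for_background):
    """Hypothetical scan that starts one background task using the pool."""
    errors = []
    may_continue = threading.Event()

    def background_task():
        # Simulate work that is still pending when the scan call returns.
        may_continue.wait()
        try:
            pool.allocate(64)
        except RuntimeError as exc:
            errors.append(str(exc))

    worker = threading.Thread(target=background_task)
    worker.start()
    pool.allocate(1024)  # foreground part of the scan
    if wait_for_background:
        # Well-behaved variant: finish all work before returning.
        may_continue.set()
        worker.join()
    return worker, may_continue, errors


def run(wait_for_background):
    pool = ToyPool()
    worker, may_continue, errors = toy_scan(pool, wait_for_background)
    pool.release()      # caller drops the pool right after the scan returns
    may_continue.set()  # leftover background work (if any) now proceeds
    worker.join()
    return errors
```

Calling {{run(False)}} models a scan that returns early: the leftover task touches the pool after it has been released and the error is recorded. Calling {{run(True)}} models a scan that waits for all of its work and returns an empty error list.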