Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/11/09 10:17:00 UTC

[jira] [Created] (ARROW-18293) [C++] Proxy memory pool crashes with Dataset scanning

Joris Van den Bossche created ARROW-18293:
---------------------------------------------

             Summary: [C++] Proxy memory pool crashes with Dataset scanning
                 Key: ARROW-18293
                 URL: https://issues.apache.org/jira/browse/ARROW-18293
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Joris Van den Bossche


Discovered while trying to use the proxy memory pool for testing ARROW-18164

See https://github.com/apache/arrow/pull/14516#discussion_r1005433867

This test segfaults (using the {{dataset}} fixture in {{test_dataset.py}}):

{code:python}
@pytest.mark.parquet
def test_scanner_proxy_memory_pool(dataset):
    proxy_pool = pa.proxy_memory_pool(pa.default_memory_pool())
    _ = dataset.to_table(memory_pool=proxy_pool)
{code}

Response of [~westonpace]:

{quote}My guess is that the problem is that the scanner erroneously returns before all work is completely finished. Changing the thread pool or the memory pool too quickly after a scan can lead to this kind of error. The new scanner was created specifically to avoid this problem but it isn't the default yet (still working through some follow-up PRs to make sure we have the same functionality).{quote}

So once the new scanner becomes the default, we can check whether this is fixed.


