Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/02/11 16:29:00 UTC

[jira] [Created] (ARROW-11596) [C++][Python][Dataset] SIGSEGV when executing scan tasks with Python executors

David Li created ARROW-11596:
--------------------------------

             Summary: [C++][Python][Dataset] SIGSEGV when executing scan tasks with Python executors
                 Key: ARROW-11596
                 URL: https://issues.apache.org/jira/browse/ARROW-11596
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 3.0.0
            Reporter: David Li
            Assignee: David Li


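The following script crashes with a segfault (SIGSEGV) for me. It creates and advances a scan task iterator on an executor thread, then advances the same iterator again on the main thread: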
{code:python}
import concurrent.futures
import queue

import numpy as np
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.fs as fs
import pyarrow.parquet as pq


schema = pa.schema([("foo", pa.float64())])
table = pa.table([np.random.uniform(size=1024)], schema=schema)
path = "/tmp/foo.parquet"
pq.write_table(table, path)
dataset = ds.FileSystemDataset.from_paths(
    [path],
    schema=schema,
    format=ds.ParquetFileFormat(),
    filesystem=fs.LocalFileSystem(),
)

with concurrent.futures.ThreadPoolExecutor(2) as executor:
    tasks = dataset.scan()
    q = queue.Queue()

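    def _prebuffer():
        # Runs on an executor thread: start each scan task, pull the
        # first batch, then hand the live iterator to the main thread.
        for task in tasks:
            iterator = task.execute()
            next(iterator)
            q.put(iterator)

    executor.submit(_prebuffer).result()
    # Back on the main thread: advance the iterator created on the worker.
    next(q.get())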
{code}

{noformat}
$ uname -a
Linux chaconne 5.10.4-arch2-1 #1 SMP PREEMPT Fri, 01 Jan 2021 05:29:53 +0000 x86_64 GNU/Linux
$ pip freeze
numpy==1.20.1
pyarrow==3.0.0
{noformat}
