Posted to issues@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/02/11 16:29:00 UTC
[jira] [Created] (ARROW-11596) [C++][Python][Dataset] SIGSEGV when executing scan tasks with Python executors
David Li created ARROW-11596:
--------------------------------
Summary: [C++][Python][Dataset] SIGSEGV when executing scan tasks with Python executors
Key: ARROW-11596
URL: https://issues.apache.org/jira/browse/ARROW-11596
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 3.0.0
Reporter: David Li
Assignee: David Li
This crashes for me with a segfault:
{code:python}
import concurrent.futures
import queue

import numpy as np
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.fs as fs
import pyarrow.parquet as pq

schema = pa.schema([("foo", pa.float64())])
table = pa.table([np.random.uniform(size=1024)], schema=schema)
path = "/tmp/foo.parquet"
pq.write_table(table, path)

dataset = pa.dataset.FileSystemDataset.from_paths(
    [path],
    schema=schema,
    format=ds.ParquetFileFormat(),
    filesystem=fs.LocalFileSystem(),
)

with concurrent.futures.ThreadPoolExecutor(2) as executor:
    tasks = dataset.scan()
    q = queue.Queue()

    def _prebuffer():
        for task in tasks:
            iterator = task.execute()
            next(iterator)
            q.put(iterator)

    executor.submit(_prebuffer).result()
    next(q.get())
{code}
{noformat}
$ uname -a
Linux chaconne 5.10.4-arch2-1 #1 SMP PREEMPT Fri, 01 Jan 2021 05:29:53 +0000 x86_64 GNU/Linux
$ pip freeze
numpy==1.20.1
pyarrow==3.0.0
{noformat}
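As a diagnostic aid that is not part of the original report, the standard-library faulthandler module can record which Python frames were active when the SIGSEGV arrives. The sketch below is a suggestion only: the helper name enable_crash_tracebacks and the log file name are hypothetical, and the snippet is assumed to run in the same process before the reproduction script above.

```python
import faulthandler
import os
import tempfile


def enable_crash_tracebacks(log_path):
    """Dump Python tracebacks of all threads to log_path on a fatal signal.

    The returned file object must stay open for as long as the handler is
    enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(log_path, "w")
    # Installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical log location; any writable path works.
crash_log_path = os.path.join(tempfile.gettempdir(), "arrow-11596-crash.log")
crash_log = enable_crash_tracebacks(crash_log_path)
```

This only reports Python-level frames for each thread. A native backtrace through the Arrow C++ code would still need gdb or a core dump.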