Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/02/12 13:58:00 UTC

[jira] [Updated] (ARROW-11596) [C++][Python][Dataset] SIGSEGV when executing scan tasks with Python executors

     [ https://issues.apache.org/jira/browse/ARROW-11596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Li updated ARROW-11596:
-----------------------------
    Component/s:     (was: C++)

> [C++][Python][Dataset] SIGSEGV when executing scan tasks with Python executors
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-11596
>                 URL: https://issues.apache.org/jira/browse/ARROW-11596
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 3.0.0
>            Reporter: David Li
>            Assignee: David Li
>            Priority: Major
>              Labels: dataset, datasets
>
> This crashes for me with a segfault:
> {code:python}
> import concurrent.futures
> import queue
> import numpy as np
> import pyarrow as pa
> import pyarrow.dataset as ds
> import pyarrow.fs as fs
> import pyarrow.parquet as pq
> schema = pa.schema([("foo", pa.float64())])
> table = pa.table([np.random.uniform(size=1024)], schema=schema)
> path = "/tmp/foo.parquet"
> pq.write_table(table, path)
> dataset = ds.FileSystemDataset.from_paths(
>     [path],
>     schema=schema,
>     format=ds.ParquetFileFormat(),
>     filesystem=fs.LocalFileSystem(),
> )
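> with concurrent.futures.ThreadPoolExecutor(2) as executor:
>     tasks = dataset.scan()
>     q = queue.Queue()
>     def _prebuffer():
>         # Runs on a worker thread: start each scan task, read its first
>         # batch, then hand the partially consumed iterator to the queue.
>         for task in tasks:
>             iterator = task.execute()
>             next(iterator)
>             q.put(iterator)
>     executor.submit(_prebuffer).result()
>     # Main thread: advance the iterator that was started on the worker thread.
>     next(q.get())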
> {code}
> {noformat}
> $ uname -a
> Linux chaconne 5.10.4-arch2-1 #1 SMP PREEMPT Fri, 01 Jan 2021 05:29:53 +0000 x86_64 GNU/Linux
> $ pip freeze
> numpy==1.20.1
> pyarrow==3.0.0
> {noformat}