Posted to jira@arrow.apache.org by "Alessandro Molina (Jira)" <ji...@apache.org> on 2021/04/15 13:03:00 UTC
[jira] [Commented] (ARROW-10910) [Python] Segmentation Fault when None given to read_table with legacy dataset
[ https://issues.apache.org/jira/browse/ARROW-10910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322162#comment-17322162 ]
Alessandro Molina commented on ARROW-10910:
-------------------------------------------
It seems to me that this no longer causes a segfault.
Nor with the legacy implementation:
```
>>> pq.read_table(None, use_legacy_dataset=True)
...
  File "pyarrow/io.pxi", line 1474, in pyarrow.lib.get_reader
    reader[0] = nf.get_random_access_file()
AttributeError: 'NoneType' object has no attribute 'get_random_access_file'
```
Nor when explicitly creating a `ParquetFile`:
```
>>> pq.ParquetFile(None)
...
  File "pyarrow/io.pxi", line 1474, in pyarrow.lib.get_reader
    reader[0] = nf.get_random_access_file()
AttributeError: 'NoneType' object has no attribute 'get_random_access_file'
```
I guess a possible improvement would be to check for unsupported arguments in `io.get_native_file` and raise a `ValueError` there instead of propagating the `None` value.
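The kind of early check suggested here could look roughly like the sketch below. `validate_source` is a hypothetical helper written only for illustration; it is not part of pyarrow and does not reflect how `io.get_native_file` is implemented. The set of accepted input types and the use of `TypeError` for non-`None` unsupported values are assumptions.

```python
import io
import os


def validate_source(source):
    """Hypothetical pre-check; not pyarrow's actual implementation."""
    # Reject None up front so it never reaches lower-level reader code.
    if source is None:
        raise ValueError(
            "source must be a file path or file-like object, got None"
        )
    # Accept path-like inputs (an assumed set of supported path types).
    if isinstance(source, (str, bytes, os.PathLike)):
        return source
    # Accept anything that exposes a callable read() method.
    if callable(getattr(source, "read", None)):
        return source
    # Assumption: other unsupported types are reported as TypeError.
    raise TypeError(f"unsupported source type: {type(source).__name__}")
```

With a check like this, passing `None` would fail with a descriptive `ValueError` at the entry point rather than with an `AttributeError` deeper in the reader.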
> [Python] Segmentation Fault when None given to read_table with legacy dataset
> -----------------------------------------------------------------------------
>
> Key: ARROW-10910
> URL: https://issues.apache.org/jira/browse/ARROW-10910
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.17.0
> Environment: python: 3.8.3.final.0
> python-bits: 64
> OS: Linux
> OS-release: 5.4.0-56-generic
> machine: x86_64
> processor: x86_64
> byteorder: little
> LC_ALL: None
> LANG: en_US.UTF-8
> LOCALE: en_US.UTF-8
> pyarrow: 0.17.0
> Reporter: Charles Burkland
> Assignee: Ian Cook
> Priority: Major
> Labels: Bug:Generic, Python3, Segmenation_Fault, pyarrow
> Fix For: 5.0.0
>
>
> h3. Code Sample (copy-pasteable)
> {code:python}
> import pyarrow.parquet as pq
> pq.read_table(None)
> {code}
> h3. Description
> The above snippet will produce a Segmentation Fault, which is highly undesirable. I discovered this because I had a function that was supposed to return a file path, but on my first iteration I forgot the return statement, so it returned None. Thus, when I ran my module with
> {code:python}
> pq.read_table(generate_fp())
> {code}
> it produced a Segmentation Fault.
> h3. Expected Output
> Ideally this will raise a *ValueError*, indicating to the user that *None* is an invalid source/file path. In my opinion, this is much more desirable than a violent segfault.