Posted to jira@arrow.apache.org by "Alessandro Molina (Jira)" <ji...@apache.org> on 2021/04/15 13:03:00 UTC

[jira] [Commented] (ARROW-10910) [Python] Segmentation Fault when None given to read_table with legacy dataset

    [ https://issues.apache.org/jira/browse/ARROW-10910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322162#comment-17322162 ] 

Alessandro Molina commented on ARROW-10910:
-------------------------------------------

It seems to me that this no longer causes a segfault.

Not with the legacy implementation:

```
>>> pq.read_table(None, use_legacy_dataset=True)
...
  File "pyarrow/io.pxi", line 1474, in pyarrow.lib.get_reader
    reader[0] = nf.get_random_access_file()
AttributeError: 'NoneType' object has no attribute 'get_random_access_file'
```

Nor when explicitly constructing a `ParquetFile`:
```
>>> pq.ParquetFile(None)
...
  File "pyarrow/io.pxi", line 1474, in pyarrow.lib.get_reader
    reader[0] = nf.get_random_access_file()
AttributeError: 'NoneType' object has no attribute 'get_random_access_file'
```

I guess a possible improvement would be to detect unsupported arguments in `io.get_native_file` and raise a `ValueError` there instead of propagating the `None` value.
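
To illustrate the kind of early check I have in mind, here is a rough pure-Python sketch. It is not the actual pyarrow code: `check_source` is a hypothetical helper name, and the set of accepted types (path-like objects and anything with a `read` method) is an assumption made only for the example.

```python
import os


def check_source(source):
    """Hypothetical guard: reject unsupported inputs before opening a reader."""
    # Fail early with a clear message instead of passing None further down.
    if source is None:
        raise ValueError(
            "source must be a file path or file-like object, not None"
        )
    # Assumed set of supported inputs: path-like objects and readable objects.
    if isinstance(source, (str, bytes, os.PathLike)) or hasattr(source, "read"):
        return source
    raise TypeError(f"unsupported source type: {type(source).__name__}")
```

With a check like this, a caller who accidentally passes the result of a function that forgot to `return` gets a `ValueError` that names the problem, rather than an `AttributeError` raised deep inside the reader.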

> [Python] Segmentation Fault when None given to read_table with legacy dataset
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-10910
>                 URL: https://issues.apache.org/jira/browse/ARROW-10910
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.17.0
>         Environment: python: 3.8.3.final.0
> python-bits: 64
> OS: Linux
> OS-release: 5.4.0-56-generic
> machine: x86_64
> processor: x86_64
> byteorder: little
> LC_ALL: None
> LANG: en_US.UTF-8
> LOCALE: en_US.UTF-8
> pyarrow: 0.17.0
>            Reporter: Charles Burkland
>            Assignee: Ian Cook
>            Priority: Major
>              Labels: Bug:Generic, Python3, Segmenation_Fault, pyarrow
>             Fix For: 5.0.0
>
>
> h3. Code Sample (copy-pasteable)
> {code:python}
> import pyarrow.parquet as pq
> pq.read_table(None)
> {code}
> h3. Description
> The above snippet produces a segmentation fault, which is highly undesirable. I discovered this because I had a function that was supposed to return a file path, but in my first iteration I forgot the return statement. Thus, when I ran my module with
> {code:python}
> pq.read_table(generate_fp()){code}
> it produced a Segmentation Fault.
> h3. Expected Output 
> Ideally this would raise a *ValueError*, indicating to the user that *None* is an invalid source/file path. In my opinion, this is much more desirable than a violent segfault.


