Posted to jira@arrow.apache.org by "Alessandro Molina (Jira)" <ji...@apache.org> on 2021/12/20 15:57:00 UTC

[jira] [Updated] (ARROW-13763) [Python] Files opened for read with pyarrow.parquet are not explicitly closed

     [ https://issues.apache.org/jira/browse/ARROW-13763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Molina updated ARROW-13763:
--------------------------------------
    Fix Version/s: 8.0.0
                       (was: 7.0.0)

> [Python] Files opened for read with pyarrow.parquet are not explicitly closed
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-13763
>                 URL: https://issues.apache.org/jira/browse/ARROW-13763
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, Python
>    Affects Versions: 5.0.0
>         Environment: fsspec 2021.4.0
>            Reporter: Richard Kimoto
>            Assignee: Alessandro Molina
>            Priority: Major
>             Fix For: 8.0.0
>
>         Attachments: test.py
>
>
> It appears that files opened for reading with pyarrow.parquet.read_table (and therefore pyarrow.parquet.ParquetDataset) are not explicitly closed.
> This seems to be the case for both use_legacy_dataset=True and use_legacy_dataset=False. The files do not remain open at the OS level (verified using lsof). They do, however, seem to rely on the Python garbage collector to be closed.
> My use case is that I'd like to use a custom fsspec filesystem that interfaces with an S3-like API. It handles the remote download of the Parquet file and passes pyarrow a handle to a temporary file downloaded locally. It then waits for an explicit close() or __exit__() call to clean up the temp file.
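
The use case described above can be illustrated with a minimal, standard-library-only sketch. It is not the reporter's filesystem and not pyarrow's code: TempBackedFile, fake_download and read_with_explicit_close are hypothetical names, and a plain read() stands in for the Parquet reader. It only shows why the temp-file cleanup depends on the consumer calling close() or leaving a "with" block.

```python
import io
import os
import tempfile


class TempBackedFile(io.FileIO):
    """Hypothetical handle over a locally downloaded temp file.

    The temp file is deleted when close() runs, either explicitly or
    through __exit__ at the end of a "with" block.
    """

    def close(self):
        path = self.name
        already_closed = self.closed
        super().close()
        # Only clean up on the first close(); later calls are no-ops.
        if not already_closed and os.path.exists(path):
            os.remove(path)


def fake_download(payload: bytes) -> str:
    """Stand-in for the remote download: write payload to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".parquet")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    return path


def read_with_explicit_close(payload: bytes):
    """Read the temp file inside a "with" block so close() is guaranteed."""
    path = fake_download(payload)
    with TempBackedFile(path, "rb") as handle:
        data = handle.read()  # a Parquet reader would consume the handle here
    return data, path
```

If the consumer never calls close(), the temp file stays on disk until the handle is garbage collected, which is the behavior the report describes.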



--
This message was sent by Atlassian Jira
(v8.20.1#820001)