Posted to issues@arrow.apache.org by "Gregory Hayes (JIRA)" <ji...@apache.org> on 2019/08/10 19:34:00 UTC

[jira] [Commented] (ARROW-5131) [Python] Add Azure Datalake Filesystem Gen1 Wrapper for pyarrow

    [ https://issues.apache.org/jira/browse/ARROW-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904502#comment-16904502 ] 

Gregory Hayes commented on ARROW-5131:
--------------------------------------

Wanted to touch base on this.  I’ve been looking at rewriting and cleaning up the dask-adlfs package, and wanted to get input on making the AzureDatalakeFileSystem class a subclass of both AbstractFileSystem and AzureDLFileSystem, using multiple inheritance.  This would let us pull in the methods from both classes and reuse the Azure library that already exists.

If that seems reasonable, then I’ll proceed along those lines.
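
As a rough sketch of the multiple-inheritance idea: the two base classes below are simplified stand-ins defined inline, not the real AbstractFileSystem or the Azure library's AzureDLFileSystem. Their methods, the store_name parameter, and the paths are all made up for illustration.

```python
class AbstractFileSystem:
    """Stand-in for the generic filesystem interface (hypothetical, simplified)."""

    protocol = "abstract"

    def ls(self, path):
        # Backends are expected to override this.
        raise NotImplementedError

    def exists(self, path):
        # Generic helper written once in terms of ls().
        try:
            self.ls(path)
            return True
        except FileNotFoundError:
            return False


class AzureDLFileSystem:
    """Stand-in for the existing Azure client class (hypothetical, simplified)."""

    def __init__(self, store_name=None):
        self.store_name = store_name
        # Fake in-memory listing so the sketch runs without Azure access.
        self._files = {"/data": ["/data/part-0.parquet"]}

    def ls(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return list(self._files[path])


class AzureDatalakeFileSystem(AzureDLFileSystem, AbstractFileSystem):
    """Combines both parents; the Azure class is listed first so its
    concrete ls() takes precedence over the abstract placeholder."""

    protocol = "adl"


fs = AzureDatalakeFileSystem(store_name="mystore")
print(fs.ls("/data"))        # resolved on the Azure stand-in
print(fs.exists("/data"))    # generic helper from the abstract stand-in
print([c.__name__ for c in AzureDatalakeFileSystem.__mro__])
```

The base-class order matters here: with the abstract class listed first, its placeholder ls() would shadow the Azure implementation. Whether the real classes have conflicting method names or constructor signatures would need to be checked before committing to this layout.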



> [Python] Add Azure Datalake Filesystem Gen1 Wrapper for pyarrow
> ---------------------------------------------------------------
>
>                 Key: ARROW-5131
>                 URL: https://issues.apache.org/jira/browse/ARROW-5131
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Python
>    Affects Versions: 0.12.1
>            Reporter: Gregory Hayes
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The current pyarrow package can only read parquet files that have been written to Gen1 Azure Datalake using the fastparquet engine.  This only works if the dask-adlfs package is explicitly installed and imported.  I've added a method to the dask-adlfs package, found [here|https://github.com/dask/dask-adlfs], and issued a PR for that change.  To support this capability, I added an ADLFSWrapper to the filesystem.py file.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)