Posted to jira@arrow.apache.org by "Otávio Vasques (Jira)" <ji...@apache.org> on 2020/11/25 21:28:00 UTC
[jira] [Closed] (ARROW-10737) [Python] Pyarrow 2.0.0 seems to not have the filesystem module.

     [ https://issues.apache.org/jira/browse/ARROW-10737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otávio Vasques closed ARROW-10737.
----------------------------------
    Resolution: Fixed

> [Python] Pyarrow 2.0.0 seems to not have the filesystem module.
> ---------------------------------------------------------------
>
>                 Key: ARROW-10737
>                 URL: https://issues.apache.org/jira/browse/ARROW-10737
>             Project: Apache Arrow
>          Issue Type: Bug
>         Environment: requirements:
> numpy==1.19.0
> pandas==1.1.4
> scikit-learn==0.23.2
> matplotlib==3.3.3
> seaborn==0.11.0
> fastapi==0.61.2
> uvicorn==0.12.2
> shap==0.37.0
> pyarrow==2.0.0
> datalab==0.7.0
> PyHive==0.6.3
> fsspec
> jupyter
> requests
> Dockerfile:
> FROM python:3.8.6-slim-buster
> RUN apt-get update -y && \
>     apt-get install -y libgomp1 build-essential wget apt-transport-https gnupg
> # Java
> RUN mkdir -p /usr/share/man/man1 && \
>     wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | apt-key add - && \
>     echo "deb https://adoptopenjdk.jfrog.io/adoptopenjdk/deb buster main" | tee /etc/apt/sources.list.d/adoptopenjdk.list && \
>     apt-get update && apt-get install -y adoptopenjdk-8-hotspot
> ENV JAVA_HOME /usr/lib/jvm/adoptopenjdk-8-hotspot-amd64
> # Hadoop Installation
> ENV HADOOP_USER_NAME hdfs
> ENV HADOOP_PREFIX /usr/local/hadoop
> ENV HADOOP_COMMON_HOME /usr/local/hadoop
> ENV HADOOP_HDFS_HOME /usr/local/hadoop
> ENV CONF_PREFIX /opt/hadoop
> ENV HADOOP_CONF_DIR /opt/hadoop/hadoop-conf
> ENV YARN_CONF_DIR /opt/hadoop/yarn-conf
> ENV ARROW_LIBHDFS_DIR /usr/local/hadoop/lib/native/
> ENV PATH="/usr/local/hadoop/bin:${PATH}"
> RUN mkdir -p ${CONF_PREFIX}
> COPY hadoop/conf/hadoop-conf /opt/hadoop/hadoop-conf 
> COPY hadoop/conf/yarn-conf /opt/hadoop/yarn-conf 
> RUN wget -qO - https://downloads.apache.org/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz | tar -xzf - -C /usr/local \
> && ln -s /usr/local/hadoop-2.10.1 /usr/local/hadoop
> COPY requirements.txt .
> RUN pip install -U setuptools pip && \
>     pip install -r requirements.txt
> RUN mkdir /repo
> ENV HOME /repo
> WORKDIR /repo
>            Reporter: Otávio Vasques
>            Priority: Major
>
> {code:python}
> Python 3.8.6 (default, Nov 18 2020, 14:00:57)
> [GCC 8.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow as pa
> >>> pa.__version__
> '2.0.0'
> >>> pa.fs
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 252, in __getattr__
>     raise AttributeError(
> AttributeError: module 'pyarrow' has no attribute 'fs'
> >>>{code}
> I was using the previous pa.hdfs API, which is now deprecated. I tried to switch to the new HadoopFileSystem class from the fs module, but I got this error. What could be causing this?
> This is running inside a Docker container. I will put the requirements and the Dockerfile in the environment section.
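
A likely explanation for the traceback above is that `pyarrow.fs` is a submodule that `import pyarrow` does not load automatically, so `pa.fs` falls through to the package's `__getattr__` and raises AttributeError. The sketch below shows the explicit import, assuming pyarrow is installed. The helper names `load_submodule` and `connect_hdfs` are hypothetical, and the `host` and `port` values are placeholders rather than values taken from the report.

```python
import importlib


def load_submodule(package: str, name: str):
    """Explicitly import `package.name` and return the submodule object."""
    return importlib.import_module(f"{package}.{name}")


def connect_hdfs(host: str = "default", port: int = 8020):
    """Open an HDFS connection through the pyarrow.fs API.

    host and port are placeholder values (assumptions), not from the report.
    """
    # Same effect as `from pyarrow import fs`.
    fs = load_submodule("pyarrow", "fs")
    return fs.HadoopFileSystem(host, port)


if __name__ == "__main__":
    try:
        fs = load_submodule("pyarrow", "fs")
    except ImportError:
        print("pyarrow is not installed in this environment")
    else:
        # After the explicit import, the class is reachable.
        print(fs.HadoopFileSystem)
```

In an interactive session the equivalent is `from pyarrow import fs` (or `import pyarrow.fs`), after which `fs.HadoopFileSystem` resolves. Actually connecting still depends on the Hadoop native libraries and configuration set up in the Dockerfile.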



--
This message was sent by Atlassian Jira
(v8.3.4#803005)