Posted to jira@arrow.apache.org by "Otávio Vasques (Jira)" <ji...@apache.org> on 2020/11/25 19:33:00 UTC

[jira] [Created] (ARROW-10737) [Python] Pyarrow 2.0.0 seems to not have the filesystem module.

Otávio Vasques created ARROW-10737:
--------------------------------------

             Summary: [Python] Pyarrow 2.0.0 seems to not have the filesystem module.
                 Key: ARROW-10737
                 URL: https://issues.apache.org/jira/browse/ARROW-10737
             Project: Apache Arrow
          Issue Type: Bug
         Environment: requirements:

numpy==1.19.0
pandas==1.1.4
scikit-learn==0.23.2
matplotlib==3.3.3
seaborn==0.11.0
fastapi==0.61.2
uvicorn==0.12.2
shap==0.37.0
pyarrow==2.0.0
datalab==0.7.0
PyHive==0.6.3
fsspec
jupyter
requests

Dockerfile:
FROM python:3.8.6-slim-buster

RUN apt-get update -y && \
    apt-get install -y libgomp1 build-essential wget apt-transport-https gnupg

# Java
RUN mkdir -p /usr/share/man/man1 && \
    wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | apt-key add - && \
    echo "deb https://adoptopenjdk.jfrog.io/adoptopenjdk/deb buster main" | tee /etc/apt/sources.list.d/adoptopenjdk.list && \
    apt-get update && apt-get install -y adoptopenjdk-8-hotspot
ENV JAVA_HOME /usr/lib/jvm/adoptopenjdk-8-hotspot-amd64

# Hadoop Installation
ENV HADOOP_USER_NAME hdfs
ENV HADOOP_PREFIX /usr/local/hadoop
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV CONF_PREFIX /opt/hadoop
ENV HADOOP_CONF_DIR /opt/hadoop/hadoop-conf
ENV YARN_CONF_DIR /opt/hadoop/yarn-conf
ENV ARROW_LIBHDFS_DIR /usr/local/hadoop/lib/native/
ENV PATH="/usr/local/hadoop/bin:${PATH}"

RUN mkdir -p ${CONF_PREFIX}

COPY hadoop/conf/hadoop-conf /opt/hadoop/hadoop-conf 
COPY hadoop/conf/yarn-conf /opt/hadoop/yarn-conf 

RUN wget -qO - https://downloads.apache.org/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz | tar -xzf - -C /usr/local \
&& ln -s /usr/local/hadoop-2.10.1 /usr/local/hadoop

COPY requirements.txt .

RUN pip install -U setuptools pip && \
    pip install -r requirements.txt

RUN mkdir /repo
ENV HOME /repo
WORKDIR /repo
            Reporter: Otávio Vasques


{code:java}
Python 3.8.6 (default, Nov 18 2020, 14:00:57)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> pa.__version__
'2.0.0'
>>> pa.fs
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 252, in __getattr__
 raise AttributeError(
AttributeError: module 'pyarrow' has no attribute 'fs'
>>>{code}

I was using the previous pa.hdfs API, which is now deprecated. When I tried to switch to the new HadoopFileSystem class from the fs module, I got the error above. What could be causing this?

This is running inside a Docker container. The requirements and the Dockerfile are listed in the Environment section.
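The traceback is consistent with ordinary Python package behavior: a submodule only becomes an attribute of its parent package once something imports it. The stdlib-only sketch below reproduces that mechanism, assuming (this is not confirmed in the report) that pyarrow 2.0.0's __init__.py does not import pyarrow.fs. The names demo_pkg, check_submodule, make_demo_package and run_demo are hypothetical and exist only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def check_submodule(pkg: str, sub: str) -> Tuple[bool, bool]:
    """Report whether `pkg.sub` is reachable as an attribute of `pkg`
    before and after an explicit import of the submodule."""
    parent = importlib.import_module(pkg)
    before = hasattr(parent, sub)
    # Importing the submodule binds it as an attribute on the parent package.
    importlib.import_module(f"{pkg}.{sub}")
    after = hasattr(parent, sub)
    return before, after


def make_demo_package(root: Path, name: str) -> None:
    """Create a hypothetical package whose __init__.py does not import its
    `fs` submodule (mirroring, by assumption, pyarrow 2.0.0)."""
    pkg_dir = root / name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "fs.py").write_text("class HadoopFileSystem:\n    pass\n")


def run_demo(name: str = "demo_pkg") -> Tuple[bool, bool]:
    """Build the throwaway package in a temp dir and run the check on it."""
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_package(Path(tmp), name)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            return check_submodule(name, "fs")
        finally:
            sys.path.remove(tmp)
            # Drop the throwaway modules so the demo can be rerun cleanly.
            sys.modules.pop(f"{name}.fs", None)
            sys.modules.pop(name, None)


if __name__ == "__main__":
    print(run_demo())  # (False, True)
```

Under that assumption, the session above would need import pyarrow.fs (or from pyarrow import fs) before pa.fs.HadoopFileSystem resolves.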


