Posted to jira@arrow.apache.org by "Otávio Vasques (Jira)" <ji...@apache.org> on 2020/11/25 19:33:00 UTC
[jira] [Created] (ARROW-10737) [Python] Pyarrow 2.0.0 seems to not have the filesystem module.
Otávio Vasques created ARROW-10737:
--------------------------------------
Summary: [Python] Pyarrow 2.0.0 seems to not have the filesystem module.
Key: ARROW-10737
URL: https://issues.apache.org/jira/browse/ARROW-10737
Project: Apache Arrow
Issue Type: Bug
Environment: requirements:
numpy==1.19.0
pandas==1.1.4
scikit-learn==0.23.2
matplotlib==3.3.3
seaborn==0.11.0
fastapi==0.61.2
uvicorn==0.12.2
shap==0.37.0
pyarrow==2.0.0
datalab==0.7.0
PyHive==0.6.3
fsspec
jupyter
requests
Dockerfile:
FROM python:3.8.6-slim-buster
RUN apt-get update -y && \
apt-get install -y libgomp1 build-essential wget apt-transport-https gnupg
# Java
RUN mkdir -p /usr/share/man/man1 && \
wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | apt-key add - && \
echo "deb https://adoptopenjdk.jfrog.io/adoptopenjdk/deb buster main" | tee /etc/apt/sources.list.d/adoptopenjdk.list && \
apt-get update && apt-get install -y adoptopenjdk-8-hotspot
ENV JAVA_HOME /usr/lib/jvm/adoptopenjdk-8-hotspot-amd64
# Hadoop Installation
ENV HADOOP_USER_NAME hdfs
ENV HADOOP_PREFIX /usr/local/hadoop
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV CONF_PREFIX /opt/hadoop
ENV HADOOP_CONF_DIR /opt/hadoop/hadoop-conf
ENV YARN_CONF_DIR /opt/hadoop/yarn-conf
ENV ARROW_LIBHDFS_DIR /usr/local/hadoop/lib/native/
ENV PATH="/usr/local/hadoop/bin:${PATH}"
RUN mkdir -p ${CONF_PREFIX}
COPY hadoop/conf/hadoop-conf /opt/hadoop/hadoop-conf
COPY hadoop/conf/yarn-conf /opt/hadoop/yarn-conf
RUN wget -qO - https://downloads.apache.org/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz | tar -xzf - -C /usr/local \
&& ln -s /usr/local/hadoop-2.10.1 /usr/local/hadoop
COPY requirements.txt .
RUN pip install -U setuptools pip && \
pip install -r requirements.txt
RUN mkdir /repo
ENV HOME /repo
WORKDIR /repo
Reporter: Otávio Vasques
{code:java}
Python 3.8.6 (default, Nov 18 2020, 14:00:57)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> pa.__version__
'2.0.0'
>>> pa.fs
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 252, in __getattr__
raise AttributeError(
AttributeError: module 'pyarrow' has no attribute 'fs'
>>>{code}
I was using the legacy pa.hdfs API, which is now deprecated. I tried to switch to the new HadoopFileSystem class from the fs module, but accessing pa.fs raises the AttributeError shown above. What could be causing this?
This is running inside a Docker container. The requirements and the Dockerfile are in the Environment section.
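One possible explanation, offered as an assumption rather than a confirmed diagnosis: pyarrow.fs is a submodule, and a plain `import pyarrow` may not load it, so `pa.fs` fails until the submodule is imported explicitly (for example with `from pyarrow import fs`). The sketch below illustrates that with a hypothetical helper, `import_submodule`, which is not part of pyarrow. The HDFS host value "default" in the comment is also only an assumption about the cluster configuration.

```python
import importlib


def import_submodule(package_name, submodule_name):
    """Explicitly import `package_name.submodule_name` and return the module.

    Hypothetical helper for illustration; it is equivalent to writing
    `from package_name import submodule_name` by hand.
    """
    return importlib.import_module(f"{package_name}.{submodule_name}")


try:
    # Equivalent to `from pyarrow import fs`. After this import,
    # attribute access such as `pyarrow.fs` should also resolve.
    fs = import_submodule("pyarrow", "fs")
except ImportError:
    # pyarrow is not installed in this environment.
    fs = None

if fs is not None:
    # Connecting requires a working libhdfs/Hadoop setup, so the call is
    # left commented out. "default" is an assumed host value that defers
    # to the Hadoop configuration files.
    # hdfs = fs.HadoopFileSystem("default")
    print("loaded", fs.__name__)
```

If this assumption holds, replacing `pa.fs` in the session above with an explicit `from pyarrow import fs` should avoid the AttributeError without any change to the Docker image.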