Posted to issues@arrow.apache.org by "Sujeet-A (Jira)" <ji...@apache.org> on 2019/11/18 05:29:00 UTC

[jira] [Commented] (ARROW-5236) [Python] hdfs.connect() is trying to load libjvm in windows

    [ https://issues.apache.org/jira/browse/ARROW-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976293#comment-16976293 ] 

Sujeet-A commented on ARROW-5236:
---------------------------------

I am also facing the same issue on Windows 7 Professional.

Is there any update on a resolution?
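For anyone hitting the same error: on Windows the JVM shared library is jvm.dll (usually under %JAVA_HOME%\bin\server or %JAVA_HOME%\jre\bin\server) rather than libjvm.so, so searching for "libjvm" as in the report below will never match. A quick, hedged way to check whether the pieces pyarrow's HDFS bridge needs are present at all (the paths checked are typical defaults, not guaranteed by any spec):

```python
import glob
import os


def check_hdfs_prereqs():
    """Report whether the environment pieces pyarrow's HDFS bridge
    relies on (a JVM library, a hadoop launcher) appear to be present."""
    report = {}

    java_home = os.environ.get("JAVA_HOME")
    report["JAVA_HOME set"] = java_home is not None
    if java_home:
        # On Windows the JVM library is jvm.dll; on Linux it is libjvm.so.
        matches = glob.glob(
            os.path.join(java_home, "**", "jvm.dll"), recursive=True)
        matches += glob.glob(
            os.path.join(java_home, "**", "libjvm.so"), recursive=True)
        report["JVM library found"] = bool(matches)

    hadoop_home = os.environ.get("HADOOP_HOME")
    report["HADOOP_HOME set"] = hadoop_home is not None
    if hadoop_home:
        # The MapR Windows client ships hadoop.cmd only, not a bare 'hadoop'.
        report["hadoop launcher found"] = any(
            os.path.exists(os.path.join(hadoop_home, "bin", name))
            for name in ("hadoop", "hadoop.cmd")
        )
    return report


if __name__ == "__main__":
    for check, ok in check_hdfs_prereqs().items():
        print("%s: %s" % (check, "OK" if ok else "MISSING"))
```

If "JVM library found" comes back MISSING even though a JDK is installed, the loader has nothing to find regardless of how pyarrow spells the library name.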

 

> [Python] hdfs.connect() is trying to load libjvm in windows
> -----------------------------------------------------------
>
>                 Key: ARROW-5236
>                 URL: https://issues.apache.org/jira/browse/ARROW-5236
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>         Environment: Windows 7 Enterprise, pyarrow 0.13.0
>            Reporter: Kamaraju
>            Priority: Major
>              Labels: hdfs
>
> This issue was originally reported at [https://github.com/apache/arrow/issues/4215] . Raising a Jira as per Wes McKinney's request.
> Summary:
>  The following script
> {code}
> $ cat expt2.py
> import pyarrow as pa
> fs = pa.hdfs.connect()
> {code}
> tries to load libjvm on Windows 7, which is not expected.
> {noformat}
> $ python ./expt2.py
> Traceback (most recent call last):
>   File "./expt2.py", line 3, in <module>
>     fs = pa.hdfs.connect()
>   File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
>     extra_conf=extra_conf)
>   File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Unable to load libjvm
> {noformat}
> There is no libjvm file in a Windows Java installation (on Windows the JVM shared library is jvm.dll, not libjvm).
> {noformat}
> $ echo $JAVA_HOME
> C:\Progra~1\Java\jdk1.8.0_141
> $ find $JAVA_HOME -iname '*libjvm*'
> <returns nothing.>
> {noformat}
> I see the libjvm error with both 0.11.1 and 0.13.0 versions of pyarrow.
> Steps to reproduce the issue (with more details):
> Create the environment
> {noformat}
> $ cat scratch_py36_pyarrow.yml
> name: scratch_py36_pyarrow
> channels:
>   - defaults
> dependencies:
>   - python=3.6.8
>   - pyarrow
> {noformat}
> {noformat}
> $ conda env create -f scratch_py36_pyarrow.yml
> {noformat}
> Apply the following patch to lib/site-packages/pyarrow/hdfs.py. I had to do this because the Hadoop installation that ships with the MapR <[https://mapr.com/]> Windows client only provides $HADOOP_HOME/bin/hadoop.cmd; there is no file named $HADOOP_HOME/bin/hadoop, so without this patch the subsequent subprocess.check_output call fails with FileNotFoundError.
> {noformat}
> $ cat ~/x/patch.txt
> 131c131
> <         hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
> ---
> >         hadoop_bin = '{0}/bin/hadoop.cmd'.format(os.environ['HADOOP_HOME'])
> $ patch /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py ~/x/patch.txt
> patching file /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
> {noformat}
> Activate the environment
> {noformat}
> $ source activate scratch_py36_pyarrow
> {noformat}
> Sample script
> {noformat}
> $ cat expt2.py
> import pyarrow as pa
> fs = pa.hdfs.connect()
> {noformat}
> Execute the script
> {noformat}
> $ python ./expt2.py
> Traceback (most recent call last):
>   File "./expt2.py", line 3, in <module>
>     fs = pa.hdfs.connect()
>   File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
>     extra_conf=extra_conf)
>   File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Unable to load libjvm
> {noformat}
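The hadoop.cmd patch above edits site-packages in place. An alternative that avoids touching installed files is to populate CLASSPATH up front, so the broken $HADOOP_HOME/bin/hadoop lookup is never reached. This is a hedged sketch, not a confirmed fix: `hadoop classpath --glob` is a standard Hadoop subcommand, but whether your pyarrow version skips its own lookup when CLASSPATH is already set should be verified against the hdfs.py being patched; on Windows, running a .cmd launcher may additionally require shell=True depending on the setup.

```python
import os
import subprocess


def set_hadoop_classpath():
    """Populate CLASSPATH from the Hadoop launcher so pyarrow does not
    need to shell out to a (possibly missing) $HADOOP_HOME/bin/hadoop."""
    hadoop_home = os.environ["HADOOP_HOME"]
    # Fall back to the Windows launcher when the Unix one is absent
    # (the MapR Windows client ships only hadoop.cmd).
    for name in ("hadoop", "hadoop.cmd"):
        hadoop_bin = os.path.join(hadoop_home, "bin", name)
        if os.path.exists(hadoop_bin):
            break
    else:
        raise FileNotFoundError("no hadoop launcher under %s" % hadoop_home)
    # 'classpath --glob' prints the expanded jar classpath on one line.
    classpath = subprocess.check_output(
        [hadoop_bin, "classpath", "--glob"]).decode().strip()
    os.environ["CLASSPATH"] = classpath
```

Calling set_hadoop_classpath() before pa.hdfs.connect() would then sidestep the hadoop-vs-hadoop.cmd difference, though it does not address the libjvm loading error itself.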



--
This message was sent by Atlassian Jira
(v8.3.4#803005)