Posted to issues@arrow.apache.org by "Panagiotis Nezis (Jira)" <ji...@apache.org> on 2019/12/20 19:35:00 UTC

[jira] [Created] (ARROW-7451) pyarrow.hdfs.connect crashes when executed asynchronously in processes

Panagiotis Nezis created ARROW-7451:
---------------------------------------

             Summary: pyarrow.hdfs.connect crashes when executed asynchronously in processes
                 Key: ARROW-7451
                 URL: https://issues.apache.org/jira/browse/ARROW-7451
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.15.1
            Reporter: Panagiotis Nezis


Connecting to {{hdfs}} from a {{ProcessPoolExecutor}} fails: the first call raises an exception and the function never returns (potential deadlock?). The same code works as expected with a {{ThreadPoolExecutor}}.

Sample code that reproduces the problem follows:

{code:python}
import pyarrow as pa

from concurrent.futures import (
    ThreadPoolExecutor,
    ProcessPoolExecutor,
    wait,
    ALL_COMPLETED,
)

def ls():
    fs = pa.hdfs.connect('hdfs://host')
    print(fs.ls('/'))

# This works as expected
ls()

# Running in parallel
thread_pool = ThreadPoolExecutor(max_workers=4)
process_pool = ProcessPoolExecutor(max_workers=4)

def run(pool):
    futures = [pool.submit(ls) for _ in range(5)]
    wait(futures, return_when=ALL_COMPLETED)

# The thread_pool works as expected
run(thread_pool)

# The process_pool raises an exception
run(process_pool)
{code}
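
One variable that may be worth isolating is the process start method. The Java stack trace shows that a JVM runs inside the Python process, and in the sample above {{ls()}} has already connected in the parent before the process pool starts its workers. Where {{ProcessPoolExecutor}} forks by default (Linux, for example), the workers are therefore copies of a parent that already holds a JVM. The sketch below is an untested variant, not a confirmed fix: it assumes Python 3.7+ (for {{mp_context}}), and {{make_spawn_pool}} and the {{uri}}/{{path}} parameters are hypothetical names introduced only for illustration.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, wait, ALL_COMPLETED


def ls(uri='hdfs://host', path='/'):
    # Import and connect inside the worker so that each process sets up
    # its own connection instead of reusing state from the parent.
    import pyarrow as pa
    fs = pa.hdfs.connect(uri)
    return fs.ls(path)


def make_spawn_pool(max_workers=4):
    # Hypothetical helper: "spawn" starts fresh interpreters rather than
    # forking the parent, so workers do not inherit the parent's memory.
    ctx = multiprocessing.get_context('spawn')
    return ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx)


def run(pool, fn, n=5):
    futures = [pool.submit(fn) for _ in range(n)]
    wait(futures, return_when=ALL_COMPLETED)
    # result() re-raises any exception that occurred in a worker.
    return [f.result() for f in futures]
```

With the spawn start method the driver code ({{run(make_spawn_pool(), ls)}}) has to sit under an {{if __name__ == '__main__':}} guard, because each worker re-imports the main module. The exception reported below comes from the original sample code, not from this variant.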

Running the sample code with the {{ProcessPoolExecutor}} raises the following exception:


{noformat}
java.lang.ClassFormatError: Incompatible magic value 1347093252 in class file org/xml/sax/helpers/LocatorImpl
        at java.lang.ClassLoader.findBootstrapClass(Native Method)
        at java.lang.ClassLoader.findBootstrapClassOrNull(ClassLoader.java:1015)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:413)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2684)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2672)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2746)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2696)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2579)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:1091)
        at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:404)
{noformat}
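
A note on the error message itself: the reported magic value can be decoded to see what the class loader actually read. A valid Java class file starts with the bytes {{0xCAFEBABE}}; the short check below (variable names are illustrative only) shows that the reported value corresponds to the bytes {{PK\x03\x04}}, the signature that starts an entry in a ZIP or JAR archive. This suggests, but does not prove, that archive bytes were read where class bytes were expected.

```python
import struct

# Magic value copied from the ClassFormatError message above.
reported_magic = 1347093252

# The class file magic is a big-endian unsigned 32-bit integer.
magic_bytes = struct.pack('>I', reported_magic)

JAVA_CLASS_MAGIC = bytes.fromhex('cafebabe')  # what the JVM expects
ZIP_LOCAL_HEADER = b'PK\x03\x04'              # start of a ZIP/JAR entry

print(hex(reported_magic))              # 0x504b0304
print(magic_bytes)                      # b'PK\x03\x04'
print(magic_bytes == ZIP_LOCAL_HEADER)  # True
print(magic_bytes == JAVA_CLASS_MAGIC)  # False
```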




