Posted to issues@arrow.apache.org by "Panagiotis Nezis (Jira)" <ji...@apache.org> on 2019/12/20 19:41:00 UTC

[jira] [Updated] (ARROW-7451) pyarrow.hdfs.connect crashes when executed asynchronously in processes

     [ https://issues.apache.org/jira/browse/ARROW-7451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Panagiotis Nezis updated ARROW-7451:
------------------------------------
    Priority: Critical  (was: Major)

> pyarrow.hdfs.connect crashes when executed asynchronously in processes
> ----------------------------------------------------------------------
>
>                 Key: ARROW-7451
>                 URL: https://issues.apache.org/jira/browse/ARROW-7451
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.1
>            Reporter: Panagiotis Nezis
>            Priority: Critical
>
> When connecting to {{hdfs}} from a {{ProcessPoolExecutor}}, the first call raises an exception and the function never returns (potential deadlock?). The same code works as expected with a {{ThreadPoolExecutor}}.
> Sample code that reproduces the problem follows:
> {code:python}
> import pyarrow as pa
> from concurrent.futures import (
>     ThreadPoolExecutor,
>     ProcessPoolExecutor,
>     wait,
>     ALL_COMPLETED,
> )
>
> def ls():
>     # Open a new HDFS connection and list the root directory
>     fs = pa.hdfs.connect('hdfs://host')
>     print(fs.ls('/'))
>
> # This works as expected
> ls()
>
> # Running in parallel
> thread_pool = ThreadPoolExecutor(max_workers=4)
> process_pool = ProcessPoolExecutor(max_workers=4)
>
> def run(pool):
>     # Submit five listing tasks and block until all of them have finished
>     futures = [pool.submit(ls) for _ in range(5)]
>     wait(futures, return_when=ALL_COMPLETED)
>
> # The thread_pool works as expected
> run(thread_pool)
>
> # The process_pool raises an exception
> run(process_pool)
> {code}
> The following exception is raised:
> {noformat}
> java.lang.ClassFormatError: Incompatible magic value 1347093252 in class file org/xml/sax/helpers/LocatorImpl
>         at java.lang.ClassLoader.findBootstrapClass(Native Method)
>         at java.lang.ClassLoader.findBootstrapClassOrNull(ClassLoader.java:1015)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:413)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>         at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>         at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>         at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2684)
>         at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2672)
>         at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2746)
>         at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2696)
>         at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2579)
>         at org.apache.hadoop.conf.Configuration.get(Configuration.java:1091)
>         at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:404)
> {noformat}
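
The magic value in the first line of the trace can be decoded to see which bytes the JVM actually read where it expected a class file. The following is a minimal sketch using only the Python standard library; the helper name magic_bytes is introduced here for illustration and does not appear in the report. A valid class file starts with the bytes CA FE BA BE, whereas the reported value matches the signature of a ZIP/JAR local file header. That suggests the class loader read raw jar bytes instead of a class entry, but this is an interpretation of the message, not a confirmed diagnosis.

```python
import struct

# Leading bytes of a valid Java class file (0xCAFEBABE).
JAVA_CLASS_MAGIC = b"\xca\xfe\xba\xbe"
# Signature that starts a local file header in a ZIP/JAR archive.
ZIP_LOCAL_HEADER_MAGIC = b"PK\x03\x04"


def magic_bytes(value):
    """Return the four big-endian bytes of an unsigned 32-bit magic value."""
    return struct.pack(">I", value)


# Value copied from the ClassFormatError message in the trace.
reported = magic_bytes(1347093252)

print(reported)                            # the raw bytes behind the number
print(reported == ZIP_LOCAL_HEADER_MAGIC)  # does it look like a ZIP/JAR header?
print(reported == JAVA_CLASS_MAGIC)        # does it look like a class file?
```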

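One possible explanation for the difference between the two executors, stated here as an assumption and not confirmed in the report, is that ProcessPoolExecutor creates its workers with the fork start method (the Linux default at the time), and the parent process in the sample has already started a JVM through the initial ls() call. If that is the cause, creating the workers with the spawn start method would give each worker a fresh interpreter with no inherited JVM state. The sketch below only shows how such a pool can be built; the helper names spawn_context and make_spawn_pool are hypothetical, and the mp_context parameter requires Python 3.7 or later.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def spawn_context():
    """Return a multiprocessing context that starts workers with 'spawn'."""
    return multiprocessing.get_context("spawn")


def make_spawn_pool(max_workers=4):
    """Build a process pool whose workers are spawned instead of forked.

    Assumption: the failure comes from forking a process that already hosts
    a JVM. Spawned workers start from a fresh interpreter and therefore do
    not inherit that state.
    """
    return ProcessPoolExecutor(max_workers=max_workers, mp_context=spawn_context())
```

With the spawn start method, the submitted function must be importable by the worker processes, and the code that creates the pool and submits work should sit under an if __name__ == "__main__": guard.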


--
This message was sent by Atlassian Jira
(v8.3.4#803005)