Posted to issues@arrow.apache.org by "Panagiotis Nezis (Jira)" <ji...@apache.org> on 2019/12/20 19:41:00 UTC
[jira] [Updated] (ARROW-7451) pyarrow.hdfs.connect crashes when executed asynchronously in processes
[ https://issues.apache.org/jira/browse/ARROW-7451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Panagiotis Nezis updated ARROW-7451:
------------------------------------
Priority: Critical (was: Major)
> pyarrow.hdfs.connect crashes when executed asynchronously in processes
> ----------------------------------------------------------------------
>
> Key: ARROW-7451
> URL: https://issues.apache.org/jira/browse/ARROW-7451
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.15.1
> Reporter: Panagiotis Nezis
> Priority: Critical
>
> When connecting to {{hdfs}} from a {{ProcessPoolExecutor}}, the first call raises an exception and the function never returns (a potential deadlock?). With a {{ThreadPoolExecutor}}, the same code works as expected.
> Sample code that reproduces the problem follows:
> {code:python}
> import pyarrow as pa
> from concurrent.futures import (
>     ThreadPoolExecutor,
>     ProcessPoolExecutor,
>     wait,
>     ALL_COMPLETED)
>
> def ls():
>     fs = pa.hdfs.connect('hdfs://host')
>     print(fs.ls('/'))
>
> # This works as expected
> ls()
>
> # Running in parallel
> thread_pool = ThreadPoolExecutor(max_workers=4)
> process_pool = ProcessPoolExecutor(max_workers=4)
>
> def run(pool):
>     futures = [pool.submit(ls) for _ in range(5)]
>     wait(futures, return_when=ALL_COMPLETED)
>
> # The thread_pool works as expected
> run(thread_pool)
>
> # The process_pool raises an exception
> run(process_pool)
> {code}
> The following exception is raised:
> {noformat}
> java.lang.ClassFormatError: Incompatible magic value 1347093252 in class file org/xml/sax/helpers/LocatorImpl
>     at java.lang.ClassLoader.findBootstrapClass(Native Method)
>     at java.lang.ClassLoader.findBootstrapClassOrNull(ClassLoader.java:1015)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:413)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>     at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2684)
>     at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2672)
>     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2746)
>     at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2696)
>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2579)
>     at org.apache.hadoop.conf.Configuration.get(Configuration.java:1091)
>     at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:404)
> {noformat}
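
A note on the "Incompatible magic value 1347093252" in the trace above: the number can be decoded with a few lines of Python. The sketch below is illustrative only; the helper name describe_magic is hypothetical and not part of pyarrow or Hadoop. It shows that the reported value is the four bytes "PK\x03\x04", the ZIP/jar local file header signature, where the JVM expected the class-file magic 0xCAFEBABE. This suggests, but does not prove, that the class loader in the child process read from the start of a jar archive instead of from a class entry inside it. One possible (unconfirmed) reason is that the child was forked from a parent whose JVM had already been started by the earlier calls to ls().

```python
import struct

# Magic number that starts every valid Java class file.
JAVA_CLASS_MAGIC = 0xCAFEBABE
# Signature that starts a ZIP (and therefore jar) local file header.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def describe_magic(value: int) -> str:
    """Classify a 32-bit big-endian magic value (hypothetical helper)."""
    raw = struct.pack(">I", value)  # the 4 bytes as the JVM read them
    if value == JAVA_CLASS_MAGIC:
        return "java class file"
    if raw == ZIP_LOCAL_HEADER:
        return "zip/jar local file header"
    return "unknown"


# Value taken from the ClassFormatError in the report.
reported = 1347093252
print(hex(reported), struct.pack(">I", reported), describe_magic(reported))
# prints: 0x504b0304 b'PK\x03\x04' zip/jar local file header
```

If that reading is right, the failure is about how the child process sees the parent's JVM state rather than about the HDFS connection parameters, but this remains an assumption until confirmed on the affected setup.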
--
This message was sent by Atlassian Jira
(v8.3.4#803005)