Posted to issues@arrow.apache.org by "Panagiotis Nezis (Jira)" <ji...@apache.org> on 2019/12/20 19:35:00 UTC
[jira] [Created] (ARROW-7451) pyarrow.hdfs.connect crashes when
executed asynchronously in processes
Panagiotis Nezis created ARROW-7451:
---------------------------------------
Summary: pyarrow.hdfs.connect crashes when executed asynchronously in processes
Key: ARROW-7451
URL: https://issues.apache.org/jira/browse/ARROW-7451
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.1
Reporter: Panagiotis Nezis
When connecting to {{hdfs}} from a {{ProcessPoolExecutor}}, the first call raises an exception and the function never returns (potential deadlock?). With a {{ThreadPoolExecutor}}, on the other hand, it works as expected.
Sample code that reproduces the problem follows:
{code:python}
import pyarrow as pa
from concurrent.futures import (
    ThreadPoolExecutor,
    ProcessPoolExecutor,
    wait,
    ALL_COMPLETED)


def ls():
    fs = pa.hdfs.connect('hdfs://host')
    print(fs.ls('/'))


# This works as expected
ls()

# Running in parallel
thread_pool = ThreadPoolExecutor(max_workers=4)
process_pool = ProcessPoolExecutor(max_workers=4)


def run(pool):
    futures = [pool.submit(ls) for _ in range(5)]
    wait(futures, return_when=ALL_COMPLETED)


# The thread_pool works as expected
run(thread_pool)

# The process_pool raises an exception
run(process_pool)
{code}
The following exception is raised:
{noformat}
java.lang.ClassFormatError: Incompatible magic value 1347093252 in class file org/xml/sax/helpers/LocatorImpl
    at java.lang.ClassLoader.findBootstrapClass(Native Method)
    at java.lang.ClassLoader.findBootstrapClassOrNull(ClassLoader.java:1015)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:413)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2684)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2672)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2746)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2696)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2579)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:1091)
    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:404)
{noformat}
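A note on the magic value in the trace above: 1347093252 is 0x504B0304, the four bytes that start a ZIP (and therefore JAR) archive, whereas a valid class file starts with 0xCAFEBABE. This suggests, though it does not prove, that the JVM in the worker process read archive bytes where it expected class bytes. The short sketch below only decodes the number; the constant names are illustrative, and nothing in it touches pyarrow or HDFS.

```python
import struct

# Magic value reported by the JVM in the ClassFormatError above.
reported_magic = 1347093252

# A class file's magic number is a big-endian unsigned 32-bit integer,
# so pack the reported value the same way to recover the raw bytes.
magic_bytes = struct.pack(">I", reported_magic)

# Illustrative constants (names are not from any library).
CLASS_FILE_MAGIC = bytes.fromhex("CAFEBABE")  # start of a valid .class file
ZIP_LOCAL_HEADER = b"PK\x03\x04"              # start of a ZIP/JAR archive

print(hex(reported_magic))              # 0x504b0304
print(magic_bytes)                      # b'PK\x03\x04'
print(magic_bytes == ZIP_LOCAL_HEADER)  # True
print(magic_bytes == CLASS_FILE_MAGIC)  # False
```

Why archive bytes ended up there only in the process pool is not something the trace alone can answer.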
--
This message was sent by Atlassian Jira
(v8.3.4#803005)