Posted to user@hadoop.apache.org by Matthias Boehm <mb...@gmail.com> on 2023/02/26 12:42:38 UTC
dev@systemds.apache.org
Hi all,
we recently set up a new cluster and, in the process, upgraded from Hadoop
3.3.1 to 3.3.4. However, we are facing an issue where multi-threaded reads
and writes of independent files seem to get serialized under the covers
of the HDFS implementation, despite each node having 8+2 disks.
While debugging this issue, we also tested writing local files through
the HDFS API, which works perfectly fine: top shows very good core
utilization, and it is much faster than writing to HDFS. In contrast,
multi-threaded reads of multiple files from HDFS show only 100%
(single-core) utilization, and writes are even below that. Do you have
any pointers on configuration knobs (maybe related to [1]) to fix this,
or is this a known bug? I cannot imagine that this is intended behavior,
because it would similarly mean that all I/O requests from the threads
of a Spark executor get serialized.
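For reference, the access pattern in question is roughly the following self-contained sketch: one task per file in a fixed thread pool. Plain java.nio file I/O stands in for the Hadoop FileSystem API here so the snippet runs without a cluster; against HDFS, each task would instead call FileSystem.open on its own path.

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

public class ParallelRead {
    // Read a set of independent files with one task per file, mirroring the
    // multi-threaded read pattern described above. Against HDFS, the task
    // body would use FileSystem.open(path) instead of Files.readAllBytes.
    static long readAllConcurrently(List<Path> paths, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Path p : paths)
                futures.add(pool.submit(() -> (long) Files.readAllBytes(p).length));
            long total = 0;
            for (Future<Long> f : futures)
                total += f.get(); // sum bytes read across all threads
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // create 8 independent 1 MiB files in a temp directory
        Path dir = Files.createTempDirectory("par-read");
        List<Path> paths = new ArrayList<>();
        for (int i = 0; i < 8; i++)
            paths.add(Files.write(dir.resolve("f" + i), new byte[1 << 20]));
        System.out.println(readAllConcurrently(paths, 8)); // 8 * 1 MiB = 8388608
    }
}
```

With independent files and independent disks, one would expect these tasks to proceed in parallel; the serialization we observe only appears once HDFS is in the path.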
[1]
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FairCallQueue.html
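For context, [1] describes the NameNode's fair call queue, which throttles heavy RPC callers per user. As far as I understand, enabling it involves settings of the following form in core-site.xml (port 8020 assumed here for the NameNode RPC endpoint); we have not changed these from their defaults:

```xml
<!-- core-site.xml: switch the RPC call queue on the NameNode port -->
<property>
  <name>ipc.8020.callqueue.impl</name>
  <value>org.apache.hadoop.ipc.FairCallQueue</value>
</property>
<property>
  <name>ipc.8020.scheduler.impl</name>
  <value>org.apache.hadoop.ipc.DecayRpcScheduler</value>
</property>
```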
Regards,
Matthias