Posted to user@hadoop.apache.org by Matthias Boehm <mb...@gmail.com> on 2023/02/26 12:42:38 UTC


Hi all,

we recently set up a new cluster and, in the process, upgraded from Hadoop 
3.3.1 to 3.3.4. However, we are facing an issue where multi-threaded reads and 
writes of independent files appear to get serialized under the covers 
of the HDFS implementation, despite each node having 8+2 disks.

While debugging this issue, we also tested writing local files through 
the HDFS API, which works perfectly fine: top shows very good core 
utilization, and the writes are much faster than writes to HDFS. In contrast, 
multi-threaded reads of multiple files from HDFS show only 100% 
(single-core) utilization, and writes stay even below that. Do you have 
any pointers to configuration knobs (maybe related to [1]) that could fix 
this, or is this a known bug? I cannot imagine that this is intended behavior, 
because it would similarly mean that all I/O requests from the threads of a 
Spark executor get serialized.
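
To make the setup concrete, here is a minimal sketch of the kind of 
multi-threaded write test we are talking about (the NameNode URI, thread 
count, paths, and file sizes are just placeholders, not our exact code); 
each thread writes its own independent file through the Hadoop FileSystem 
API. The analogous read test simply replaces fs.create() with fs.open() 
and reads the files back.

import java.net.URI;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParWriteTest {
  public static void main(String[] args) throws Exception {
    final int numThreads = 8;                      // one writer per thread
    final byte[] buf = new byte[8 * 1024 * 1024];  // 8 MB per write call
    final Configuration conf = new Configuration();
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    for (int i = 0; i < numThreads; i++) {
      final int id = i;
      pool.submit(() -> {
        try {
          // note: FileSystem.get() returns a cached instance shared by all
          // threads; FileSystem.newInstance() would give a separate client
          FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);
          try (FSDataOutputStream out = fs.create(new Path("/tmp/parwrite/f" + id))) {
            for (int j = 0; j < 128; j++)          // ~1 GB per file
              out.write(buf);
          }
        } catch (Exception e) {
          e.printStackTrace();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}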

[1] 
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FairCallQueue.html 
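
(As far as I understand from [1], FairCallQueue is opt-in and only affects 
the RPC call queue: it takes effect only if core-site.xml on the NameNode 
sets ipc.<port>.callqueue.impl to org.apache.hadoop.ipc.FairCallQueue, 
typically together with ipc.<port>.scheduler.impl = 
org.apache.hadoop.ipc.DecayRpcScheduler, where <port> is the NameNode RPC 
port; the default call queue is a plain LinkedBlockingQueue. So it would 
at most explain serialization of metadata calls, not of the actual block 
reads and writes to the DataNodes.)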


Regards,
Matthias

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org