Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2020/06/12 15:36:00 UTC
[jira] [Created] (HDDS-3794) Topology Aware read does not work correctly in XceiverClientGrpc
Stephen O'Donnell created HDDS-3794:
---------------------------------------
Summary: Topology Aware read does not work correctly in XceiverClientGrpc
Key: HDDS-3794
URL: https://issues.apache.org/jira/browse/HDDS-3794
Project: Hadoop Distributed Data Store
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell
Fix For: 0.6.0
In XceiverClientGrpc.java, the calls that read a block or chunks from a Datanode end up in the private method sendCommandWithRetry(). This method decides which datanode to send the request to. It first checks whether there is a cached DN connection for the given block and, if so, uses it. If there is no cached connection, it should either take the network topology into account or shuffle the nodes:
{code}
List<DatanodeDetails> datanodeList = null;
DatanodeBlockID blockID = null;
if (request.getCmdType() == ContainerProtos.Type.ReadChunk) {
  blockID = request.getReadChunk().getBlockID();
} else if (request.getCmdType() == ContainerProtos.Type.GetSmallFile) {
  blockID = request.getGetSmallFile().getBlock().getBlockID();
}
if (blockID != null) {
  LOG.info("blockid is not null");
  // Check if the DN to which the GetBlock command was sent has been cached.
  DatanodeDetails cachedDN = getBlockDNcache.get(blockID);
  if (cachedDN != null) {
    LOG.info("Cached DN is not null");
    datanodeList = pipeline.getNodes();
    int getBlockDNCacheIndex = datanodeList.indexOf(cachedDN);
    if (getBlockDNCacheIndex > 0) {
      LOG.info("pulling cached dn to top of list");
      // Pull the Cached DN to the top of the DN list
      Collections.swap(datanodeList, 0, getBlockDNCacheIndex);
    }
  } else if (topologyAwareRead) {
    LOG.info("topology aware - order DNs");
    datanodeList = pipeline.getNodesInOrder();
  }
}
if (datanodeList == null) {
  LOG.info("List is null - shuffling");
  datanodeList = pipeline.getNodes();
  // Shuffle datanode list so that clients do not read in the same order
  // every time.
  Collections.shuffle(datanodeList);
}
<call to DN after here>
{code}
The normal flow for the client is to first make a getBlock() call to the DN and then a readChunk() call.
The logic at the top of the block above only extracts a blockID for ReadChunk and GetSmallFile commands, so blockID is always null for the getBlock() call. The code therefore never reaches the topologyAwareRead branch and shuffles the nodes instead.
Then, for readChunk, it finds the blockID and a cached DN, which was the result of the shuffle, and reuses that DN.
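To make the sequence above concrete, here is a minimal trace of which branch picks the node order for each call. It is only a sketch: the class ReportedSelectionTrace, the Branch enum, and the use of a long for the block ID and a string for the datanode are hypothetical stand-ins, not the Ozone types, and it assumes the datanode that served getBlock() is what ends up in getBlockDNcache.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in that only records which branch would choose the order. */
class ReportedSelectionTrace {
  enum CmdType { GetBlock, ReadChunk, GetSmallFile }
  enum Branch { CACHED_DN, TOPOLOGY_ORDER, SHUFFLE }

  // Stand-in for getBlockDNcache: block ID -> datanode name.
  final Map<Long, String> getBlockDNcache = new HashMap<>();
  final boolean topologyAwareRead;

  ReportedSelectionTrace(boolean topologyAwareRead) {
    this.topologyAwareRead = topologyAwareRead;
  }

  /** Mirrors the branching of the snippet in this report. */
  Branch chooseBranch(CmdType type, long requestBlockId) {
    // As in the snippet, only ReadChunk and GetSmallFile resolve a block ID.
    Long blockID = null;
    if (type == CmdType.ReadChunk || type == CmdType.GetSmallFile) {
      blockID = requestBlockId;
    }
    if (blockID != null) {
      if (getBlockDNcache.containsKey(blockID)) {
        return Branch.CACHED_DN;
      } else if (topologyAwareRead) {
        return Branch.TOPOLOGY_ORDER;
      }
    }
    // Reached for every GetBlock, even when topologyAwareRead is true.
    return Branch.SHUFFLE;
  }

  public static void main(String[] args) {
    ReportedSelectionTrace client = new ReportedSelectionTrace(true);
    long blockId = 1L;

    Branch first = client.chooseBranch(CmdType.GetBlock, blockId);
    // Assumption: the datanode that answered GetBlock is cached for the block.
    client.getBlockDNcache.put(blockId, "dn-picked-by-shuffle");
    Branch second = client.chooseBranch(CmdType.ReadChunk, blockId);

    System.out.println("getBlock  -> " + first);   // SHUFFLE
    System.out.println("readChunk -> " + second);  // CACHED_DN
  }
}
```

With topologyAwareRead set to true, the TOPOLOGY_ORDER branch is never taken in this getBlock-then-readChunk flow.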
Therefore topologyAwareRead does not work as expected.
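One possible restructuring is to make the fallback itself topology aware, so that the ordering no longer depends on whether a blockID could be extracted from the request. The sketch below illustrates that idea only; it is not necessarily the change that resolves this issue. The class TopologyAwareOrdering and its orderNodes() method are hypothetical, datanodes are modelled as plain strings, and the two list parameters stand in for pipeline.getNodes() and pipeline.getNodesInOrder(). Unlike the snippet above, it works on copies rather than reordering the pipeline's own list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of an ordering that honours topologyAwareRead for every command. */
class TopologyAwareOrdering {

  /**
   * Picks the datanode order. cachedDN may be null when nothing is cached.
   * nodes and nodesInOrder stand in for pipeline.getNodes() and
   * pipeline.getNodesInOrder().
   */
  static List<String> orderNodes(List<String> nodes, List<String> nodesInOrder,
      String cachedDN, boolean topologyAwareRead) {
    if (cachedDN != null) {
      // Keep using the datanode that already served this block.
      List<String> result = new ArrayList<>(nodes);
      int index = result.indexOf(cachedDN);
      if (index > 0) {
        Collections.swap(result, 0, index);
      }
      return result;
    }
    if (topologyAwareRead) {
      // Applies to GetBlock as well, since no blockID check guards it.
      return new ArrayList<>(nodesInOrder);
    }
    // Shuffle so that clients do not read in the same order every time.
    List<String> result = new ArrayList<>(nodes);
    Collections.shuffle(result);
    return result;
  }

  public static void main(String[] args) {
    List<String> nodes = Arrays.asList("dn1", "dn2", "dn3");
    List<String> inOrder = Arrays.asList("dn3", "dn1", "dn2");

    // getBlock: nothing is cached yet, so the topology order is used.
    List<String> forGetBlock = orderNodes(nodes, inOrder, null, true);
    // readChunk: the datanode used for getBlock is cached and stays first.
    List<String> forReadChunk = orderNodes(nodes, inOrder, forGetBlock.get(0), true);

    System.out.println(forGetBlock);   // [dn3, dn1, dn2]
    System.out.println(forReadChunk);  // [dn3, dn2, dn1]
  }
}
```

In this arrangement the cached DN that readChunk reuses is the topologically closest one, because it was chosen by the ordered list rather than by a shuffle.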