Posted to hdfs-issues@hadoop.apache.org by "ZanderXu (Jira)" <ji...@apache.org> on 2022/10/05 04:48:00 UTC

[jira] [Created] (HDFS-16793) ObserverNameNode fails to select streaming inputStream with a timeout exception

ZanderXu created HDFS-16793:
-------------------------------

             Summary: ObserverNameNode fails to select streaming inputStream with a timeout exception 
                 Key: HDFS-16793
                 URL: https://issues.apache.org/jira/browse/HDFS-16793
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: ZanderXu
            Assignee: ZanderXu


In our production environment, we encountered a case where the Observer NameNode failed to select a streaming inputStream and threw a timeout exception. The related code is below:

 
{code:java}
@Override
public void selectInputStreams(Collection<EditLogInputStream> streams,
    long fromTxnId, boolean inProgressOk,
    boolean onlyDurableTxns) throws IOException { 
  if (inProgressOk && inProgressTailingEnabled) {
    // Fast path: tail in-progress edits from the JournalNodes via RPC.
    ...
  }
  // Fallback path: the timeout exception is thrown from this call.
  selectStreamingInputStreams(streams, fromTxnId, inProgressOk,
      onlyDurableTxns);
} {code}
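
For context, the timeout is raised while waiting for a quorum of JournalNodes to answer the getEditLogManifest RPC inside selectStreamingInputStreams. Below is a simplified paraphrase of that method from QuorumJournalManager (field names and timeout handling may differ slightly between versions):

{code:java}
// Paraphrased from QuorumJournalManager#selectStreamingInputStreams;
// simplified sketch, details may differ between Hadoop versions.
private void selectStreamingInputStreams(
    Collection<EditLogInputStream> streams, long fromTxnId,
    boolean inProgressOk, boolean onlyDurableTxns) throws IOException {
  // Ask every JournalNode for its edit log manifest.
  QuorumCall<AsyncLogger, RemoteEditLogManifest> q =
      loggers.getEditLogManifest(fromTxnId, inProgressOk);
  // The timeout exception is raised here if a quorum of JournalNodes cannot
  // respond in time -- for example, because each of them is busy scanning
  // the whole in-progress segment on disk (see below).
  Map<AsyncLogger, RemoteEditLogManifest> resps =
      loggers.waitForWriteQuorum(q, selectInputStreamsTimeoutMs,
          "selectInputStreams");
  ...
} {code}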
 

 

After looking into the code, we found that the JournalNode performs a very expensive and redundant operation: it scans all edits of the last in-progress segment from disk. The related code is below:

 
{code:java}
public List<RemoteEditLog> getRemoteEditLogs(long firstTxId,
    boolean inProgressOk) throws IOException {
  File currentDir = sd.getCurrentDir();
  List<EditLogFile> allLogFiles = matchEditLogs(currentDir);
  List<RemoteEditLog> ret = Lists.newArrayListWithCapacity(
      allLogFiles.size());
  for (EditLogFile elf : allLogFiles) {
    if (elf.hasCorruptHeader() || (!inProgressOk && elf.isInProgress())) {
      continue;
    }
    // Expensive: this scans every edit of the in-progress segment from disk.
    if (elf.isInProgress()) {
      try {
        elf.scanLog(getLastReadableTxId(), true);
      } catch (IOException e) {
        LOG.error("got IOException while trying to validate header of " +
            elf + ".  Skipping.", e);
        continue;
      }
    }
    if (elf.getFirstTxId() >= firstTxId) {
      ret.add(new RemoteEditLog(elf.firstTxId, elf.lastTxId,
          elf.isInProgress()));
    } else if (elf.getFirstTxId() < firstTxId && firstTxId <= elf.getLastTxId()) {
      // If the firstTxId is in the middle of an edit log segment. Return this
      // anyway and let the caller figure out whether it wants to use it.
      ret.add(new RemoteEditLog(elf.firstTxId, elf.lastTxId,
          elf.isInProgress()));
    }
  }
  
  Collections.sort(ret);
  
  return ret;
} {code}
 

Expensive:
 * This scan operation reads every edit of the in-progress segment from disk (heavy IO).

Redundant:
 * The scan's only purpose is to determine the lastTxId of the in-progress segment.
 * But the caller, getEditLogManifest(long sinceTxId, boolean inProgressOk) in Journal.java, ignores that lastTxId and instead uses getHighestWrittenTxId() as the lastTxId of the in-progress segment when returning the manifest to the NameNode (see the sketch below).
 * So the scan operation is redundant.
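
For reference, here is a simplified sketch of the caller, Journal#getEditLogManifest (paraphrased; exact code may differ between versions), showing where the scanned lastTxId is discarded:

{code:java}
// Paraphrased from Journal#getEditLogManifest; simplified sketch, details
// may differ between Hadoop versions.
public RemoteEditLogManifest getEditLogManifest(long sinceTxId,
    boolean inProgressOk) throws IOException {
  checkFormatted();

  // This call triggers the expensive scan shown above on the JournalNode.
  List<RemoteEditLog> logs = fjm.getRemoteEditLogs(sinceTxId, inProgressOk);

  if (inProgressOk) {
    RemoteEditLog log = null;
    for (Iterator<RemoteEditLog> iter = logs.iterator(); iter.hasNext();) {
      log = iter.next();
      if (log.isInProgress()) {
        iter.remove();
        break;
      }
    }
    if (log != null && log.isInProgress()) {
      // The lastTxId produced by the scan is dropped here and replaced with
      // getHighestWrittenTxId(), which is why the scan is redundant.
      logs.add(new RemoteEditLog(log.getStartTxId(), getHighestWrittenTxId(),
          true));
    }
  }

  return new RemoteEditLogManifest(logs, getCommittedTxnId());
} {code}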

 

When end users enable the Observer Read feature, the latency of tailing edits from the JournalNodes is critical, in both the normal path and the fallback path.

After looking through the code and HDFS-6634, which introduced this logic, we found no comments explaining why the scan is needed.

The only effect we can see is that the scan detects corruption in the in-progress segment, but the NameNode can already handle a corrupted in-progress segment.
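
One possible direction (purely a hypothetical sketch here, not a reviewed patch; other callers of FileJournalManager#getRemoteEditLogs would need to be audited) is to report the in-progress segment without scanning it and let Journal#getEditLogManifest supply the end txid from getHighestWrittenTxId(), for example:

{code:java}
// Hypothetical sketch only (inside FileJournalManager#getRemoteEditLogs):
// return the in-progress segment without scanning it; its end txid is left
// as INVALID_TXID and Journal#getEditLogManifest overwrites it with
// getHighestWrittenTxId() before the manifest is returned to the NameNode.
if (elf.isInProgress()) {
  if (elf.getFirstTxId() >= firstTxId) {
    ret.add(new RemoteEditLog(elf.getFirstTxId(),
        HdfsServerConstants.INVALID_TXID, true));
  }
  continue;
} {code}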



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org