Posted to hdfs-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/07/13 14:57:00 UTC

[jira] [Work logged] (HDFS-16659) JournalNode should throw CacheMissException if SinceTxId is bigger than HighestWrittenTxId

     [ https://issues.apache.org/jira/browse/HDFS-16659?focusedWorklogId=790466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-790466 ]

ASF GitHub Bot logged work on HDFS-16659:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Jul/22 14:56
            Start Date: 13/Jul/22 14:56
    Worklog Time Spent: 10m 
      Work Description: ZanderXu opened a new pull request, #4560:
URL: https://github.com/apache/hadoop/pull/4560

   ### Description of PR
   JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than `highestWrittenTxId`. Currently it does not, which can leave EditLogTailer unable to tail edits and, in turn, may leave the Observer NameNode unable to handle requests from clients.
   
   Suppose there are 3 JournalNodes, JN0 ~ JN2.
   The corner case is as below:
   * JN0 becomes abnormal while the Active NameNode is journaling edits starting at txId 11
   * The NameNode just ignores the abnormal JN0 and continues to write edits to JN1 and JN2
   * JN0 returns to health
   * The Observer NameNode tries to select EditLogInputStream via RPC with start txId 21
   * JN1 becomes abnormal, causing slow RPC responses
   
   The expected result of the selection is a response containing 20 edits, from txId 21 to txId 40, served by JN1 and JN2. The Active NameNode successfully wrote these edits to JN1 and JN2 but failed to write them to JN0, so the cache of JN0 holds no edits from txId 21 to 40.
   
   But in the current implementation, there are no edits in the response, because the NameNode successfully got a response from JN0 that did not contain any edits.
   The buggy code is as below:
   ```
   if (sinceTxId > getHighestWrittenTxId()) {
       // Requested edits that don't exist yet; short-circuit the cache here
       metrics.rpcEmptyResponses.incr();
       return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build(); 
   }
   ```
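
   The scenario above can be replayed with a small, self-contained model. This is only a sketch under stated assumptions, not the Hadoop implementation: `ToyJournalNode`, `selectEdits`, and `run` are hypothetical names, the local `CacheMissException` is a stand-in for the exception named in the title, JN0 is assumed to have stopped at txId 10, and the reader is assumed to take the first majority of successful responses and keep only the edit count that the whole majority can serve.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;

   class JournalQuorumSketch {
       /** Hypothetical stand-in for the exception named in the PR title. */
       static class CacheMissException extends Exception {
           CacheMissException(String message) { super(message); }
       }

       /** Toy JournalNode that only remembers the highest txId it has written. */
       static class ToyJournalNode {
           final long highestWrittenTxId;
           final boolean throwOnFutureTxId;

           ToyJournalNode(long highestWrittenTxId, boolean throwOnFutureTxId) {
               this.highestWrittenTxId = highestWrittenTxId;
               this.throwOnFutureTxId = throwOnFutureTxId;
           }

           /** Returns how many edits this node can serve starting at sinceTxId. */
           int getJournaledEdits(long sinceTxId) throws CacheMissException {
               if (sinceTxId > highestWrittenTxId) {
                   if (throwOnFutureTxId) {
                       // Proposed behavior: report a failure instead of "no edits".
                       throw new CacheMissException("sinceTxId " + sinceTxId
                           + " > highestWrittenTxId " + highestWrittenTxId);
                   }
                   return 0; // Behavior of the snippet above: a successful, empty response.
               }
               return (int) (highestWrittenTxId - sinceTxId + 1);
           }
       }

       /**
        * Simplified reader (an assumption of this sketch): collect the first
        * `majority` successful responses in arrival order, then return the
        * smallest count among them, i.e. what every node in that majority has.
        */
       static int selectEdits(List<ToyJournalNode> arrivalOrder, long sinceTxId, int majority) {
           List<Integer> counts = new ArrayList<>();
           for (ToyJournalNode jn : arrivalOrder) {
               try {
                   counts.add(jn.getJournaledEdits(sinceTxId));
               } catch (CacheMissException e) {
                   // A failed response does not count toward the quorum.
               }
               if (counts.size() == majority) {
                   break;
               }
           }
           if (counts.size() < majority) {
               return 0;
           }
           Collections.sort(counts);
           return counts.get(counts.size() - majority);
       }

       /** Replays the corner case: JN0 stale at txId 10, JN1 slow, JN2 healthy. */
       static int run(boolean throwOnFutureTxId) {
           ToyJournalNode jn0 = new ToyJournalNode(10, throwOnFutureTxId);
           ToyJournalNode jn1 = new ToyJournalNode(40, throwOnFutureTxId);
           ToyJournalNode jn2 = new ToyJournalNode(40, throwOnFutureTxId);
           // JN1 is slow, so its response arrives last.
           return selectEdits(Arrays.asList(jn0, jn2, jn1), 21, 2);
       }

       public static void main(String[] args) {
           System.out.println("empty response from JN0: " + run(false) + " edits");
           System.out.println("cache miss from JN0:     " + run(true) + " edits");
       }
   }
   ```

   In this model, the empty response from JN0 fills a quorum slot and the reader ends up with 0 edits, whereas a thrown exception makes the reader wait for the slow JN1 and obtain the 20 edits from txId 21 to 40.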
   
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 790466)
    Remaining Estimate: 0h
            Time Spent: 10m

> JournalNode should throw CacheMissException if SinceTxId is bigger than HighestWrittenTxId
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16659
>                 URL: https://issues.apache.org/jira/browse/HDFS-16659
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than `highestWrittenTxId`. Currently it does not, which can leave EditLogTailer unable to tail edits and, in turn, may leave the Observer NameNode unable to handle requests from clients.
> Suppose there are 3 JournalNodes, JN0 ~ JN2.
> The corner case is as below:
> * JN0 becomes abnormal while the Active NameNode is journaling edits starting at txId 11
> * The NameNode just ignores the abnormal JN0 and continues to write edits to JN1 and JN2
> * JN0 returns to health
> * The Observer NameNode tries to select EditLogInputStream via RPC with start txId 21
> * JN1 becomes abnormal, causing slow RPC responses
> The expected result of the selection is a response containing 20 edits, from txId 21 to txId 40, served by JN1 and JN2. The Active NameNode successfully wrote these edits to JN1 and JN2 but failed to write them to JN0, so the cache of JN0 holds no edits from txId 21 to 40.
> But in the current implementation, there are no edits in the response, because the NameNode successfully got a response from JN0 that did not contain any edits.
> The buggy code is as below:
> {code:java}
> if (sinceTxId > getHighestWrittenTxId()) {
>     // Requested edits that don't exist yet; short-circuit the cache here
>     metrics.rpcEmptyResponses.incr();
>     return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build(); 
> }
> {code}


