Posted to hdfs-user@hadoop.apache.org by Giridhar Addepalli <gi...@gmail.com> on 2014/09/28 11:17:53 UTC

Doubt Regarding QJM protocol - example 2.10.6 of Quorum-Journal Design document

Hi All,

I am going through the Quorum Journal design document.

Section 2.8, in the part on the Accept Recovery RPC, says:
"
If the current on-disk log is missing, or a *different length* than the
proposed recovery, the JN downloads the log from the provided URI,
replacing any current copy of the log segment.
"

I can see that the code follows this design:

Source :: Journal.java
             ....

  public synchronized void acceptRecovery(RequestInfo reqInfo,
      SegmentStateProto segment, URL fromUrl)
      throws IOException {

      ....
      if (currentSegment == null ||
        currentSegment.getEndTxId() != segment.getEndTxId()) {
      ....
      } else {
      LOG.info("Skipping download of log " +
          TextFormat.shortDebugString(segment) +
          ": already have up-to-date logs");
      }
      ....
  }
....

My question is: what happens if the on-disk log is present and is the
*same length* as the proposed recovery?

If the JournalNode skips the download because the logs are the same length,
we could end up in a situation where finalized log segments contain
different data!

This could happen if we follow example 2.10.6.

In that example, transactions 151-153 are written on JN1. Recovery then
proceeds with only JN2 and JN3. Let us assume that *different transactions*
are then written to JN2 and JN3 under the same IDs, 151-153. When recovery
runs after the next crash, JN1 will skip downloading the correct segment
from JN2/JN3 because it thinks it already has it (as per the code pasted
above). The finalized segment edits_151-153 on JN1 will then differ from
the finalized segment edits_151-153 on JN2/JN3.
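
The scenario can be sketched with a toy model. This is not Hadoop code: the
Segment class, wouldDownload, sameContent and the payload strings are all
made up for illustration. Only the shape of the condition (download when
there is no current segment or the end transaction IDs differ) is taken from
the Journal.java excerpt above.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only; these types are hypothetical and do not exist in Hadoop.
final class RecoveryCheckSketch {

    // A segment reduced to its transaction-ID range plus one payload per transaction.
    static final class Segment {
        final long startTxId;
        final long endTxId;
        final List<String> ops;

        Segment(long startTxId, long endTxId, List<String> ops) {
            this.startTxId = startTxId;
            this.endTxId = endTxId;
            this.ops = ops;
        }
    }

    // Same shape as the quoted condition: download only if there is no
    // current segment or the end transaction IDs differ.
    static boolean wouldDownload(Segment current, Segment proposed) {
        return current == null || current.endTxId != proposed.endTxId;
    }

    // Content comparison that the quoted condition does not perform.
    static boolean sameContent(Segment a, Segment b) {
        return a.startTxId == b.startTxId
            && a.endTxId == b.endTxId
            && a.ops.equals(b.ops);
    }

    public static void main(String[] args) {
        // What JN1 wrote before it dropped out of the quorum.
        Segment onJn1 = new Segment(151, 153,
            Arrays.asList("mkdir /a", "mkdir /b", "mkdir /c"));
        // What JN2/JN3 hold after different transactions reused IDs 151-153.
        Segment proposed = new Segment(151, 153,
            Arrays.asList("mkdir /x", "mkdir /y", "mkdir /z"));

        System.out.println("download? " + wouldDownload(onJn1, proposed));
        System.out.println("same content? " + sameContent(onJn1, proposed));
    }
}
```

In this toy model both lines print false: the download is skipped even though
the contents differ. Whether the real protocol can actually reach this state
is exactly the open question.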

Please let me know if I have gone wrong somewhere and this situation is
already taken care of.

Thanks,
Giridhar.

Re: Doubt Regarding QJM protocol - example 2.10.6 of Quorum-Journal Design document

Posted by Ulul <ha...@ulul.org>.
Hi

A developer should answer this, but a quick look at an edit file with od
suggests that records are not fixed length. So maybe the likelihood of
the situation you describe is so low that there is no need to check more
than the file size.

Ulul
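
A rough sketch of the file-size argument, under a made-up record layout
(8-byte transaction ID, 4-byte payload length, UTF-8 payload). The real edit
log format is different, and encodedSize and the payload strings are
hypothetical. Also note that the condition quoted earlier in the thread
compares end transaction IDs, not byte counts, so this only illustrates why
sizes usually, but not always, differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical encoding for illustration; not the HDFS edit log format.
final class VariableLengthSketch {

    // Size in bytes of a segment whose records are
    // [8-byte txid][4-byte payload length][UTF-8 payload].
    static long encodedSize(List<String> payloads) {
        long total = 0;
        for (String payload : payloads) {
            total += 8 + 4 + payload.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("mkdir /a", "mkdir /b", "mkdir /c");
        // Different operations of different lengths: sizes differ.
        List<String> rewritten = Arrays.asList("mkdir /data/x", "delete /tmp/y", "rename /p /q");
        // Different operations that happen to have equal lengths: sizes collide.
        List<String> collision = Arrays.asList("mkdir /x", "mkdir /y", "mkdir /z");

        System.out.println("original:  " + encodedSize(original));
        System.out.println("rewritten: " + encodedSize(rewritten));
        System.out.println("collision: " + encodedSize(collision));
    }
}
```

With variable-length payloads a size mismatch is the common case, but the
third list shows that equal sizes with different content remain possible.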


