Posted to hdfs-dev@hadoop.apache.org by "Yongjun Zhang (JIRA)" <ji...@apache.org> on 2016/09/30 23:24:20 UTC

[jira] [Created] (HDFS-10942) Incorrect handling of flushing edit logs to JN

Yongjun Zhang created HDFS-10942:
------------------------------------

             Summary: Incorrect handling of flushing edit logs to JN
                 Key: HDFS-10942
                 URL: https://issues.apache.org/jira/browse/HDFS-10942
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
            Reporter: Yongjun Zhang


We use EditsDoubleBuffer to handle edit logs:
{code}
/**
 * A double-buffer for edits. New edits are written into the first buffer
 * while the second is available to be flushed. Each time the double-buffer
 * is flushed, the two internal buffers are swapped. This allows edits
 * to progress concurrently to flushes without allocating new buffers each
 * time.
 */
{code}
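The swap-on-flush idea described in that comment can be sketched as follows. This is only an illustrative model of the pattern, not the actual EditsDoubleBuffer code; the class and field names here are made up.

{code}
import java.io.ByteArrayOutputStream;

// Minimal sketch of the double-buffer pattern: new edits go into one
// buffer while the other is drained, and the two are swapped between
// flushes. Names are illustrative, not the actual HDFS fields.
class DoubleBufferSketch {
    private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
    private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();

    // New edits are always written into the current buffer.
    void write(byte[] edit) {
        bufCurrent.write(edit, 0, edit.length);
    }

    // Swap the buffers so accumulated edits become flushable while new
    // writes continue into the (now empty) other buffer.
    void setReadyToFlush() {
        ByteArrayOutputStream tmp = bufReady;
        bufReady = bufCurrent;
        bufCurrent = tmp;
    }

    // Drain the ready buffer; in HDFS this is where data would be sent
    // to the JournalNodes.
    byte[] flush() {
        byte[] data = bufReady.toByteArray();
        bufReady.reset();
        return data;
    }
}
{code}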

The following code, which flushes the ready buffer, first copies the ready buffer into a local buffer, then flushes that copy.

{code}

QuorumOutputStream (buf in the code below is an instance of EditsDoubleBuffer):

  @Override
  protected void flushAndSync(boolean durable) throws IOException {
    int numReadyBytes = buf.countReadyBytes();
    if (numReadyBytes > 0) {
      int numReadyTxns = buf.countReadyTxns();
      long firstTxToFlush = buf.getFirstReadyTxId();

      assert numReadyTxns > 0;

      // Copy from our double-buffer into a new byte array. This is for
      // two reasons:
      // 1) The IPC code has no way of specifying to send only a slice of
      //    a larger array.
      // 2) because the calls to the underlying nodes are asynchronous, we
      //    need a defensive copy to avoid accidentally mutating the buffer
      //    before it is sent.
      DataOutputBuffer bufToSend = new DataOutputBuffer(numReadyBytes);
      buf.flushTo(bufToSend);
      assert bufToSend.getLength() == numReadyBytes;
      byte[] data = bufToSend.getData();
      assert data.length == bufToSend.getLength();
{code}

The above code does not appear to prevent the original buffer inside buf from being swapped concurrently by the following method:

{code}
EditsDoubleBuffer:

 public void setReadyToFlush() {
    assert isFlushed() : "previous data not flushed yet";
    TxnBuffer tmp = bufReady;
    bufReady = bufCurrent;
    bufCurrent = tmp;
  }

{code}
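One way the two operations could be serialized is to guard both the swap and the flush with the same monitor, and to fail loudly when the ready buffer has not been drained. The following is only a hypothetical sketch of that idea, not a proposed patch; class and field names are illustrative.

{code}
// Hypothetical sketch: the swap and the flush share one monitor, so the
// ready buffer cannot be swapped out from under an in-progress flush,
// and the precondition is an explicit check rather than a JVM assert.
class GuardedEditsBuffer {
    private StringBuilder bufCurrent = new StringBuilder();
    private StringBuilder bufReady = new StringBuilder();

    synchronized void write(String edit) {
        bufCurrent.append(edit);
    }

    synchronized void setReadyToFlush() {
        if (bufReady.length() != 0) {
            // Fail loudly instead of an assert that is disabled in production.
            throw new IllegalStateException("previous data not flushed yet");
        }
        StringBuilder tmp = bufReady;
        bufReady = bufCurrent;
        bufCurrent = tmp;
    }

    synchronized String flush() {
        String data = bufReady.toString();
        bufReady.setLength(0);
        return data;
    }
}
{code}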

Though there are runtime assertions in the code, assertions are disabled in production (the JVM disables them by default), so the conditions they check may silently be false at runtime. This could possibly corrupt the edit stream. When a condition expected by an assert does not hold, a real exception should be thrown instead.

So, two issues in summary:
- How to synchronize between the flush and the swap of the two buffers
- Whether to throw a real exception instead of relying on asserts, which are normally disabled at runtime
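For the second point, a JVM assert can be replaced with an explicit check that always runs regardless of the -ea flag. A minimal sketch (the helper name checkState is made up here, though Hadoop could equally use an existing preconditions utility):

{code}
// Sketch: an always-on state check that throws instead of asserting.
// Unlike `assert cond;`, this runs even without -ea on the JVM.
class StateChecks {
    static void checkState(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }
}
{code}

For example, `assert numReadyTxns > 0;` in flushAndSync would become `StateChecks.checkState(numReadyTxns > 0, "expected at least one ready txn");`, so a violated invariant surfaces as an exception in production rather than being silently skipped.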





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org