Posted to dev@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2013/11/01 11:11:18 UTC

[jira] [Commented] (DRILL-274) Spooling batch buffer

    [ https://issues.apache.org/jira/browse/DRILL-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811144#comment-13811144 ] 

Steven Phillips commented on DRILL-274:
---------------------------------------

I wrote up a quick description of what I'm thinking of doing.

class SpoolingRawFragmentBatchBuffer implements RawBatchBuffer {
  Queue<RawFragmentBatchWrap> buffer;
  private SpoolingManager spoolingManager;
  private boolean spool;

  public void enqueue(ConnectionThrottle throttle, RawFragmentBatch batch); // if !spool, wrap the batch and add it to the queue; if spool, wrap it, set available = false, and hand it to spoolingManager
  public RawFragmentBatch getNext(); // get the RFB from the next RFBW on the queue; this will block if the RFB has not been recovered from disk yet

  class RawFragmentBatchWrap {
    private RawFragmentBatch batch;
    private boolean available;

    RawFragmentBatch get(); // returns batch if available; if not, blocks until it is available
    void writeToStream(OutputStream stream); // write RFB to output stream
    void readFromStream(InputStream stream); // reconstruct RFB from input stream
  }

  class SpoolingManager {
    Queue<RawFragmentBatchWrap> incoming;
    Queue<RawFragmentBatchWrap> outgoing;

    public void addBatch(RawFragmentBatch batch); // add batch to incoming queue
    private void spool(); // start spooling batches from the incoming queue, moving them to the outgoing queue
    private void despool(); // start reconstructing batches in the outgoing queue, removing them from the queue when done
  }
}

This works similarly to the UnlimitedRawBatchBuffer, but the incoming batches will get wrapped and added to the queue even after we've started spooling. Once we've reached a threshold and decide to start spooling, the batches will be wrapped, marked unavailable, and handed to the SpoolingManager, which runs in a separate thread writing these batches to disk. Once we decide to stop spooling and start reading back, the SpoolingManager will handle that as well: it will close the OutputStream and open an InputStream. I will need to take care to handle the case where we start spooling, start reading back, and then start spooling again before we have finished reading back. I think I will create a new file each time spooling starts again, so I don't have to worry about appending to, or reading from, a file that is still open. A rough sketch of that cycle is below.
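
For example, the spool/despool cycle inside the SpoolingManager might look roughly like this. This is just a sketch: SpoolCycleSketch, newSpoolFile(), and the /tmp path are placeholders rather than existing Drill code, and the real implementation would serialize RawFragmentBatches rather than raw byte arrays.

import java.io.*;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class SpoolCycleSketch {
  private final Queue<byte[]> incoming = new ConcurrentLinkedQueue<>(); // serialized batches waiting to spool
  private final AtomicInteger cycle = new AtomicInteger();              // bumped per spool cycle -> new file

  // Each time spooling restarts, open a brand new file so we never append to,
  // or read from, a file that is still being written.
  private File newSpoolFile() {
    return new File("/tmp/drill-spool-" + cycle.incrementAndGet() + ".bin"); // path is an assumption
  }

  void spool() throws IOException {
    File file = newSpoolFile();
    try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
      byte[] batch;
      while ((batch = incoming.poll()) != null) {
        out.writeInt(batch.length);   // length-prefix each batch so despool() can frame them
        out.write(batch);
      }
    }
  }

  void despool(File file, Queue<byte[]> outgoing) throws IOException {
    try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
      while (in.available() > 0) {
        byte[] batch = new byte[in.readInt()];
        in.readFully(batch);
        outgoing.add(batch);          // reconstructed batch is ready for getNext()
      }
    }
  }
}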

If RawBatchBuffer.getNext() is called and the next RFB is not available, it will block until it is. Hopefully this won't be a problem, because we will start reading back from disk well in advance, so the data should be available by the time getNext() is called. But we may not be able to read back fast enough.
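
A minimal sketch of how that blocking could work inside RawFragmentBatchWrap, using plain wait/notify. The byte[] stands in for the actual RawFragmentBatch, and markAvailable() is a hypothetical hook for the despooling thread, not part of the design above.

class BlockingBatchWrap {
  private byte[] batch;            // stand-in for the RawFragmentBatch payload
  private boolean available;       // false while the batch lives only on disk

  // Called from getNext(): block until despooling has restored the batch.
  synchronized byte[] get() throws InterruptedException {
    while (!available) {
      wait();                      // released by markAvailable() once despool completes
    }
    return batch;
  }

  // Called by the SpoolingManager thread once the batch is read back from disk.
  synchronized void markAvailable(byte[] restored) {
    this.batch = restored;
    this.available = true;
    notifyAll();                   // wake any reader blocked in get()
  }
}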

> Spooling batch buffer
> ---------------------
>
>                 Key: DRILL-274
>                 URL: https://issues.apache.org/jira/browse/DRILL-274
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Steven Phillips
>            Assignee: Steven Phillips
>
> Incoming batches are currently queued up and held in memory without limit. If execution on a node is delayed, or is not able to keep up with the rate of incoming batches, this could result in out-of-memory errors. We want to allow spooling incoming batches to disk once the total size of the batches in the queue reaches a threshold.



--
This message was sent by Atlassian JIRA
(v6.1#6144)