Posted to commits@druid.apache.org by "LakshSingla (via GitHub)" <gi...@apache.org> on 2023/03/29 03:01:42 UTC

[GitHub] [druid] LakshSingla commented on a diff in pull request #13981: Eagerly fetching remote s3 files leading to out of disk (OOD)

LakshSingla commented on code in PR #13981:
URL: https://github.com/apache/druid/pull/13981#discussion_r1151335632


##########
processing/src/main/java/org/apache/druid/frame/channel/ReadableInputStreamFrameChannel.java:
##########
@@ -40,6 +40,7 @@
  */
 public class ReadableInputStreamFrameChannel implements ReadableFrameChannel
 {
+  private boolean toStart = true;

Review Comment:
   ```suggestion
     private boolean started = false;
   ```
   `toStart` seems unclear for a boolean; something like `started` or `hasStarted` would be clearer. (Note that the initial value and the checks that read the flag would need to be inverted along with the rename.)



##########
extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/output/S3StorageConnector.java:
##########
@@ -142,6 +144,12 @@ public boolean hasMoreElements()
       @Override
       public InputStream nextElement()
       {
+        // SequenceInputStream calls nextElement() in its constructor, which would start chunking before read() is ever called.
+        // To defer chunking until the first read(), we return a nullInputStream for the first iteration.
+        if (!initStream) {

Review Comment:
   I was wondering whether there's a slightly better way to achieve this, instead of this one-off handling. The issue arises because `nextElement()` prefetches the InputStream that should return the data, while the `S3StorageConnector` is simultaneously chunking the file.
   
   If we create a new InputStream that acts as a wrapper over this initializing code (the part of `nextElement` that copies the chunk to a file), we won't need the `initStream` flag or the special handling.
   
   ```
             InputStream lazy = new InputStream()
             {
               private InputStream delegate = null;
   
               @Override
               public int read() throws IOException
               {
                 if (delegate == null) {
                   File f = chunkFromS3(range); // Implement this
                   delegate = new FileInputStream(f);
                 }
                 return delegate.read();
               }
             };
   ```
   
   One advantage of the above approach is that we wouldn't prefetch a chunk at all. When `read()` is called on the `SequenceInputStream`, it calls `nextStream()` to prepare the next element in the iteration; since the current change only handles initialization, we would still prefetch the next chunk before `read()` is called on it. (There might not be a huge benefit to this, though, if we are reading the S3 object continuously.)
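   To make the eager-fetch behavior concrete, here is a small, self-contained sketch (the chunk contents and the `fetched` logging list are illustrative, not from the PR): `SequenceInputStream`'s constructor already pulls the first element from the enumeration, and each subsequent element is pulled as soon as the previous stream is exhausted.
   
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.SequenceInputStream;
   import java.nio.charset.StandardCharsets;
   import java.util.ArrayList;
   import java.util.Enumeration;
   import java.util.Iterator;
   import java.util.List;
   
   public class EagerFetchDemo
   {
     public static void main(String[] args) throws IOException
     {
       List<String> fetched = new ArrayList<>();
       Iterator<String> chunks = List.of("a", "b").iterator();
   
       Enumeration<InputStream> streams = new Enumeration<InputStream>()
       {
         @Override
         public boolean hasMoreElements()
         {
           return chunks.hasNext();
         }
   
         @Override
         public InputStream nextElement()
         {
           String chunk = chunks.next();
           fetched.add(chunk); // stands in for downloading a chunk from S3
           return new ByteArrayInputStream(chunk.getBytes(StandardCharsets.UTF_8));
         }
       };
   
       // The constructor alone already fetches the first element.
       SequenceInputStream sis = new SequenceInputStream(streams);
       System.out.println("fetched before any read(): " + fetched);
   
       StringBuilder data = new StringBuilder();
       int c;
       while ((c = sis.read()) != -1) {
         data.append((char) c);
       }
       sis.close();
   
       System.out.println("data: " + data);
       System.out.println("fetched after reading: " + fetched);
       // prints:
       // fetched before any read(): [a]
       // data: ab
       // fetched after reading: [a, b]
     }
   }
   ```
   
   This is why the `initStream`/nullInputStream workaround is needed in the first place, and why a lazily-initializing wrapper stream would defer each chunk until its bytes are actually requested.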



##########
processing/src/main/java/org/apache/druid/frame/channel/ReadableInputStreamFrameChannel.java:
##########
@@ -150,6 +152,10 @@ public void close()
 
   private void startReading()
   {
+    if (!toStart) {

Review Comment:
   Is this change still required once the cause is fixed in `S3StorageConnector`? (If so, can you document that we buffer the input stream lazily?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-help@druid.apache.org