You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "huyuanfeng2018 (via GitHub)" <gi...@apache.org> on 2023/05/24 12:38:48 UTC

[GitHub] [iceberg] huyuanfeng2018 opened a new pull request, #7696: Parquet: Fix possible stream duplicate close issue when using DelegateingOutputStream

huyuanfeng2018 opened a new pull request, #7696:
URL: https://github.com/apache/iceberg/pull/7696

   When I used Flink to write to the iceberg table, I made a layer of encapsulation for the outputStream, which will eventually be converted into a HadoopPositionOutputStream object. It implements DelegatingOutputStream, and the HadoopPositionOutputStream object rewrites the finalize method. This method will try to close the stream, but there will be problems in parquetIO:
   
   ```
     static PositionOutputStream stream(org.apache.iceberg.io.PositionOutputStream stream) {
       if (stream instanceof DelegatingOutputStream) {
         OutputStream wrapped = ((DelegatingOutputStream) stream).getDelegate();
         if (wrapped instanceof FSDataOutputStream) {
           return HadoopStreams.wrap((FSDataOutputStream) wrapped);
         }
       }
       return new ParquetOutputStreamAdapter(stream);
     }
   ```
   Here, the Delegate will be taken out and repackaged, and the HadoopPositionOutputStream object will be discarded. At this time, the HadoopPositionOutputStream object may not be continuously referenced, and may be recycled during gc. If gc is triggered to recycle the HadoopPositionOutputStream object during the stream writing process, close will be called at this time. The method closes the stream, causing file corruption.
   
   This pr is used to solve this problem. Add takeDelegate to DelegatingOutputStream to take the Delegate out and set it to null to prevent finalize close
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org