Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/28 10:06:20 UTC

[GitHub] [iceberg] ldwnt opened a new issue, #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

ldwnt opened a new issue, #5652:
URL: https://github.com/apache/iceberg/issues/5652

   ### Apache Iceberg version
   
   0.13.1
   
   ### Query engine
   
   Flink
   
   ### Please describe the bug 🐞
   
   I have a Flink job writing to Iceberg tables. After the job has run for several days, an OOM occurs. The cause is described below:
   
   The DecoderResolver holds a two-level map in a ThreadLocal variable:
   ```
       private static final ThreadLocal<Map<Schema, Map<Schema, ResolvingDecoder>>> DECODER_CACHES = ThreadLocal.withInitial(() -> {
           return (new MapMaker()).weakKeys().makeMap();
       });
   
       private static ResolvingDecoder resolve(Decoder decoder, Schema readSchema, Schema fileSchema) throws IOException {
           Map<Schema, Map<Schema, ResolvingDecoder>> cache = (Map)DECODER_CACHES.get();
           Map<Schema, ResolvingDecoder> fileSchemaToResolver = (Map)cache.computeIfAbsent(readSchema, (k) -> {
               return Maps.newHashMap();
           });
   ...
       }
   ```
   
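   The outer map uses weak keys while the inner map uses strong keys. The inner map is the value of an outer-map entry, so it is strongly reachable for as long as that entry exists. When the inner map holds a strong reference to the same Schema object that the outer map references weakly as its key, that key can never be garbage-collected and the entry is never released. That leads to the OOM.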
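   
   The retention pattern described above can be reproduced with the JDK alone. The sketch below is only an illustration, not Iceberg code: `Object` stands in for `Schema`, `java.util.WeakHashMap` stands in for the Guava weak-key map, and the class and method names are hypothetical. It assumes the case where the read schema and the file schema are the same object.
   
   ```java
   import java.lang.ref.WeakReference;
   import java.util.HashMap;
   import java.util.Map;
   import java.util.WeakHashMap;
   
   // Hypothetical stand-alone demo; Object stands in for Avro's Schema.
   public class WeakKeyRetentionDemo {
   
       // Returns true if the key is still reachable after the caller drops its reference.
       static boolean keyRetainedWithStrongInnerMap() {
           Map<Object, Map<Object, String>> cache = new WeakHashMap<>();
           Object schema = new Object();
           WeakReference<Object> probe = new WeakReference<>(schema);
   
           // The same object is used as the outer (weak) key and the inner (strong) key.
           cache.computeIfAbsent(schema, k -> new HashMap<>()).put(schema, "resolver");
   
           schema = null; // drop the only reference held outside the cache
           System.gc();   // only a hint; the strong chain below holds either way
   
           // cache -> entry value (inner HashMap) -> inner key, which is the outer key
           boolean retained = probe.get() != null;
           return retained && cache.size() == 1;
       }
   
       public static void main(String[] args) {
           System.out.println("key retained: " + keyRetainedWithStrongInnerMap());
       }
   }
   ```
   
   The key stays reachable regardless of when the collector runs, because the outer entry's value holds it strongly. The `WeakHashMap` Javadoc warns about exactly this: values should not strongly refer to their own keys.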
   
   What I suggest is to change the inner map to one with weak keys, too:
   ```
           Map<Schema, ResolvingDecoder> fileSchemaToResolver = (Map)cache.computeIfAbsent(readSchema, (k) -> {
   //            return Maps.newHashMap();
               return new WeakHashMap<>();
           });
   ```
   
   So far it seems to be working with my jobs.
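   
   A minimal JDK-only sketch of the suggested shape, with both levels keyed weakly, might look like the following. It is an assumption-laden illustration rather than the actual patch: `Object` stands in for `Schema`, a `String` stands in for `ResolvingDecoder`, and `WeakInnerMapSketch`, `resolve` and `builds` are hypothetical names. Note that `WeakHashMap` compares keys with `equals`, whereas a Guava `MapMaker().weakKeys()` map compares them by identity; with plain `Object` keys the two coincide.
   
   ```java
   import java.util.Map;
   import java.util.WeakHashMap;
   
   // Hypothetical sketch; Object stands in for Schema and String for ResolvingDecoder.
   public class WeakInnerMapSketch {
   
       // Per-thread two-level cache; both the outer and the inner map hold their keys weakly.
       private static final ThreadLocal<Map<Object, Map<Object, String>>> CACHES =
           ThreadLocal.withInitial(() -> new WeakHashMap<>());
   
       // Counts how many times a resolver had to be built (demo only, not thread-safe).
       static int builds = 0;
   
       static String resolve(Object readSchema, Object fileSchema) {
           Map<Object, String> fileSchemaToResolver =
               CACHES.get().computeIfAbsent(readSchema, k -> new WeakHashMap<>());
           return fileSchemaToResolver.computeIfAbsent(fileSchema, k -> {
               builds++;
               return "resolver#" + builds; // the value must not strongly reference the key
           });
       }
   }
   ```
   
   While the caller still holds the schema, repeated lookups return the cached value. Once the caller drops it, nothing in the cache references the key strongly, so the entry becomes eligible for collection; when that happens is up to the JVM. This only helps if the cached value itself does not hold a strong reference to its key.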


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on issue #5652:
URL: https://github.com/apache/iceberg/issues/5652#issuecomment-1238829266

   @ldwnt while your analysis of the inner map makes sense to me, I have a couple of questions.
   
   - shouldn't `DecoderResolver` only be used by reader (not writer)?
   - I assume you confirmed the memory issue from a heap dump? 




[GitHub] [iceberg] stevenzwu commented on issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on issue #5652:
URL: https://github.com/apache/iceberg/issues/5652#issuecomment-1264014228

   > I have a job writing to around 100 iceberg tables. The table schema barely changes, but snapshots and manifests are continuously generated due to incoming data.
   
   We have a similar setup internally and didn't run into an OOM issue. I'm not sure what the difference is here.
   
   Checking your heap dump: the whole map has a retained size of 109 MB, which doesn't seem like too much. Is that the main cause of the OOM?




[GitHub] [iceberg] github-actions[bot] commented on issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #5652:
URL: https://github.com/apache/iceberg/issues/5652#issuecomment-1498316253

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.




[GitHub] [iceberg] ldwnt commented on issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

Posted by GitBox <gi...@apache.org>.
ldwnt commented on issue #5652:
URL: https://github.com/apache/iceberg/issues/5652#issuecomment-1246837352

   > * shouldn't `DecoderResolver` only be used by reader (not writer)?
   
   I'm not familiar with the Iceberg source code, but are metadata files (manifest lists and manifests) read during writing, and does that involve `DecoderResolver`?
   
   > * I assume you confirmed the memory issue from a heap dump?
   
   Yes, please refer to the snapshot below:
   ![image](https://user-images.githubusercontent.com/7655486/190180075-aa92912f-fc2c-4bf6-84ae-5aaed85fcb1e.png)
   
   




[GitHub] [iceberg] ldwnt commented on issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

Posted by GitBox <gi...@apache.org>.
ldwnt commented on issue #5652:
URL: https://github.com/apache/iceberg/issues/5652#issuecomment-1262117284

   > @ldwnt can you create a PR for the weak reference fix?
   
   Sure, I'll create one.
   
   > Regarding the OOM, does your use case have many different read schemas and file schemas?
   
   I have a job writing to around 100 iceberg tables. The table schema barely changes, but snapshots and manifests are continuously generated due to incoming data.




[GitHub] [iceberg] ConeyLiu commented on issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on issue #5652:
URL: https://github.com/apache/iceberg/issues/5652#issuecomment-1580338593

   We have met a similar problem and I have submitted a PR for this. @stevenzwu Could you help to review it?




[GitHub] [iceberg] magus0219 commented on issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

Posted by GitBox <gi...@apache.org>.
magus0219 commented on issue #5652:
URL: https://github.com/apache/iceberg/issues/5652#issuecomment-1260314780

   @ldwnt @stevenzwu 
   I have hit the same issue and hope for an official fix.




[GitHub] [iceberg] takeono commented on issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

Posted by "takeono (via GitHub)" <gi...@apache.org>.
takeono commented on issue #5652:
URL: https://github.com/apache/iceberg/issues/5652#issuecomment-1507848954

   Our Flink app, which commits every minute, hits an OOM after about a month. This may be because each commit creates an Avro reader/writer to read/write the manifest.
   
   With the current implementation, when readSchema and fileSchema are the same reference (this can happen), the Schema object is not released even after every other holder of the reference is gone. This is because the reference continues to exist as a strong key in the inner map.
   
   So the fileSchema used as the key should be a cloned one.




[GitHub] [iceberg] stevenzwu commented on issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on issue #5652:
URL: https://github.com/apache/iceberg/issues/5652#issuecomment-1260387381

   @ldwnt can you create a PR for the weak reference fix?
   
   Regarding the OOM, does your use case have many different read schemas and file schemas?




[GitHub] [iceberg] pvary closed issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary closed issue #5652: DecoderResolver may lead to OOM of flink jobs writing to iceberg tables
URL: https://github.com/apache/iceberg/issues/5652

