Posted to commits@hudi.apache.org by "gaoshihang (via GitHub)" <gi...@apache.org> on 2023/02/16 10:17:01 UTC

[GitHub] [hudi] gaoshihang opened a new issue, #7979: [SUPPORT]Hudi-cli cleans show OOM

gaoshihang opened a new issue, #7979:
URL: https://github.com/apache/hudi/issues/7979

   I am using hudi-cli (version 0.11.1) to run the `cleans show` command, and I get an OOM exception:
   ```
   hudi:ds_segments->cleans show
   2023-02-15 02:33:00,699 INFO timeline.HoodieActiveTimeline: Loaded instants upto : Option{val=[20230214035435843__clean__COMPLETED]}
   Command failed java.lang.OutOfMemoryError: Java heap space
   Exception in thread "Spring Shell" java.lang.OutOfMemoryError: Java heap space
       at java.lang.StringCoding.decode(StringCoding.java:215)
       at java.lang.String.<init>(String.java:463)
       at org.apache.avro.util.Utf8.toString(Utf8.java:158)
       at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:322)
       at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:219)
       at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:456)
       at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:191)
       at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
       at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
       at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
       at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
       at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
       at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:354)
       at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:185)
       at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
       at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
       at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:251)
       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
       at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:206)
       at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieCleanMetadata(TimelineMetadataUtils.java:170)
       at org.apache.hudi.cli.commands.CleansCommand.showCleans(CleansCommand.java:74)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
       at org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
   2023-02-15 02:36:14,433 INFO support.GenericApplicationContext: Closing org.springframework.context.support.GenericApplicationContext@47ef968d: startup date [Wed Feb 15 02:32:39 UTC 2023]; root of context hierarchy
   2023-02-15 02:36:14,435 INFO support.DefaultLifecycleProcessor: Stopping beans in phase 1
   2023-02-15 02:36:14,441 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
   2023-02-15 02:36:14,441 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped.
   2023-02-15 02:36:14,442 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
   ```
   
   I then checked the code in CleansCommand.java and found that `cleans show` first fetches every completed clean instant and deserializes the Avro metadata of each one, which causes the OOM.
   ```
       HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
       HoodieTimeline timeline = activeTimeline.getCleanerTimeline().filterCompletedInstants();
       List<HoodieInstant> cleans = timeline.getReverseOrderedInstants().collect(Collectors.toList());
       List<Comparable[]> rows = new ArrayList<>();
       for (HoodieInstant clean : cleans) {
         HoodieCleanMetadata cleanMetadata = TimelineMetadataUtils.deserializeHoodieCleanMetadata(timeline.getInstantDetails(clean).get());
         rows.add(new Comparable[]{clean.getTimestamp(), cleanMetadata.getEarliestCommitToRetain(),
                 cleanMetadata.getTotalFilesDeleted(), cleanMetadata.getTimeTakenInMillis()});
         cleanMetadata = null;
       }
   
       TableHeader header =
           new TableHeader().addTableHeaderField(HoodieTableHeaderFields.HEADER_CLEAN_TIME)
               .addTableHeaderField(HoodieTableHeaderFields.HEADER_EARLIEST_COMMAND_RETAINED)
               .addTableHeaderField(HoodieTableHeaderFields.HEADER_TOTAL_FILES_DELETED)
               .addTableHeaderField(HoodieTableHeaderFields.HEADER_TOTAL_TIME_TAKEN);
       return HoodiePrintHelper.print(header, new HashMap<>(), sortByField, descending, limit, headerOnly, rows);
   ```
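
In the snippet above, `limit` is only handed to `HoodiePrintHelper.print`, so every completed clean instant has already been deserialized by the time it is used. A minimal sketch of the alternative, truncating the newest-first instant list before loading any metadata, is shown below. It uses stand-in generic types instead of the Hudi classes; `LimitedCleanRows`, `rowsFor`, and `loadRow` are hypothetical names, and treating a negative limit as "no limit" is an assumption of this sketch. It would not help if a single clean metadata file is itself too large for the heap, and it changes the result whenever `sortByField` asks for an order other than newest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Hudi: applies the row limit BEFORE
// the expensive per-instant load instead of after it.
public class LimitedCleanRows {

    /**
     * @param newestFirst instants already in reverse timeline order
     * @param limit       maximum rows to build; negative means no limit (assumption)
     * @param loadRow     stand-in for "read and deserialize the clean metadata, build a row"
     */
    public static <I, R> List<R> rowsFor(List<I> newestFirst, int limit, Function<I, R> loadRow) {
        int n = limit < 0 ? newestFirst.size() : Math.min(limit, newestFirst.size());
        List<R> rows = new ArrayList<>(n);
        // Only the first n instants are ever passed to loadRow.
        for (I instant : newestFirst.subList(0, n)) {
            rows.add(loadRow.apply(instant));
        }
        return rows;
    }
}
```

With this shape, a call such as `rowsFor(cleans, 10, ...)` would invoke the loader at most ten times regardless of how many clean instants the active timeline holds.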
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] SteNicholas commented on issue #7979: [SUPPORT]Hudi-cli cleans show OOM

Posted by "SteNicholas (via GitHub)" <gi...@apache.org>.
SteNicholas commented on issue #7979:
URL: https://github.com/apache/hudi/issues/7979#issuecomment-1436999462

   @gaoshihang, have you set `clean.retain_commits` to a very large value?




[GitHub] [hudi] ad1happy2go commented on issue #7979: [SUPPORT]Hudi-cli cleans show OOM

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #7979:
URL: https://github.com/apache/hudi/issues/7979#issuecomment-1558731971

   @gaoshihang Gentle ping.




[GitHub] [hudi] gaoshihang commented on issue #7979: [SUPPORT]Hudi-cli cleans show OOM

Posted by "gaoshihang (via GitHub)" <gi...@apache.org>.
gaoshihang commented on issue #7979:
URL: https://github.com/apache/hudi/issues/7979#issuecomment-1434163875

   Thanks! I will add some logging and run some tests to identify which clean instant causes this exception and how large it is.




[GitHub] [hudi] yihua commented on issue #7979: [SUPPORT]Hudi-cli cleans show OOM

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on issue #7979:
URL: https://github.com/apache/hudi/issues/7979#issuecomment-1433413065

   Hi @gaoshihang, thanks for reporting this.  Are you able to identify which clean instant causes the OOM exception?  How large are the `<instant_time>.clean` files under the `.hoodie/` folder?  I'm wondering whether leveraging Spark to deserialize the clean metadata would help here.
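
One way to answer the file-size question is to list the completed clean instant files by size. The sketch below is a minimal example under stated assumptions: it reads a local copy of the `.hoodie/` folder with `java.nio` (the table in this report lives on S3, so the folder would first have to be copied down, or the listing redone with a filesystem client that can read `s3a://` paths), and `CleanFileSizes`, `Entry`, and `largestFirst` are hypothetical names, not Hudi APIs. It matches only names ending in `.clean`, so requested and inflight clean files are skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hudi: reports completed clean instant
// files in a local copy of the .hoodie folder, largest first.
public class CleanFileSizes {

    /** One completed clean instant file and its size in bytes. */
    public static final class Entry {
        public final String fileName;
        public final long sizeBytes;

        Entry(String fileName, long sizeBytes) {
            this.fileName = fileName;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Lists regular files named "*.clean" directly under hoodieDir, sorted by size descending. */
    public static List<Entry> largestFirst(Path hoodieDir) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (Stream<Path> files = Files.list(hoodieDir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                String name = p.getFileName().toString();
                if (Files.isRegularFile(p) && name.endsWith(".clean")) {
                    entries.add(new Entry(name, Files.size(p)));
                }
            }
        }
        entries.sort(Comparator.comparingLong((Entry e) -> e.sizeBytes).reversed());
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Path to the local copy of .hoodie; defaults to ./.hoodie (assumption).
        Path hoodieDir = Paths.get(args.length > 0 ? args[0] : ".hoodie");
        if (!Files.isDirectory(hoodieDir)) {
            System.err.println("Not a directory: " + hoodieDir);
            return;
        }
        for (Entry e : largestFirst(hoodieDir)) {
            System.out.printf("%12d  %s%n", e.sizeBytes, e.fileName);
        }
    }
}
```

The first line of output names the largest clean instant, which is the most likely candidate for the one that exhausts the CLI heap during deserialization.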




[GitHub] [hudi] ad1happy2go commented on issue #7979: [SUPPORT]Hudi-cli cleans show OOM

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #7979:
URL: https://github.com/apache/hudi/issues/7979#issuecomment-1508910481

   @gaoshihang Did you get a chance to work on the above? Are you still facing this issue?

