Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/07/03 15:23:56 UTC
[GitHub] [incubator-hudi] smdahmed opened a new issue #776: Incorrect averageBytesPerRecord Causes OOM
smdahmed opened a new issue #776: Incorrect averageBytesPerRecord Causes OOM
URL: https://github.com/apache/incubator-hudi/issues/776
I see an older issue that has already been closed: https://github.com/apache/incubator-hudi/issues/270.
I am not sure what the fix for that issue was.
I hit the problem today. Say there are thousands of incoming records but none of them get written (which can happen in my case, as we write records selectively). The total records written is then 0, so the division yields Infinity, and the cast to long turns avgSize into Long.MAX_VALUE.
```scala
scala> val l = Math.ceil( 1.0 / 0 )
l: Double = Infinity
scala> val l = Math.ceil( 1.0 / 0 ).toLong
l: Long = 9223372036854775807
```
This causes OOM.
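The same saturation happens in Java, the language of the method in question. The following is a minimal sketch that isolates the arithmetic only; the class and method names are hypothetical and not part of Hudi. It also shows that when both totals are 0, the result is NaN, which casts to 0 rather than to Long.MAX_VALUE.

```java
final class AvgSizeCastDemo {
    // Hypothetical helper that mirrors only the division and cast
    // from averageBytesPerRecord(); it is not Hudi code.
    static long avgSize(long totalBytesWritten, long totalRecordsWritten) {
        return (long) Math.ceil((1.0 * totalBytesWritten) / totalRecordsWritten);
    }

    public static void main(String[] args) {
        // Normal case: 1024 bytes over 4 records.
        System.out.println(avgSize(1024, 4));
        // Bytes written but zero records: Infinity, which saturates to Long.MAX_VALUE.
        System.out.println(avgSize(1024, 0) == Long.MAX_VALUE);
        // Zero bytes and zero records: NaN, which casts to 0.
        System.out.println(avgSize(0, 0));
    }
}
```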
```java
protected long averageBytesPerRecord() {
  long avgSize = 0L;
  HoodieTimeline commitTimeline = metaClient.getActiveTimeline().getCommitTimeline()
      .filterCompletedInstants();
  try {
    if (!commitTimeline.empty()) {
      HoodieInstant latestCommitTime = commitTimeline.lastInstant().get();
      HoodieCommitMetadata commitMetadata = HoodieCommitMetadata
          .fromBytes(commitTimeline.getInstantDetails(latestCommitTime).get(), HoodieCommitMetadata.class);
      avgSize = (long) Math.ceil(
          (1.0 * commitMetadata.fetchTotalBytesWritten()) / commitMetadata
              .fetchTotalRecordsWritten());
    }
  } catch (Throwable t) {
    // make this fail safe.
    logger.error("Error trying to compute average bytes/record ", t);
  }
  return avgSize <= 0L ? config.getCopyOnWriteRecordSizeEstimate() : avgSize;
}
```
For now I have worked around it by changing the last line of the method as below.
return (avgSize <= 0L | avgSize >= Integer.MAX_VALUE) ? config.getCopyOnWriteRecordSizeEstimate() : avgSize;
But I believe someone more knowledgeable about this should take a look at it.
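One possible alternative, sketched here under assumptions rather than proposed as the actual fix, is to check the record count before dividing instead of bounding the result afterwards. The class name, method signature, and the fallbackEstimate parameter are hypothetical; fallbackEstimate stands in for config.getCopyOnWriteRecordSizeEstimate().

```java
final class SafeAvgRecordSize {
    // Hypothetical helper, not Hudi code. fallbackEstimate stands in for
    // config.getCopyOnWriteRecordSizeEstimate().
    static long averageBytesPerRecord(long totalBytesWritten,
                                      long totalRecordsWritten,
                                      long fallbackEstimate) {
        // No usable statistics from the last commit: use the configured estimate.
        if (totalBytesWritten <= 0L || totalRecordsWritten <= 0L) {
            return fallbackEstimate;
        }
        // Integer ceiling division, so no floating-point Infinity or NaN can arise.
        long quotient = totalBytesWritten / totalRecordsWritten;
        long remainder = totalBytesWritten % totalRecordsWritten;
        return remainder == 0L ? quotient : quotient + 1L;
    }

    public static void main(String[] args) {
        // Zero records written: falls back to the estimate instead of Long.MAX_VALUE.
        System.out.println(averageBytesPerRecord(1024L, 0L, 1024L));
        // Normal case: ceiling of 1025 / 4.
        System.out.println(averageBytesPerRecord(1025L, 4L, 1024L));
    }
}
```

Guarding the denominator keeps the fallback decision next to its cause, whereas an upper bound such as Integer.MAX_VALUE also rejects any legitimately computed value above that threshold.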