Posted to issues@ozone.apache.org by "Uma Maheswara Rao G (Jira)" <ji...@apache.org> on 2021/10/05 20:21:00 UTC
[jira] [Assigned] (HDDS-5822) Writing a large buffer to an EC file
duplicates first chunk in block 1 and 2
[ https://issues.apache.org/jira/browse/HDDS-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uma Maheswara Rao G reassigned HDDS-5822:
-----------------------------------------
Assignee: Uma Maheswara Rao G
> Writing a large buffer to an EC file duplicates first chunk in block 1 and 2
> ----------------------------------------------------------------------------
>
> Key: HDDS-5822
> URL: https://issues.apache.org/jira/browse/HDDS-5822
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Stephen O'Donnell
> Assignee: Uma Maheswara Rao G
> Priority: Major
>
> If you write a large buffer of data that spans several chunks, like this:
> {code}
> byte[] inputData = new byte[dataLength];
> RAND.nextBytes(inputData);
> for (byte b : inputData) {
>   key.write(b);
> }
> {code}
> Then the current EC key logic writes the first chunk twice, once to block 1 and once to block 2, and then probably (I have not verified) drops the last chunk completely.
> This is due to a bug in ECKeyOutputStream.write(…):
> {code}
> int currentChunkBufferRemainingLength =
>     ecChunkBufferCache.dataBuffers[blockOutputStreamEntryPool.getCurrIdx()]
>         .remaining();
> int currentChunkBufferLen =
>     ecChunkBufferCache.dataBuffers[blockOutputStreamEntryPool.getCurrIdx()]
>         .position();
> int maxLenToCurrChunkBuffer = (int) Math.min(len, ecChunkSize);
> int currentWriterChunkLenToWrite =
>     Math.min(currentChunkBufferRemainingLength, maxLenToCurrChunkBuffer);
> int pos = handleDataWrite(blockOutputStreamEntryPool.getCurrIdx(), b, off,
>     currentWriterChunkLenToWrite,
>     currentChunkBufferLen + currentWriterChunkLenToWrite == ecChunkSize);
> checkAndWriteParityCells(pos);
> int remLen = len - currentWriterChunkLenToWrite;
> int iters = remLen / ecChunkSize;
> int lastCellSize = remLen % ecChunkSize;
> while (iters > 0) {
>   pos = handleDataWrite(blockOutputStreamEntryPool.getCurrIdx(), b, off,
>       ecChunkSize, true);
>   off += ecChunkSize;
>   iters--;
>   checkAndWriteParityCells(pos);
> }
> {code}
> Here we write the first chunk before entering the "iters" loop, but we forget to advance "off" past it, so the first loop iteration writes the same data again.
> We need to add "currentWriterChunkLenToWrite" to "off" before entering the loop.
> We should add a test to reproduce this issue and then add the fix.
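The offset problem described above can be reproduced in isolation with a small model. The following is only a sketch, not Ozone code: the class `EcWriteOffsetSketch`, its `write` method, and the `advanceOffset` flag are hypothetical names, the chunk buffer is assumed to be empty when the write starts, and the handling of the trailing partial cell is an assumption because the quoted snippet ends before that step.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical, simplified model of the chunk-splitting loop; not Ozone code. */
class EcWriteOffsetSketch {

  /**
   * Splits b[off, off + len) into cells of at most chunkSize bytes and
   * appends every cell to one sink, in the order the cells are written.
   * Assumes the current chunk buffer is empty when the write starts.
   *
   * @param advanceOffset false mimics the reported behaviour,
   *                      true applies the proposed fix
   */
  static byte[] write(byte[] b, int off, int len, int chunkSize,
      boolean advanceOffset) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();

    // First cell, written before the loop (as in the quoted snippet).
    int firstLen = Math.min(len, chunkSize);
    sink.write(b, off, firstLen);
    if (advanceOffset) {
      off += firstLen; // proposed fix: move past the bytes just written
    }

    int remLen = len - firstLen;
    int iters = remLen / chunkSize;
    int lastCellSize = remLen % chunkSize;
    while (iters > 0) {
      sink.write(b, off, chunkSize);
      off += chunkSize;
      iters--;
    }

    // Assumed handling of a trailing partial cell (not in the quoted code).
    if (lastCellSize > 0) {
      sink.write(b, off, lastCellSize);
    }
    return sink.toByteArray();
  }
}
```

In this model, writing the 12 bytes 0..11 with a chunk size of 4 and `advanceOffset` set to false yields 0,1,2,3,0,1,2,3,4,5,6,7: the first chunk appears twice and the last chunk is never written. With `advanceOffset` set to true the output equals the input. A regression test for the real stream could follow the same pattern: write one buffer spanning several chunks in a single call, read the key back, and compare it with the input.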