Posted to issues@ozone.apache.org by "Uma Maheswara Rao G (Jira)" <ji...@apache.org> on 2021/10/06 05:23:00 UTC

[jira] [Updated] (HDDS-5822) EC: Writing a large buffer to an EC file duplicates first chunk in block 1 and 2

     [ https://issues.apache.org/jira/browse/HDDS-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDDS-5822:
--------------------------------------
    Summary: EC: Writing a large buffer to an EC file duplicates first chunk in block 1 and 2  (was: Writing a large buffer to an EC file duplicates first chunk in block 1 and 2)

> EC: Writing a large buffer to an EC file duplicates first chunk in block 1 and 2
> --------------------------------------------------------------------------------
>
>                 Key: HDDS-5822
>                 URL: https://issues.apache.org/jira/browse/HDDS-5822
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Stephen O'Donnell
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>              Labels: pull-request-available
>
> If you write a large buffer of data containing several chunks of data like this:
> {code}
> byte[] inputData = new byte[dataLength];
> RAND.nextBytes(inputData);
> for (byte b : inputData) {
>   key.write(b);
> }
> {code}
> Then the current EC key-write logic writes the first chunk twice, once to block 1 and again to block 2, and then probably (I have not verified this) drops the last chunk completely.
> This is due to a bug in ECKeyOutputStream.write(…):
> {code}
>  int currentChunkBufferRemainingLength =
>         ecChunkBufferCache.dataBuffers[blockOutputStreamEntryPool.getCurrIdx()]
>             .remaining();
>     int currentChunkBufferLen =
>         ecChunkBufferCache.dataBuffers[blockOutputStreamEntryPool.getCurrIdx()]
>             .position();
>     int maxLenToCurrChunkBuffer = (int) Math.min(len, ecChunkSize);
>     int currentWriterChunkLenToWrite =
>         Math.min(currentChunkBufferRemainingLength, maxLenToCurrChunkBuffer);
>     int pos = handleDataWrite(blockOutputStreamEntryPool.getCurrIdx(), b, off,
>         currentWriterChunkLenToWrite,
>         currentChunkBufferLen + currentWriterChunkLenToWrite == ecChunkSize);
>     checkAndWriteParityCells(pos);
>     // BUG: "off" should be advanced by currentWriterChunkLenToWrite here,
>     // otherwise the loop below re-reads the same bytes.
>     int remLen = len - currentWriterChunkLenToWrite;
>     int iters = remLen / ecChunkSize;
>     int lastCellSize = remLen % ecChunkSize;
>     while (iters > 0) {
>       pos = handleDataWrite(blockOutputStreamEntryPool.getCurrIdx(), b, off,
>           ecChunkSize, true);
>       off += ecChunkSize;
>       iters--;
>       checkAndWriteParityCells(pos);
>     }
> {code}
> Here we write the first chunk before entering the "iters" loop, but we never increment "off", so the first loop iteration re-reads and re-writes the same bytes.
> The fix is to add "currentWriterChunkLenToWrite" to "off" before entering the loop.
> We should add a test to reproduce this issue and then add the fix.
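> The off-by-one-chunk behaviour can be illustrated with a small stand-alone sketch. This is not the real ECKeyOutputStream code; ChunkOffsetDemo and all names in it are hypothetical, and it only models the offset bookkeeping of the loop above:
> {code:java}
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
>
> public class ChunkOffsetDemo {
>
>   // Mirrors the structure of the buggy write loop: write the first chunk,
>   // then full chunks in a loop. Only when "fixedVersion" is true do we
>   // advance "off" after the first write (the fix proposed above).
>   static List<byte[]> write(byte[] b, int ecChunkSize, boolean fixedVersion) {
>     List<byte[]> chunks = new ArrayList<>();
>     int off = 0;
>     int len = b.length;
>     int firstLen = Math.min(len, ecChunkSize);
>     chunks.add(Arrays.copyOfRange(b, off, off + firstLen));
>     if (fixedVersion) {
>       off += firstLen;  // the missing increment from HDDS-5822
>     }
>     int remLen = len - firstLen;
>     int iters = remLen / ecChunkSize;
>     while (iters > 0) {
>       chunks.add(Arrays.copyOfRange(b, off, off + ecChunkSize));
>       off += ecChunkSize;
>       iters--;
>     }
>     return chunks;
>   }
>
>   public static void main(String[] args) {
>     byte[] data = new byte[]{0, 1, 2, 3, 4, 5};
>     List<byte[]> buggy = write(data, 2, false);
>     List<byte[]> fixed = write(data, 2, true);
>     // Buggy path: first two chunks are identical and bytes {4, 5} are lost.
>     System.out.println(Arrays.equals(buggy.get(0), buggy.get(1)));
>     // Fixed path: each chunk carries the next slice of the input.
>     System.out.println(Arrays.equals(fixed.get(2), new byte[]{4, 5}));
>   }
> }
> {code}
> With a 6-byte input and a 2-byte chunk size, the buggy path produces chunks [0,1], [0,1], [2,3] (first chunk duplicated, tail dropped), while the fixed path produces [0,1], [2,3], [4,5].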



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org