You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@orc.apache.org by "Yiqun Zhang (Jira)" <ji...@apache.org> on 2021/11/02 15:13:00 UTC

[jira] [Comment Edited] (ORC-1041) Fix TestDecompression.testLzoLong on Debian 11

    [ https://issues.apache.org/jira/browse/ORC-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437234#comment-17437234 ] 

Yiqun Zhang edited comment on ORC-1041 at 11/2/21, 3:12 PM:
------------------------------------------------------------

I found the source code snippet that caused the test to fail.
https://github.com/apache/orc/blob/05ff57b2618b56dd670ea9159c11ea00e166adc8/c%2B%2B/src/LzoDecompressor.cc#L342-L347
{code:java}
              while (output < matchOutputLimit) {
                *reinterpret_cast<int64_t*>(output) =
                  *reinterpret_cast<int64_t*>(matchAddress);
                matchAddress += SIZE_OF_LONG;
                output += SIZE_OF_LONG;
              }
{code}
The intent of this piece of code is to be decoding duplicate data, copying it over and over and then moving the pointer. 
`* reinterpret_cast<int64_t*>(output) = * reinterpret_cast<int64_t*>(matchAddress);`
The matchAddress position is catching up after the output. If they are close together, it is possible that the data used for this assignment was just assigned the last time in the loop. 
This statement seems to cause a compiler bug. Analyzing the decompressed data, the 8-byte copy seems to be split into multiple instructions and re-ordered.  

{code:java}
              while (output < matchOutputLimit) {
                memcpy(output, matchAddress, SIZE_OF_LONG);
                matchAddress += SIZE_OF_LONG;
                output += SIZE_OF_LONG;
              }
{code}
I used the `memcpy(output, matchAddress, SIZE_OF_LONG);` statement instead of the original method and tested it successfully.

Since there are many statements like `* reinterpret_cast<int64_t*>(output) = * reinterpret_cast<int64_t*>(matchAddress);` in LzoDecompressor.cc. I want to replace them uniformly, and  I would like to hear your opinion before making the PR.  : )  [~william] [~dongjoon] [~wgtmac]


was (Author: guiyankuang):
I found the source code snippet that caused the test to fail.
https://github.com/apache/orc/blob/05ff57b2618b56dd670ea9159c11ea00e166adc8/c%2B%2B/src/LzoDecompressor.cc#L342-L347
{code:java}
              while (output < matchOutputLimit) {
                *reinterpret_cast<int64_t*>(output) =
                  *reinterpret_cast<int64_t*>(matchAddress);
                matchAddress += SIZE_OF_LONG;
                output += SIZE_OF_LONG;
              }
{code}
The intent of this piece of code is to be decoding duplicate data, copying it over and over and then moving the pointer. 
`*reinterpret_cast<int64_t*>(output) = *reinterpret_cast<int64_t*>(matchAddress);`
The matchAddress position is catching up after the output. If they are close together, it is possible that the data used for this assignment was just assigned the last time in the loop. 
This statement seems to cause a compiler bug. Analyzing the decompressed data, the 8-byte copy seems to be split into multiple instructions and re-ordered.  

{code:java}
              while (output < matchOutputLimit) {
                memcpy(output, matchAddress, SIZE_OF_LONG);
                matchAddress += SIZE_OF_LONG;
                output += SIZE_OF_LONG;
              }
{code}
I used the `memcpy(output, matchAddress, SIZE_OF_LONG);` statement instead of the original method and tested it successfully.

Since there are many statements like `*reinterpret_cast<int64_t*>(output) = *reinterpret_cast<int64_t*>(matchAddress);` in LzoDecompressor.cc. I want to replace them uniformly, and  I would like to hear your opinion before making the PR.  : )  [~william] [~dongjoon] [~wgtmac]

> Fix TestDecompression.testLzoLong on Debian 11
> ----------------------------------------------
>
>                 Key: ORC-1041
>                 URL: https://issues.apache.org/jira/browse/ORC-1041
>             Project: ORC
>          Issue Type: Test
>          Components: C++
>    Affects Versions: 1.8.0
>            Reporter: William Hyun
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)