Posted to issues@hawq.apache.org by "Dong Li (JIRA)" <ji...@apache.org> on 2015/11/12 14:44:10 UTC
[jira] [Comment Edited] (HAWQ-155) Out of range access to the hdfs file as scan a large tuple
[ https://issues.apache.org/jira/browse/HAWQ-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002077#comment-15002077 ]
Dong Li edited comment on HAWQ-155 at 11/12/15 1:43 PM:
--------------------------------------------------------
I suggest modifying the code as follows.
The current code at cdbappendonlystorageread.c:731:
{code}
fileRemainderLen = storageRead->bufferedRead.fileLen -
                   *headerOffsetInFile;
{code}
The code should be:
{code}
if (isUseSplitLen)
    fileRemainderLen = storageRead->bufferedRead.splitLen -
                       *headerOffsetInFile;
else
    fileRemainderLen = storageRead->bufferedRead.fileLen -
                       *headerOffsetInFile;
{code}
> Out of range access to the hdfs file as scan a large tuple
> -----------------------------------------------------------
>
> Key: HAWQ-155
> URL: https://issues.apache.org/jira/browse/HAWQ-155
> Project: Apache HAWQ
> Issue Type: Bug
> Reporter: Dong Li
> Assignee: Lei Chang
>
> This may occur when your cluster has more than one physical segment.
> 1. Set the GUC value "appendonly_split_write_size_mb":
> hawq config -c appendonly_split_write_size_mb -v 2
> 2. Run the SQL:
> {code}
> set default_segment_num=1;
> create table eightbytleft_for_readsplit(str varchar) with (appendonly=true,blocksize=2097152,checksum=true);
> insert into eightbytleft_for_readsplit select repeat('a',2097136*63-8);
> insert into eightbytleft_for_readsplit select repeat('a',2097136*63-8);
> TRUNCATE table eightbytleft_for_readsplit ;
> insert into eightbytleft_for_readsplit select repeat('a',2097136*63-12-8);
> insert into eightbytleft_for_readsplit select repeat('a',2097136*63-12-8);
> {code}
> Then, when you run
> {code}
> select count(*) from eightbytleft_for_readsplit;
> {code}
> ERROR: Header checksum does not match. Expected 0x0 and found 0xA92A344A headerOffsetInFile is134217728 overallBlockLen is 0 (cdbappendonlystorageread.c:913) (seg0 test3:31100 pid=7878) (dispatcher.c:1700)
> DETAIL:
> Append-Only storage header kind 0 unknown
> Scan of Append-Only Row-Oriented relation 'eightbytleft_for_readsplit'. Append-Only segment file 'hdfs://test5:9000/hawq/hawq-1447309068/16385/16532/17522/1', block header offset in file = 134217728, bufferCount 65
> More specifically, the two large tuples sit on two appendonly read splits; each tuple is 128MB-8BYTE large, and the last 8 bytes of each split are filled with zeros.
> When a segment scans the first tuple and finishes it, it reads the trailing 8 zero bytes and, since they are zeros, skips them. That is fine.
> But the code at cdbappendonlystorageread.c:731 has a problem.
> {code}
> if (i > 0)
> {
>     if (storageRead->storageAttributes.version == AORelationVersion_Original)
>     {
>         i = i / 4 * 4;
>     }
>     else if (storageRead->storageAttributes.version == AORelationVersion_Aligned64bit)
>     {
>         i = i / 8 * 8;
>     }
>     *headerOffsetInFile += i;
>     *header += i;
>     storageRead->bufferedRead.bufferOffset += i;
> }
>
> /*
>  * Determine the maximum boundary of the block.
>  * UNDONE: When we have a block directory, we will tighten the limit down.
>  */
> fileRemainderLen = storageRead->bufferedRead.fileLen -
>                    *headerOffsetInFile;
>
> if (storageRead->maxBufferLen > fileRemainderLen)
>     *blockLimitLen = (int32)fileRemainderLen;
> else
>     *blockLimitLen = storageRead->maxBufferLen;
>
> return (*blockLimitLen > 0);
> {code}
> The fileRemainderLen is calculated wrongly: it should be splitLen - *headerOffsetInFile when isUseSplitLen is true.
> At that moment, fileLen = 268435456, splitLen = 134217728, and isUseSplitLen = 0x01.
> So with headerOffsetInFile at the split boundary (134217728), the code computes a remainder of 134217728 instead of 0, the scan continues into the next split, and the problem surfaces as an out-of-range access to the hdfs file.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)