Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2020/06/16 09:14:00 UTC

[jira] [Commented] (IMPALA-9831) TestScannersFuzzing::test_fuzz_alltypes() hits DCHECK in parquet-page-reader.cc

    [ https://issues.apache.org/jira/browse/IMPALA-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136461#comment-17136461 ] 

Quanlong Huang commented on IMPALA-9831:
----------------------------------------

Hit this in a precommit job: [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2596]. The fatal message is:
{code:java}
F0616 04:17:54.291808 53142 parquet-page-reader.cc:67] 694f212f226cd592:70f4923000000002] Check failed: col_end < file_desc.file_length (6911 vs. 6911){code}
In the console log, the first failure is:
{code:java}
02:39:09.239 [gw5] FAILED query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_width_bucket[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0}] {code}

> TestScannersFuzzing::test_fuzz_alltypes() hits DCHECK in parquet-page-reader.cc
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-9831
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9831
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>              Labels: flaky
>
> In a recent precommit job, an Impalad crashed with the following DCHECK:
> {noformat}
> F0604 01:18:36.921769 30923 parquet-page-reader.cc:67] b64df3da7eea7c65:16f9c6e800000001] Check failed: col_end < file_desc.file_length (6820 vs. 6820) {noformat}
> The assert is checking that the end of a column is before the end of the file. This must be true, because the footer takes up space at the end of the file.
> The code for this DCHECK is:
> {noformat}
>   int64_t col_end = col_start + col_len;
>   // Already validated in ValidateColumnOffsets()
>   DCHECK_GT(col_end, 0);
>   DCHECK_LT(col_end, file_desc.file_length); <---------{noformat}
> The comment says this was already validated in ParquetMetadataUtils::ValidateColumnOffsets(), but that validation is where the problem is:
> {noformat}
> int64_t col_len = col_chunk.meta_data.total_compressed_size;
> int64_t col_end = col_start + col_len;
> if (col_end <= 0 || col_end > file_length) {
>   return Status(Substitute("Parquet file '$0': metadata is corrupt. Column $1 has "
>       "invalid column offsets (offset=$2, size=$3, file_size=$4).", filename, i,
>       col_start, col_len, file_length));
> }{noformat}
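> This check only rejects "col_end > file_length", so a column that ends exactly at file_length (6820 vs. 6820 above) passes validation and then trips the DCHECK_LT. The condition should be "col_end >= file_length".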
> If we knew the size of the parquet footer, this check could be stricter as well.
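
To illustrate the off-by-one described in the issue, here is a minimal standalone sketch of the stricter bound. ColumnEndIsValid is a hypothetical helper written for this example, not Impala's actual function or the committed fix. It returns a bool instead of a Status and does not guard against overflow of col_start + col_len.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Impala): returns true only when the column
// chunk ends strictly before the end of the file, which is the invariant the
// DCHECK_LT in parquet-page-reader.cc relies on.
bool ColumnEndIsValid(int64_t col_start, int64_t col_len, int64_t file_length) {
  int64_t col_end = col_start + col_len;
  // Reject col_end == file_length as well: the footer occupies the tail of
  // the file, so no column chunk can legitimately end at the last byte.
  return col_end > 0 && col_end < file_length;
}
```

With this bound, a column whose end equals the file length (the 6911 vs. 6911 case from the log) would be reported as corrupt metadata during validation instead of reaching the DCHECK.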


