Posted to issues-all@impala.apache.org by "Wenzhe Zhou (Jira)" <ji...@apache.org> on 2021/03/25 06:05:00 UTC

[jira] [Commented] (IMPALA-10607) TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build

    [ https://issues.apache.org/jira/browse/IMPALA-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308405#comment-17308405 ] 

Wenzhe Zhou commented on IMPALA-10607:
--------------------------------------

When I tried to read the table after the CTAS failed, I got the following error: "Query aborted:Parquet file s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd600000000_1609291350_data.0.parq has an invalid file length: 4". The failed CTAS left a corrupt table on S3. It seems the Parquet data file is not finalized on S3 when the query is aborted.
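For context on why a 4-byte file is rejected: per the Parquet format specification, a file begins with the 4-byte magic "PAR1" and ends with a 4-byte little-endian footer length followed by "PAR1" again, so even the bare framing needs 12 bytes. The Python below is a minimal, hypothetical framing check (it is not Impala's scanner code, whose exact checks may differ). A file holding only the leading magic, which is presumably what the aborted writer left behind here, fails it with the same length of 4.

```python
import struct

MAGIC = b"PAR1"
FOOTER_LEN_SIZE = 4  # footer length is stored as a 4-byte little-endian integer


def check_parquet_framing(data: bytes) -> str:
    """Hypothetical check of a Parquet file's outer framing only (footer contents are not parsed)."""
    min_len = len(MAGIC) + FOOTER_LEN_SIZE + len(MAGIC)  # 12 bytes
    if len(data) < min_len:
        return f"invalid file length: {len(data)}"
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        return "invalid magic number"
    (footer_len,) = struct.unpack("<i", data[-8:-4])
    if footer_len < 0 or footer_len > len(data) - min_len:
        return f"invalid footer length: {footer_len}"
    return "ok"


# Assumed shape of the leftover file: only the leading magic was written.
print(check_parquet_framing(MAGIC))  # invalid file length: 4

# A well-framed file with a 3-byte placeholder footer.
framed = MAGIC + b"xyz" + struct.pack("<i", 3) + MAGIC
print(check_parquet_framing(framed))  # ok
```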

That sounds like a bug. It's low priority since the table isn't expected to have meaningful contents anyway.

Before the patch for IMPALA-10564 was merged, a CTAS that selects from another source table (for example, create table t11 as select id, cast(a*b*c as decimal (28,10)) from t10) already failed when there was a decimal overflow. I verified that we got the same error on S3 when trying to access the table after such a CTAS failed, so this is NOT a new issue.
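To see why the test query in the quoted log overflows, here is the arithmetic for cast(a*a*a as decimal (28,10)) with a = 654964569154.9565, using Python's decimal module. DECIMAL(p,s) leaves p - s digits for the integer part, so DECIMAL(28,10) can only hold magnitudes below 10^18, while the cube is about 2.8e35. fits_decimal is a hypothetical helper; it rounds with the module's default ROUND_HALF_EVEN and only models the final cast, not Impala's rules for the intermediate result type of a*a*a under decimal_v2.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # enough significant digits to hold the exact cube


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """Hypothetical helper: does value fit DECIMAL(precision, scale) once rounded to scale?"""
    rounded = value.quantize(Decimal(1).scaleb(-scale))  # ROUND_HALF_EVEN by default
    return abs(rounded) < Decimal(10) ** (precision - scale)


a = Decimal("654964569154.9565")  # literal cast to DECIMAL(28,7) in the test query
cube = a * a * a

print(fits_decimal(a, 28, 7))      # True: 12 integer digits, 21 allowed
print(fits_decimal(cube, 28, 10))  # False: 36 integer digits, only 18 allowed
print(len(str(int(cube))))         # 36
```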

When HdfsParquetTableWriter::AppendRows() returns an error, HdfsTableSink::WriteRowsToPartition() returns that error without calling HdfsTableSink::FinalizePartitionFile(), so HdfsParquetTableWriter::Finalize() is never called. This can leave a corrupt data file behind. The issue is tricky to fix: if HdfsParquetTableWriter::Finalize() is called, NULL will be written to the table, but we don't expect anything to be inserted into the table.
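The control flow described in the comment can be modeled with a toy Python sketch. ToyParquetWriter and write_rows_to_partition are hypothetical stand-ins for HdfsParquetTableWriter and HdfsTableSink::WriteRowsToPartition(); the real backend is C++ and propagates a Status rather than raising exceptions. The toy writer is assumed to emit the leading magic when the file is opened and the footer only in finalize(), so an error in append_rows() followed by an early return leaves a 4-byte file.

```python
import io

MAGIC = b"PAR1"


class ToyParquetWriter:
    """Hypothetical stand-in for a Parquet table writer (not Impala's class)."""

    def __init__(self, out: io.BytesIO):
        self.out = out
        self.rows = []
        self.out.write(MAGIC)  # assumed: leading magic is written when the file is opened

    def append_rows(self, rows):
        for row in rows:
            if row is None:
                # Models AppendRows() failing, e.g. on a decimal overflow.
                raise OverflowError("decimal overflow")
            self.rows.append(row)

    def finalize(self):
        footer = repr(self.rows).encode()  # placeholder for the FileMetaData footer
        self.out.write(footer)
        self.out.write(len(footer).to_bytes(4, "little"))  # footer length
        self.out.write(MAGIC)  # trailing magic


def write_rows_to_partition(writer, rows):
    """Returns an error string, or None on success.

    Mirrors the early return from the comment: on error, finalize() is skipped.
    """
    try:
        writer.append_rows(rows)
    except OverflowError as err:
        return str(err)
    writer.finalize()
    return None


failed = io.BytesIO()
print(write_rows_to_partition(ToyParquetWriter(failed), [1, None]))  # decimal overflow
print(len(failed.getvalue()))  # 4

ok = io.BytesIO()
print(write_rows_to_partition(ToyParquetWriter(ok), [1, 2]))  # None
print(ok.getvalue()[-4:])  # b'PAR1'
```

In this toy model, calling finalize() on the error path would produce a well-framed file, but it would also persist whatever was buffered before the failure, which mirrors the trade-off the comment points out.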

> TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build
> ------------------------------------------------------------
>
>                 Key: IMPALA-10607
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10607
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>
> TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build
> Stack trace:
> Stack trace for S3 build. [https://master-03.jenkins.cloudera.com/job/impala-cdpd-master-staging-core-s3/34/]
> query_test.test_decimal_queries.TestDecimalOverflowExprs.test_ctas_exprs[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] (from pytest)
> Failing for the past 1 build (Since Failed#34 )
> Took 13 sec.
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException: Query aborted:Parquet file s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd600000000_1609291350_data.0.parq has an invalid file length: 4
> Stacktrace
> query_test/test_decimal_queries.py:170: in test_ctas_exprs
> "SELECT count(*) FROM %s" % TBL_NAME_1)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:814: in wrapper
> return function(*args, **kwargs)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:822: in execute_query_expect_success
> result = cls.__execute_query(impalad_client, query, query_options, user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:923: in __execute_query
> return impalad_client.execute(query, user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:365: in __execute_query
> self.wait_for_finished(handle)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:386: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E Query aborted:Parquet file s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd600000000_1609291350_data.0.parq has an invalid file length: 4
> Standard Error
> SET client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_ctas_exprs_7304e515` CASCADE;
> -- 2021-03-24 03:56:00,840 INFO MainThread: Started query 574a532f47ac7c80:c1c62ae000000000
> SET client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_ctas_exprs_7304e515`;
> -- 2021-03-24 03:56:03,120 INFO MainThread: Started query 424b970f206e271f:ade0b52400000000
> -- 2021-03-24 03:56:03,121 INFO MainThread: Created database "test_ctas_exprs_7304e515" for test ID "query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]"
> -- executing against localhost:21000
> SET decimal_v2=true;
> -- 2021-03-24 03:56:03,126 INFO MainThread: Started query 4545d8b9db5e9342:8b3ba57000000000
> -- executing against localhost:21000
> DROP TABLE IF EXISTS `test_ctas_exprs_7304e515`.`overflowed_decimal_tbl_1`;
> -- 2021-03-24 03:56:03,131 INFO MainThread: Started query 2c4bc9fc85e2b8e8:05e35eed00000000
> SET client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> -- executing against localhost:21000
> use functional_parquet;
> -- 2021-03-24 03:56:03,135 INFO MainThread: Started query 38403231c3885691:b0ba2cc400000000
> SET client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> CREATE TABLE `test_ctas_exprs_7304e515`.`overflowed_decimal_tbl_1` STORED AS PARQUET AS SELECT 1 as i, cast(a*a*a as decimal (28,10)) as d_28 FROM (SELECT cast(654964569154.9565 as decimal (28,7)) as a) q;
> -- 2021-03-24 03:56:03,399 INFO MainThread: Started query b74f0ce129189cf1:4c3c5bd600000000
> -- executing against localhost:21000
> SELECT count(*) FROM `test_ctas_exprs_7304e515`.`overflowed_decimal_tbl_1`;



