Posted to issues-all@impala.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2020/07/17 09:47:00 UTC

[jira] [Commented] (IMPALA-9759) Revisit integration of snapshot dataload with s3guard

    [ https://issues.apache.org/jira/browse/IMPALA-9759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159820#comment-17159820 ] 

Steve Loughran commented on IMPALA-9759:
----------------------------------------

+1 for unique keys. Otherwise, if you configure s3guard to use etag version tracking, it will detect the mismatch when opening a file and retry until the version it knows about becomes available.
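
To make the retry behaviour concrete, here is a minimal Python sketch of the idea, not the S3A implementation. It assumes the expected etag was recorded at write time and that `fetch` is a hypothetical callable returning an `(etag, data)` pair for one object; the names `open_with_expected_etag` and `StaleVersionError` are invented for illustration. In Hadoop S3A the real behaviour is, to my knowledge, governed by the `fs.s3a.change.detection.*` options, so check the documentation for your Hadoop version.

```python
import time


class StaleVersionError(Exception):
    """Raised when the expected etag never becomes visible (hypothetical)."""


def open_with_expected_etag(fetch, expected_etag, attempts=5, delay_s=0.0):
    """Call fetch() until it returns the object version we expect.

    fetch is a hypothetical callable returning (etag, data) for one object.
    """
    seen = None
    for attempt in range(attempts):
        seen, data = fetch()
        if seen == expected_etag:
            return data
        # Mismatch: the store served a different version than the recorded one.
        if attempt < attempts - 1:
            time.sleep(delay_s * (2 ** attempt))  # simple exponential backoff
    raise StaleVersionError(
        "expected etag %r, last saw %r" % (expected_etag, seen))


# Simulated eventually consistent store: two stale reads, then the new object.
responses = iter([("etag-old", b"old"), ("etag-old", b"old"), ("etag-new", b"new")])
result = open_with_expected_etag(lambda: next(responses), "etag-new")
```

With unique keys this situation does not arise, because a new snapshot never overwrites an object under an existing key.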

> Revisit integration of snapshot dataload with s3guard
> -----------------------------------------------------
>
>                 Key: IMPALA-9759
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9759
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Sahil Takiar
>            Priority: Critical
>              Labels: broken-build, flaky
>
> Sometimes, the s3 jobs (which use s3guard for consistency) see test failures due to missing files from the dataload snapshot (see bottom). This may be related to the interaction of snapshot loading with s3guard. We should nail down exactly the right procedure for loading the snapshot. Currently, we do the following:
> 1. Remove any data from the s3 bucket via the s3 command line
> 2. Create the s3guard dynamodb table (or reuse the existing one if a previous job failed without deleting the old dynamodb table)
> 3. Prune any existing entries from that table
> 4. Load the snapshot to the s3 bucket
> In theory, this leaves s3guard with an empty dynamodb table and an s3 bucket with data. As tests progress and try to access the s3 bucket, s3guard would see that there is no entry in the dynamodb table and then check the underlying s3 bucket.
> We need to revisit these steps and verify that everything is being done correctly.
> {noformat}
> metadata/test_metadata_query_statements.py:70: in test_show_stats
>     self.run_test_case('QueryTest/show-stats', vector, "functional")
> common/impala_test_suite.py:687: in run_test_case
>     self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:523: in __verify_results_and_errors
>     replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
>     VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
>     assert expected_results == actual_results
> E assert Comparing QueryTestResults (expected vs actual):
> E '2009','1',310,1,'19.95KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=1' == '2009','1',310,1,'19.95KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=1'
> E '2009','10',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=10' == '2009','10',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=10'
> E '2009','11',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=11' == '2009','11',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=11'
> E '2009','12',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=12' == '2009','12',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=12'
> E '2009','2',280,1,'18.12KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=2' == '2009','2',280,1,'18.12KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=2'
> E '2009','3',310,1,'20.06KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=3' == '2009','3',310,1,'20.06KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=3'
> E '2009','4',300,1,'19.61KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=4' == '2009','4',300,1,'19.61KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=4'
> E '2009','5',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=5' != '2009','5',0,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=5'
> E '2009','6',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=6' == '2009','6',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=6'
> E '2009','7',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=7' == '2009','7',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=7'
> E '2009','8',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=8' == '2009','8',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=8'
> E '2009','9',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=9' == '2009','9',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=9'
> E '2010','1',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=1' == '2010','1',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=1'
> E '2010','10',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=10' == '2010','10',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=10'
> E '2010','11',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=11' == '2010','11',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=11'
> E '2010','12',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=12' == '2010','12',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=12'
> E '2010','2',280,1,'18.39KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=2' == '2010','2',280,1,'18.39KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=2'
> E '2010','3',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=3' == '2010','3',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=3'
> E '2010','4',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=4' == '2010','4',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=4'
> E '2010','5',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=5' == '2010','5',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=5'
> E '2010','6',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=6' == '2010','6',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=6'
> E '2010','7',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=7' == '2010','7',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=7'
> E '2010','8',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=8' == '2010','8',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=8'
> E '2010','9',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=9' == '2010','9',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=9'
> E 'Total','',7300,24,'478.45KB','0B','','','','' != 'Total','',6990,24,'478.45KB','0B','','','',''
> {noformat}
> This also shows up in cardinality calculations:
> {noformat}
> metadata/test_explain.py:113: in test_explain_validate_cardinality_estimates
>     check_cardinality(result.data, '7.30K')
> metadata/test_explain.py:98: in check_cardinality
>     query_result, expected_cardinality=expected_cardinality)
> metadata/test_explain.py:86: in check_row_size_and_cardinality
>     assert m.groups()[1] == expected_cardinality
> E assert '6.99K' == '7.30K'
> E - 6.99K
> E + 7.30K
> {noformat}
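
The two failures quoted above appear to be the same discrepancy. The only mismatched row is year=2009/month=5, whose third column (the row count) is 0 instead of 310, and that difference alone accounts for both the totals and the cardinality estimate. A small Python check, with all values copied from the quoted output and `short_count` a hypothetical helper written only to mirror the '7.30K' formatting:

```python
# Values copied from the quoted test output.
expected_total = 7300
actual_total = 6990
expected_rows_month5 = 310  # year=2009/month=5, expected row count
actual_rows_month5 = 0      # year=2009/month=5, actual row count

# The gap in the totals equals the rows missing from that one partition.
missing_rows = expected_rows_month5 - actual_rows_month5
assert expected_total - actual_total == missing_rows


def short_count(n):
    """Hypothetical helper: format a count such as 7300 as '7.30K'."""
    return "%.2fK" % (n / 1000.0)


print(short_count(expected_total), short_count(actual_total))
```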


