Posted to issues@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2020/05/18 22:07:00 UTC

[jira] [Created] (IMPALA-9759) Revisit integration of snapshot dataload with s3guard

Joe McDonnell created IMPALA-9759:
-------------------------------------

             Summary: Revisit integration of snapshot dataload with s3guard
                 Key: IMPALA-9759
                 URL: https://issues.apache.org/jira/browse/IMPALA-9759
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 4.0
            Reporter: Joe McDonnell


Sometimes, the s3 jobs (which use s3guard for consistency) see test failures due to missing files from the dataload snapshot (see bottom). This may be related to the interaction of snapshot loading with s3guard. We should nail down exactly the right procedure for loading the snapshot. Currently, we do the following:
1. Remove any data from the s3 bucket via the s3 command line
2. Create the s3guard dynamodb table (or reuse existing one if a previous job failed without deleting the old dynamodb table)
3. Prune any existing entries from that table
4. Load the snapshot to the s3 bucket

In theory, this leaves s3guard with an empty dynamodb table and an s3 bucket with data. As tests progress and try to access the s3 bucket, s3guard would see that there is no entry in the dynamodb table and then check the underlying s3 bucket.
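
As a rough illustration of that expectation, here is a toy Python model of the lookup order described above. It is only a sketch of the stated theory, not s3guard's actual implementation: the class name, the file name, and the size are all hypothetical.

```python
class ToyGuardedBucket:
    """Toy stand-in for an s3 bucket fronted by an s3guard-style metadata table."""

    def __init__(self, objects):
        self.objects = dict(objects)  # path -> size; stands in for the s3 bucket
        self.table = {}               # path -> size; stands in for the dynamodb table

    def lookup(self, path):
        """Return (size, source), consulting the table first and the bucket second."""
        if path in self.table:
            return self.table[path], "table"
        if path in self.objects:
            return self.objects[path], "bucket"
        return None, "missing"


# Hypothetical file name and size, chosen only for illustration.
path = "test-warehouse/alltypes/year=2009/month=5/data.txt"
store = ToyGuardedBucket({path: 1024})

# After steps 1-4 the table is empty, so every lookup should fall through to the bucket.
result = store.lookup(path)
print(result)
```

If the real system behaved like this model, a file present in the bucket could never be reported as missing, which is why the observed failures suggest the actual procedure or lookup path differs from the theory.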

We need to revisit these steps and verify that everything is being done correctly.

{noformat}
metadata/test_metadata_query_statements.py:70: in test_show_stats
    self.run_test_case('QueryTest/show-stats', vector, "functional")
common/impala_test_suite.py:687: in run_test_case
    self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:523: in __verify_results_and_errors
    replace_filenames_with_placeholder)
common/test_result_verifier.py:456: in verify_raw_results
    VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:278: in verify_query_result_is_equal
    assert expected_results == actual_results
E assert Comparing QueryTestResults (expected vs actual):
E '2009','1',310,1,'19.95KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=1' == '2009','1',310,1,'19.95KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=1'
E '2009','10',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=10' == '2009','10',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=10'
E '2009','11',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=11' == '2009','11',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=11'
E '2009','12',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=12' == '2009','12',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=12'
E '2009','2',280,1,'18.12KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=2' == '2009','2',280,1,'18.12KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=2'
E '2009','3',310,1,'20.06KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=3' == '2009','3',310,1,'20.06KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=3'
E '2009','4',300,1,'19.61KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=4' == '2009','4',300,1,'19.61KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=4'
E '2009','5',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=5' != '2009','5',0,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=5'
E '2009','6',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=6' == '2009','6',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=6'
E '2009','7',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=7' == '2009','7',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=7'
E '2009','8',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=8' == '2009','8',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=8'
E '2009','9',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=9' == '2009','9',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=9'
E '2010','1',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=1' == '2010','1',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=1'
E '2010','10',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=10' == '2010','10',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=10'
E '2010','11',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=11' == '2010','11',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=11'
E '2010','12',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=12' == '2010','12',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=12'
E '2010','2',280,1,'18.39KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=2' == '2010','2',280,1,'18.39KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=2'
E '2010','3',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=3' == '2010','3',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=3'
E '2010','4',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=4' == '2010','4',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=4'
E '2010','5',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=5' == '2010','5',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=5'
E '2010','6',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=6' == '2010','6',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=6'
E '2010','7',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=7' == '2010','7',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=7'
E '2010','8',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=8' == '2010','8',310,1,'20.36KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=8'
E '2010','9',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=9' == '2010','9',300,1,'19.71KB','NOT CACHED','NOT CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=9'
E 'Total','',7300,24,'478.45KB','0B','','','','' != 'Total','',6990,24,'478.45KB','0B','','','',''
{noformat}

This also shows up in cardinality calculations:

{noformat}
metadata/test_explain.py:113: in test_explain_validate_cardinality_estimates
    check_cardinality(result.data, '7.30K')
metadata/test_explain.py:98: in check_cardinality
    query_result, expected_cardinality=expected_cardinality)
metadata/test_explain.py:86: in check_row_size_and_cardinality
    assert m.groups()[1] == expected_cardinality
E assert '6.99K' == '7.30K'
E - 6.99K
E + 7.30K
{noformat}
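
The two failures look consistent with a single missing partition file: the 310 rows that year=2009/month=5 fails to report account exactly for the gap between 7300 and 6990, and for the 7.30K versus 6.99K cardinality. The short Python check below redoes that arithmetic using the row counts from the expected side of the show-stats output. The format_cardinality helper is hypothetical and only mimics how the explain output above renders the numbers.

```python
# Expected rows per month for one year of alltypes, taken from the expected
# side of the test_show_stats output (the counts are the same for 2009 and 2010).
ROWS_PER_MONTH = {
    1: 310, 2: 280, 3: 310, 4: 300, 5: 310, 6: 300,
    7: 310, 8: 310, 9: 300, 10: 310, 11: 300, 12: 310,
}


def format_cardinality(rows):
    """Hypothetical helper: render a row count in thousands with two decimals."""
    return f"{rows / 1000:.2f}K"


expected_total = 2 * sum(ROWS_PER_MONTH.values())
# year=2009/month=5 reported 0 rows instead of 310 in the failing run.
actual_total = expected_total - ROWS_PER_MONTH[5]

print(expected_total, actual_total)
print(format_cardinality(expected_total), format_cardinality(actual_total))
```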



--
This message was sent by Atlassian Jira
(v8.3.4#803005)