Posted to issues-all@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2019/04/05 22:30:00 UTC

[jira] [Comment Edited] (IMPALA-8394) Inconsistent data read from S3a connector

    [ https://issues.apache.org/jira/browse/IMPALA-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811317#comment-16811317 ] 

Michael Ho edited comment on IMPALA-8394 at 4/5/19 10:29 PM:
-------------------------------------------------------------

Thanks [~stakiar]. I just noticed that we could be issuing multiple calls to {{ReadFromPos}} in some cases, so yes, most likely we need to force a seek to move the offset to the right position, since we are using an exclusive file handle for S3. Let me give that a try.

One piece I had missed is that we seek only when using the file handle cache in {{ReadFromPosInternal}}. I misread that part and assumed we would always seek before reading, which apparently is not the case:

{noformat}
    if (is_borrowed_fh) {
      if (hdfsSeek(hdfs_fs_, hdfs_file, position_in_file) != 0) {
        return Status(TErrorCode::DISK_IO_ERROR, GetBackendString(),
            Substitute("Error seeking to $0 in file: $1: $2",
                position_in_file, *scan_range_->file_string(), GetHdfsErrorMsg("")));
      }
    }
{noformat}
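To illustrate why the missing seek matters (a minimal sketch with invented names, not Impala or libhdfs code): a stateful handle hands back bytes from wherever its stream position currently is, so a "read from position X" that skips the seek silently returns the wrong bytes whenever the handle's offset has drifted, while a true pread ignores the stream position entirely:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a file handle; not the real hdfsFile API.
struct FakeHandle {
  std::string data;
  size_t pos = 0;  // stream position, like an HDFS file handle's offset

  void Seek(size_t p) { pos = p; }

  // Stateful read: starts at the current stream position and advances it.
  std::string Read(size_t n) {
    std::string out = data.substr(pos, n);
    pos += out.size();
    return out;
  }

  // Positional read (pread-style): reads at 'off' without touching 'pos'.
  std::string Pread(size_t off, size_t n) const { return data.substr(off, n); }
};
```

With {{--use_hdfs_pread=true}} every read carries its own offset, which matches the observation below that the inconsistency disappeared in that configuration.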


> Inconsistent data read from S3a connector
> -----------------------------------------
>
>                 Key: IMPALA-8394
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8394
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0, Impala 3.3.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Critical
>
> While testing a build with the remote data cache (https://github.com/michaelhkw/impala/commits/remote-cache-debug) against S3, we noticed that data read back from S3 through the HDFS S3A connector was inconsistent. This was confirmed by computing the checksum of the buffer right after a successful read. The following are the activities of two threads in the log.
> Both threads 18922 and 18924 tried to look up s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq at offset 89814317. Both hit a cache miss and read the content from S3. Thread 18924 won the race to insert into the cache. When thread 18922 came around later and tried to insert the same entry, it noticed that the checksum of the content inserted by thread 18924 differed from the checksum of its own content.
> Please note that the checksum of the bytes read from S3 was computed and logged in {{hdfs-file-reader.cc}} before the insertion into the cache (which computed the checksum again), so the inconsistency was already observed in {{hdfs-file-reader.cc}}: thread 18924 computed {{8299739883147237483}} while thread 18922 computed {{9118051972380785265}}.
> We re-ran the same experiment with {{--use_hdfs_pread=true}} and the problem went away. While I don't rule out bugs in the cache prototype at this point, the debugging so far suggests the content read back from S3 via the HDFS S3A connector is inconsistent when pread is disabled. It could be that we inadvertently shared the file handle somehow, or that there are race conditions in the S3A connector which got exposed by the timing change with the cache enabled.
> FWIW, we also ran the same experiment in an HDFS remote-read configuration and it was not reproducible there either.
> Thread 18924
> {noformat}
> I0405 12:02:15.316999 18924 data-cache.cc:344] ed4c2ab7791b5883:9f1507450000005f] Looking up s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 bytes_read: 0 buffer: 4d600000
> I0405 12:02:15.593314 18924 hdfs-file-reader.cc:185] ed4c2ab7791b5883:9f1507450000005f] Caching file s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_read 8332914 checksum 8299739883147237483
> I0405 12:02:15.596087 18924 data-cache.cc:233] ed4c2ab7791b5883:9f1507450000005f] Storing file /data0/1/impala/datacache/cf4b57f89e5985f2:487084b5c69b208b offset 1669431296 len 8332914 checksum 8299739883147237483
> I0405 12:02:15.602699 18924 data-cache.cc:361] ed4c2ab7791b5883:9f1507450000005f] Storing s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 buffer: 4d600000 stored: true
> {noformat}
> Thread 18922:
> {noformat}
> I0405 12:02:15.011065 18922 data-cache.cc:344] ed4c2ab7791b5883:9f150745000000da] Looking up s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 bytes_read: 0 buffer: 59200000
> I0405 12:02:16.281126 18922 hdfs-file-reader.cc:185] ed4c2ab7791b5883:9f150745000000da] Caching file s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_read 8332914 checksum 9118051972380785265
> I0405 12:02:16.282948 18922 data-cache.cc:166] ed4c2ab7791b5883:9f150745000000da] Storing duplicated file /data0/1/impala/datacache/cf4b57f89e5985f2:487084b5c69b208b offset 1669431296 len 8332914 checksum 8299739883147237483 buffer checksum: 9118051972380785265
> E0405 12:02:16.282974 18922 data-cache.cc:171] ed4c2ab7791b5883:9f150745000000da] Write checksum mismatch for file /data0/1/impala/datacache/cf4b57f89e5985f2:487084b5c69b208b offset 1669431296 entry len: 8332914 store_len: 8332914 Expected 8299739883147237483, Got 9118051972380785265.
> I0405 12:02:16.283023 18922 data-cache.cc:361] ed4c2ab7791b5883:9f150745000000da] Storing s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 buffer: 59200000 stored: false
> {noformat}
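The interleaving above can be sketched as a first-writer-wins insert with checksum verification (a hypothetical sketch with invented names, not Impala's actual data-cache API): the loser of the insert race has its content compared against what the winner stored, which is exactly where thread 18922 tripped the mismatch:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical sketch of the duplicate-insert check; not Impala's DataCache.
class ChecksumCache {
 public:
  // Returns true if the entry was stored or already matches;
  // false on a write-checksum mismatch like the one in the log above.
  bool Store(const std::string& key, uint64_t checksum) {
    std::lock_guard<std::mutex> l(mu_);
    auto it = entries_.find(key);
    if (it == entries_.end()) {
      entries_[key] = checksum;  // first writer wins the race
      return true;
    }
    return it->second == checksum;  // duplicate insert: verify the content
  }

 private:
  std::mutex mu_;
  std::map<std::string, uint64_t> entries_;  // key -> checksum of cached bytes
};
```

Note that the mismatch only tells us the two reads disagreed; it does not say which read, if either, returned the correct bytes.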
> The problem is quite reproducible with TPC-DS Q28 at scale factor 3000 in Parquet format.
> {noformat}
> select  *
> from (select avg(ss_list_price) B1_LP
>             ,count(ss_list_price) B1_CNT
>             ,count(distinct ss_list_price) B1_CNTD
>       from store_sales
>       where ss_quantity between 0 and 5
>         and (ss_list_price between 185 and 185+10 
>              or ss_coupon_amt between 10548 and 10548+1000
>              or ss_wholesale_cost between 6 and 6+20)) B1,
>      (select avg(ss_list_price) B2_LP
>             ,count(ss_list_price) B2_CNT
>             ,count(distinct ss_list_price) B2_CNTD
>       from store_sales
>       where ss_quantity between 6 and 10
>         and (ss_list_price between 28 and 28+10
>           or ss_coupon_amt between 6100 and 6100+1000
>           or ss_wholesale_cost between 27 and 27+20)) B2,
>      (select avg(ss_list_price) B3_LP
>             ,count(ss_list_price) B3_CNT
>             ,count(distinct ss_list_price) B3_CNTD
>       from store_sales
>       where ss_quantity between 11 and 15
>         and (ss_list_price between 173 and 173+10
>           or ss_coupon_amt between 6371 and 6371+1000
>           or ss_wholesale_cost between 32 and 32+20)) B3,
>      (select avg(ss_list_price) B4_LP
>             ,count(ss_list_price) B4_CNT
>             ,count(distinct ss_list_price) B4_CNTD
>       from store_sales
>       where ss_quantity between 16 and 20
>         and (ss_list_price between 101 and 101+10
>           or ss_coupon_amt between 2938 and 2938+1000
>           or ss_wholesale_cost between 21 and 21+20)) B4,
>      (select avg(ss_list_price) B5_LP
>             ,count(ss_list_price) B5_CNT
>             ,count(distinct ss_list_price) B5_CNTD
>       from store_sales
>       where ss_quantity between 21 and 25
>         and (ss_list_price between 8 and 8+10
>           or ss_coupon_amt between 5093 and 5093+1000
>           or ss_wholesale_cost between 50 and 50+20)) B5,
>      (select avg(ss_list_price) B6_LP
>             ,count(ss_list_price) B6_CNT
>             ,count(distinct ss_list_price) B6_CNTD
>       from store_sales
>       where ss_quantity between 26 and 30
>         and (ss_list_price between 110 and 110+10
>           or ss_coupon_amt between 2276 and 2276+1000
>           or ss_wholesale_cost between 36 and 36+20)) B6
> limit 100;
> {noformat}
> cc'ing [~stakiar], [~joemcdonnell], [~lv], [~tlipcon], [~drorke]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
