Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/10/23 21:31:00 UTC

[jira] [Assigned] (IMPALA-6374) test tpcds-q98.test has some incorrect data

     [ https://issues.apache.org/jira/browse/IMPALA-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-6374:
-------------------------------------

    Assignee: Tim Armstrong  (was: Tim Wood)

> test tpcds-q98.test has some incorrect data
> -------------------------------------------
>
>                 Key: IMPALA-6374
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6374
>             Project: IMPALA
>          Issue Type: Test
>          Components: Infrastructure
>    Affects Versions: Impala 2.9.0
>            Reporter: Stephen Carlin
>            Assignee: Tim Armstrong
>            Priority: Critical
>
> I happened to look through the unit tests, and it looks like tpcds-q98.test has some bad data in it, yet the test still passes verification.
> One example (among maybe 12 or so) is on line 469:
> line 468: 'AAAAAAAAEKGDAAAA','Houses should ','Books','mystery',1.77,3341.80,1.96
> line 469: 'AAAAAAAAFFDDAAAA',','Books','mystery',2.79,4237.23,2.49
> Note that the second field on line 468 looks normal, but on line 469 the second field is just a single quote.
> This appears to happen for every string in this test that ends with a comma. The correct result for this line should, I believe, be the following (note the comma after Poor):
> 'AAAAAAAAFFDDAAAA','French, civil hours must report essential values. Reasonable, complete judges vary clearly homes; often pleasant women would watch. Poor,','Books','mystery',2.79,4237.23,2.48
> My guess is that this is caused by the following loop in test_result_verifier.py:
>     for col_val in row_string.split(','):
>       # This is a bit tricky because we need to handle the case where a comma may be in
>       # the middle of a string. We detect this by finding a split that starts with an
>       # opening string character but that doesn't end in a string character. It is
>       # possible for the first character to be a single-quote, so handle that case
>       if (col_val.startswith("'") and not col_val.endswith("'")) or (col_val == "'"):
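The quoted snippet shows only the condition, not the rest of the loop. The sketch below is a hypothetical reconstruction, not Impala's code: the function names `naive_parse` and `quote_aware_parse` are illustrative, and it assumes that any piece matching the condition always starts a new string, even when one is already open. Under that assumption, the lone `'` that closes a value ending in a comma discards the text collected so far, and the correct row collapses to the same `'AAAAAAAAFFDDAAAA',','Books',...` shape seen on line 469. If expected and actual rows both go through the same parser, they would compare equal, which would explain why verification passes. The quote-aware variant splits only on commas outside single quotes, assuming string values never contain an embedded single quote.

```python
def naive_parse(row_string):
    """Hypothetical reconstruction of the comma-splitting loop; NOT the actual
    test_result_verifier.py code. Assumes a piece matching the quoted condition
    always starts a new string, even if one is already open."""
    fields = []
    current = None  # string literal being reassembled from comma-split pieces
    for col_val in row_string.split(','):
        # Condition copied from the snippet in the issue description.
        if (col_val.startswith("'") and not col_val.endswith("'")) or (col_val == "'"):
            # A lone "'" that actually CLOSES a value ending in ',' also lands
            # here, so the pieces collected so far are silently dropped.
            current = col_val
            continue
        current = col_val if current is None else ','.join([current, col_val])
        if current.startswith("'") and not current.endswith("'"):
            continue  # still inside a string literal; keep collecting
        fields.append(current)
        current = None
    return fields


def quote_aware_parse(row_string):
    """Splits on commas outside single-quoted strings. Assumes (hypothetically)
    that string values never contain an embedded single quote."""
    fields, current, in_string = [], [], False
    for ch in row_string:
        if ch == "'":
            in_string = not in_string  # toggle on every opening/closing quote
        if ch == ',' and not in_string:
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))
    return fields


if __name__ == "__main__":
    # The correct row from the issue description.
    row = ("'AAAAAAAAFFDDAAAA','French, civil hours must report essential values. "
           "Reasonable, complete judges vary clearly homes; often pleasant women would "
           "watch. Poor,','Books','mystery',2.79,4237.23,2.48")
    print(len(naive_parse(row)), naive_parse(row)[1])  # 6 ','Books'
    print(len(quote_aware_parse(row)))                  # 7
```

Applied to the correct row, the hypothetical naive parser yields 6 fields with `','Books'` as the second, while the quote-aware version yields the expected 7 fields with the comma-terminated string intact. Toggling on each quote character keeps the quotes in the output, so typed comparison of string and numeric columns could still be layered on top.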



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
