Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/10/02 17:34:00 UTC

[jira] [Commented] (IMPALA-7595) Check failed: IsValidTime(time_) at timestamp-value.h:322

    [ https://issues.apache.org/jira/browse/IMPALA-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635854#comment-16635854 ] 

ASF subversion and git services commented on IMPALA-7595:
---------------------------------------------------------

Commit 810841115a4f62dffd219cca8a9fbd34ea73e37c in impala's branch refs/heads/master from [~csringhofer]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=8108411 ]

IMPALA-7595: Check the validity of the time part of Parquet timestamps

Before this fix, Impala did not check whether a timestamp's time part
was outside the valid [0, 24 hour) range when reading Parquet files,
so these timestamps were copied into slots unchanged (via memcpy),
leading to results like:
1970-01-01 -00:00:00.000000001
1970-01-01 24:00:00

Different parts of Impala treat these timestamps differently:
- string conversion leads to invalid representation that cannot be
  converted back to timestamp
- timezone conversions handle the overflowing time part and give
  a valid timestamp result (at least since CCTZ, I did not check
  older versions of Impala)
- Parquet writing inserts these timestamps as they are, so the
  resulting Parquet file will also contain corrupt timestamps

The fix adds a check that converts these corrupt timestamps to NULL,
similarly to the handling of timestamps outside the [1400..10000)
range. A new error code is added for this case. If both the date
and the time part are corrupt, then the error about the corrupt time
is returned.
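
As a rough illustration of the check described above (not Impala's
actual C++ code), the sketch below decodes a 12-byte Parquet INT96
timestamp, assumed here to be 8 little-endian bytes of nanoseconds
within the day followed by a 4-byte Julian day number, and returns
None (standing in for NULL) when either part is out of range. The
function name, the error strings and the order of the checks are
hypothetical; the time part is checked first so that a value with
both parts corrupt reports the time error, as stated above.

```python
import struct
from datetime import date

NANOS_PER_DAY = 24 * 60 * 60 * 1_000_000_000
JULIAN_DAY_OF_EPOCH = 2440588  # Julian day number of 1970-01-01
EPOCH_ORDINAL = date(1970, 1, 1).toordinal()
# Supported date range [1400..10000), expressed as proleptic ordinals.
MIN_ORDINAL = date(1400, 1, 1).toordinal()
MAX_ORDINAL = date(9999, 12, 31).toordinal()

# Hypothetical error labels; Impala's real error codes are not shown here.
ERR_INVALID_TIME = "invalid time of day"
ERR_INVALID_DATE = "date out of range"


def decode_int96_timestamp(raw):
    """Return ((date, nanos_of_day), None) or (None, error) for 12 raw bytes."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)

    # Time part first: it must lie in [0, 24 hours).
    if not 0 <= nanos_of_day < NANOS_PER_DAY:
        return None, ERR_INVALID_TIME

    # Then the date part: it must lie in [1400-01-01, 9999-12-31].
    ordinal = julian_day - JULIAN_DAY_OF_EPOCH + EPOCH_ORDINAL
    if not MIN_ORDINAL <= ordinal <= MAX_ORDINAL:
        return None, ERR_INVALID_DATE

    return (date.fromordinal(ordinal), nanos_of_day), None


def encode_int96_timestamp(nanos_of_day, julian_day):
    """Helper for building test values, including corrupt ones."""
    return struct.pack("<qi", nanos_of_day, julian_day)
```

With this sketch, the two corrupt examples from the commit message
(a time part of -1 ns and a time part of exactly 24 hours on
1970-01-01) both decode to None with the time error instead of being
passed through unchanged.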

Testing:
- added a new scanner test that reads a corrupted Parquet file
  with edge values

Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058
Reviewed-on: http://gerrit.cloudera.org:8080/11521
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Check failed: IsValidTime(time_) at timestamp-value.h:322 
> ----------------------------------------------------------
>
>                 Key: IMPALA-7595
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7595
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Tim Armstrong
>            Assignee: Csaba Ringhofer
>            Priority: Blocker
>              Labels: broken-build, crash
>
> See https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3197/. hash is 23c7d7e57b7868eedbf5a9a4bc4aafd6066a04fb
> Some of the fuzz tests stand out amongst the tests that were running at the same time as the crash, particularly:
>  19:12:17 [gw4] PASSED query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes[exec_option: {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 'abort_on_error': False, 'mem_limit': '512m', 'num_nodes': 0} | table_format: parquet/none] 


