Posted to issues@impala.apache.org by "Lars Volker (JIRA)" <ji...@apache.org> on 2017/09/13 14:33:00 UTC
[jira] [Resolved] (IMPALA-5890) Segmentation fault in ScannerContext::Stream::GetBytesInternal(long, unsigned char**, bool, long*)

     [ https://issues.apache.org/jira/browse/IMPALA-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Volker resolved IMPALA-5890.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0

IMPALA-5890: Abort queries if scanner hits IO errors

Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer()
could leave the stream in an inconsistent state:

- The DiskIoMgr unexpectedly hits EOF, cancels the scan range, and
enqueues a buffer with eosr set.
- The ScannerContext::Stream tries to read more bytes, but since it has
hit eosr, it tries to read beyond the end of the scan range using
DiskIoMgr::Read().
- The previous read error resulted in a new file handle being opened.
The now truncated, smaller file causes the seek to fail.
- Then, during error handling, the BaseSequenceScanner calls SkipToSync()
and trips over the NULL pointer in the IO buffer.

In my reproduction this only happens with the file handle cache enabled,
which causes Impala to see two handles reporting different file sizes:
the one from the cache when the query starts, and the one after
reopening the file.

To fix this, we change the I/O manager to always return DISK_IO_ERROR
for errors and we abort a query if we receive such an error in the
scanner.
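
The policy above can be sketched roughly as follows. This is not Impala's
actual code: Status, ErrorCode, ScanAction and HandleReadStatus are
hypothetical stand-ins, and the non-I/O branch (skip to the next sync marker
unless abort_on_error is set) is an assumption based on the SkipToSync()
recovery mentioned earlier.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-ins; these are not Impala's real types.
enum class ErrorCode { OK, DISK_IO_ERROR, PARSE_ERROR };

struct Status {
  ErrorCode code = ErrorCode::OK;
  std::string msg;

  Status() = default;
  Status(ErrorCode c, std::string m) : code(c), msg(std::move(m)) {}
  bool ok() const { return code == ErrorCode::OK; }
};

enum class ScanAction { CONTINUE, SKIP_TO_NEXT_SYNC, ABORT_QUERY };

// Decides how a scanner reacts to the status of a read.
ScanAction HandleReadStatus(const Status& status, bool abort_on_error) {
  if (status.ok()) return ScanAction::CONTINUE;
  // I/O errors are treated as unrecoverable: the stream state can no longer
  // be trusted, so the query is aborted regardless of abort_on_error.
  if (status.code == ErrorCode::DISK_IO_ERROR) return ScanAction::ABORT_QUERY;
  // Assumption: other errors may still be recovered from by resyncing.
  return abort_on_error ? ScanAction::ABORT_QUERY
                        : ScanAction::SKIP_TO_NEXT_SYNC;
}
```

The point of the sketch is that the I/O error check comes before any
recovery logic, so a scanner never attempts to resync on a stream whose
buffers may be invalid.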

This change also fixes GetBytesInternal() to maintain the invariant that
the output buffer points to the boundary buffer whenever the latter
contains some data.
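
As an illustration of that invariant, here is a heavily simplified sketch.
SketchStream and PeekBytes are hypothetical and do not mirror the real
ScannerContext::Stream; the only point is that the output pointer is set on
every exit path, including the one where the stream runs out of data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical, simplified stream; not Impala's ScannerContext::Stream.
class SketchStream {
 public:
  explicit SketchStream(std::deque<std::vector<uint8_t>> buffers)
      : io_buffers_(std::move(buffers)) {}

  // Tries to expose `requested` contiguous bytes. Returns false if the
  // stream ends first; `*out` and `*out_len` are set in either case.
  bool PeekBytes(size_t requested, const uint8_t** out, size_t* out_len) {
    if (boundary_.empty() && !io_buffers_.empty() &&
        io_buffers_.front().size() >= requested) {
      // Fast path: serve directly from the current I/O buffer.
      *out = io_buffers_.front().data();
      *out_len = requested;
      return true;
    }
    // Slow path: stitch I/O buffers together in the boundary buffer.
    while (boundary_.size() < requested && !io_buffers_.empty()) {
      const std::vector<uint8_t>& front = io_buffers_.front();
      boundary_.insert(boundary_.end(), front.begin(), front.end());
      io_buffers_.pop_front();
    }
    // Set the output even when too few bytes were found, so callers never
    // see a stale or null pointer while the boundary buffer holds data.
    *out = boundary_.empty() ? nullptr : boundary_.data();
    *out_len = std::min(requested, boundary_.size());
    assert(boundary_.empty() || *out == boundary_.data());
    return boundary_.size() >= requested;
  }

  bool boundary_in_use() const { return !boundary_.empty(); }

 private:
  std::deque<std::vector<uint8_t>> io_buffers_;
  std::vector<uint8_t> boundary_;
};
```

With two I/O buffers holding {1, 2} and {3}, a request for 5 bytes fails,
but the output still points at the 3 bytes gathered in the boundary buffer.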

I tested this by running the repro from the JIRA and impalad did not
crash but aborted the queries. I also ran the repro with
abort_on_error=1, and with the file handle cache disabled.

Text files are not affected by this problem, since the
text scanner doesn't try to recover from errors during ProcessRange()
but wraps it in RETURN_IF_ERROR instead. With this change queries abort
with the same error.
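
The propagate-instead-of-recover pattern can be sketched like this. The
macro below is a simplified stand-in written for illustration, not Impala's
RETURN_IF_ERROR definition, and SketchStatus and ProcessRangeSketch are
hypothetical names.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical status type for illustration only.
struct SketchStatus {
  bool is_ok;
  std::string msg;

  static SketchStatus OK() { return {true, ""}; }
  static SketchStatus Error(std::string m) { return {false, std::move(m)}; }
  bool ok() const { return is_ok; }
};

// Simplified early-return macro: evaluate once, return on failure.
#define SKETCH_RETURN_IF_ERROR(expr)     \
  do {                                   \
    const SketchStatus _status = (expr); \
    if (!_status.ok()) return _status;   \
  } while (false)

// Each element of `read_ok` models whether one read succeeded. The first
// failure is returned to the caller; no recovery is attempted.
SketchStatus ProcessRangeSketch(const std::vector<bool>& read_ok, int* rows) {
  for (bool ok : read_ok) {
    SKETCH_RETURN_IF_ERROR(ok ? SketchStatus::OK()
                              : SketchStatus::Error("DISK_IO_ERROR"));
    ++*rows;
  }
  return SketchStatus::OK();
}
```

Because the error leaves the function immediately, no later code touches the
stream after a failed read.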

Parquet files are also not affected since they have the metadata at the
end. Truncated files immediately fail with this error:
WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq'
has an invalid version number: <UTF8 Garbage>
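
For context, a Parquet file ends with the 4-byte magic "PAR1", so a file
truncated mid-data usually ends in arbitrary bytes. The sketch below shows
that kind of trailing-magic check; EndsWithParquetMagic is a hypothetical
helper, and treating this check as the source of the warning above is an
assumption.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Returns true if the buffer ends with the 4-byte Parquet magic "PAR1".
// Hypothetical helper for illustration; not taken from Impala.
bool EndsWithParquetMagic(const std::string& file_bytes) {
  static const char kMagic[] = {'P', 'A', 'R', '1'};
  if (file_bytes.size() < sizeof(kMagic)) return false;
  const char* tail = file_bytes.data() + file_bytes.size() - sizeof(kMagic);
  return std::memcmp(tail, kMagic, sizeof(kMagic)) == 0;
}
```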

Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Reviewed-on: http://gerrit.cloudera.org:8080/8011
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins

> Segmentation fault in ScannerContext::Stream::GetBytesInternal(long, unsigned char**, bool, long*)
> --------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-5890
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5890
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Lars Volker
>            Assignee: Lars Volker
>            Priority: Blocker
>              Labels: crash, regression
>             Fix For: Impala 2.11.0
>
>
> While investigating IMPALA-5889, I was able to reproducibly crash Impala on latest master. The crash doesn't repro on 2.9 so I suspect it's a regression. I will add reproduction steps shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)