Posted to issues-all@impala.apache.org by "Joe McDonnell (JIRA)" <ji...@apache.org> on 2019/06/14 17:39:00 UTC

[jira] [Created] (IMPALA-8666) HdfsParquetScanner::ProcessFooter() should do validation when it reads a bigger footer

Joe McDonnell created IMPALA-8666:
-------------------------------------

             Summary: HdfsParquetScanner::ProcessFooter() should do validation when it reads a bigger footer
                 Key: IMPALA-8666
                 URL: https://issues.apache.org/jira/browse/IMPALA-8666
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 3.3.0
            Reporter: Joe McDonnell


In IMPALA-8561, a user hit an error deserializing the footer when HdfsParquetScanner::ProcessFooter() issued an IO for a Parquet footer larger than the default 100KB footer read. IMPALA-8561 fixed the underlying issue, which caused DiskIoMgr to return stale data in this case, but HdfsParquetScanner::ProcessFooter() still needs validation in the codepath that reads the larger footer. Specifically, that codepath does not check the magic value that should be at the end of the file, a check that currently looks like this ([https://github.com/apache/impala/blob/11a2e86c28c7c7dcf9f394a82fc4045760fff97b/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1392-L1399]):
{code:java}
// Validate magic file bytes are correct.
uint8_t* magic_number_ptr = buffer + scan_range_len - sizeof(PARQUET_VERSION_NUMBER);
if (memcmp(magic_number_ptr, PARQUET_VERSION_NUMBER,
      sizeof(PARQUET_VERSION_NUMBER)) != 0) {
  return Status(TErrorCode::PARQUET_BAD_VERSION_NUMBER, filename(),
      string(reinterpret_cast<char*>(magic_number_ptr), sizeof(PARQUET_VERSION_NUMBER)),
      scan_node_->hdfs_table()->fully_qualified_name());
}
{code}
ProcessFooter() should perform this check on the newly read, larger footer as well. It should also verify that the footer size found in the larger read matches the size it saw earlier in the initial 100KB IO.
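
A minimal standalone sketch of the two checks proposed above, not the actual Impala change. It assumes the larger read extends to the end of the file, so the buffer ends with the 4-byte little-endian metadata length followed by the 4-byte magic, which may not match how ProcessFooter() actually sizes its second scan range. The names ValidateLargeFooter, FooterCheck and kParquetMagic are hypothetical stand-ins; the real code would return a Status and use PARQUET_VERSION_NUMBER.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for Impala's PARQUET_VERSION_NUMBER ("PAR1").
static const uint8_t kParquetMagic[4] = {'P', 'A', 'R', '1'};

// Hypothetical result type; the real code would return an impala::Status.
enum class FooterCheck { kOk, kTooShort, kBadMagic, kSizeMismatch };

// Validates the buffer returned by the second, larger footer read.
//   buffer, len: bytes of the larger read, assumed to end at end-of-file.
//   expected_metadata_size: metadata length decoded from the initial 100KB read.
inline FooterCheck ValidateLargeFooter(
    const uint8_t* buffer, size_t len, uint32_t expected_metadata_size) {
  // The file ends with: <metadata> <4-byte LE metadata length> <4-byte magic>.
  const size_t trailer_len = sizeof(uint32_t) + sizeof(kParquetMagic);
  if (buffer == nullptr || len < trailer_len) return FooterCheck::kTooShort;

  // Check 1: the magic bytes at the end of the new buffer must be correct.
  const uint8_t* magic_ptr = buffer + len - sizeof(kParquetMagic);
  if (memcmp(magic_ptr, kParquetMagic, sizeof(kParquetMagic)) != 0) {
    return FooterCheck::kBadMagic;
  }

  // Check 2: the metadata length in the new buffer must match the length
  // seen in the initial read. Decode byte by byte to stay endian-neutral.
  const uint8_t* size_ptr = magic_ptr - sizeof(uint32_t);
  const uint32_t metadata_size = static_cast<uint32_t>(size_ptr[0]) |
      (static_cast<uint32_t>(size_ptr[1]) << 8) |
      (static_cast<uint32_t>(size_ptr[2]) << 16) |
      (static_cast<uint32_t>(size_ptr[3]) << 24);
  if (metadata_size != expected_metadata_size) return FooterCheck::kSizeMismatch;

  // The buffer must actually contain all of the metadata bytes.
  if (len - trailer_len < metadata_size) return FooterCheck::kTooShort;
  return FooterCheck::kOk;
}
```

With both checks in place, stale or mismatched data from the IO layer would surface as an explicit footer validation error rather than as a Thrift deserialization failure.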

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
