Posted to issues-all@impala.apache.org by "carolinchen (Jira)" <ji...@apache.org> on 2022/01/25 03:49:00 UTC
[jira] [Created] (IMPALA-11081) Partition key scan optimization may return incorrect results when a partition file has more than one block
carolinchen created IMPALA-11081:
------------------------------------
Summary: Partition key scan optimization may return incorrect results when a partition file has more than one block
Key: IMPALA-11081
URL: https://issues.apache.org/jira/browse/IMPALA-11081
Project: IMPALA
Issue Type: Bug
Reporter: carolinchen
Assignee: carolinchen
With IMPALA-8834 (https://issues.apache.org/jira/browse/IMPALA-8834), the planner generates only one scan range per file for partition key scans, but this can produce wrong results.
The problem occurs when a file has more than one block:
# The planner transforms only the first block into a TScanRange, and that range does not include the footer.
# The backend cannot find the split containing the footer, so it can neither parse the footer nor perform the scan.
As a result, the partition key scan returns incorrect results.
See this snippet in HdfsScanNode.java:
{code:java}
private Pair<Boolean, Long> transformBlocksToScanRanges(
    FeFsPartition partition, FileDescriptor fileDesc,
    boolean fsHasBlocks, long scanRangeBytesLimit,
    Analyzer analyzer) {
  // ... (code elided)
  for (int i = 0; i < fileDesc.getNumFileBlocks(); ++i) {
    // ... (scan range creation for block i elided)
    // Only generate one scan range for partition key scans.
    if (isPartitionKeyScan_) break;
  }
  // ... (remainder of the method, including the return, elided)
}{code}
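In the frontend (FE), when a partition key scan reads a file with more than one block, transformBlocksToScanRanges breaks out of the loop after the first block, so the generated scan ranges do not include the range that holds the footer.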
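To make the frontend behavior concrete, here is a minimal, self-contained Java model of per-block range generation. It is not Impala's code: the class `ScanRangeModel`, the `Range` record and `toScanRanges` are hypothetical names, and the block and file sizes are made-up numbers. It assumes, as described above, that the footer sits at the end of the file, so only a range that ends at the file length can contain it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of splitting a file into per-block scan
 * ranges. This is an illustration only, not Impala's HdfsScanNode logic.
 */
public class ScanRangeModel {
  /** A byte range [offset, offset + length) within one file. */
  record Range(long offset, long length) {
    long end() { return offset + length; }
  }

  /**
   * Creates one range per block. When partitionKeyScan is true it stops after
   * the first block, mirroring the `break` in the HdfsScanNode.java snippet.
   */
  static List<Range> toScanRanges(long[] blockLengths, boolean partitionKeyScan) {
    List<Range> ranges = new ArrayList<>();
    long offset = 0;
    for (long len : blockLengths) {
      ranges.add(new Range(offset, len));
      offset += len;
      if (partitionKeyScan) break;
    }
    return ranges;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    // Assumed example: a 300 MB file stored as 128 MB + 128 MB + 44 MB blocks.
    long[] blocks = {128 * mb, 128 * mb, 44 * mb};
    long fileLength = 300 * mb;

    List<Range> full = toScanRanges(blocks, false);
    List<Range> pk = toScanRanges(blocks, true);

    // Assumption: the footer is at the end of the file, so only a range
    // ending at fileLength can contain it.
    System.out.println(full.get(full.size() - 1).end() == fileLength); // true
    System.out.println(pk.get(pk.size() - 1).end() == fileLength);     // false
  }
}
```

In this model a single-block file is unaffected, because its first range also ends at the file length; only files with two or more blocks lose the footer range.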
See this snippet in hdfs-scanner.cc:
{code:cpp}
/// Issue just the footer range for each file. This function is only used
/// in parquet and orc scanners. We'll then parse the footer and pick out
/// the columns we want.
Status HdfsScanner::IssueFooterRanges(HdfsScanNodeBase* scan_node,
    const THdfsFileFormat::type& file_type,
    const std::vector<HdfsFileDesc*>& files) {
  // ... (code elided)
  for (int i = 0; i < files.size(); ++i) {
    // Try to find the split with the footer.
    ScanRange* footer_split = FindFooterSplit(files[i]);
    // ... (code elided)
  }
  // ... (remainder of the function, including the return, elided)
}{code}
In the backend (BE), when no split contains the footer, no footer range is issued for the file, so the file is never scanned.
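The backend behavior described above can be sketched as follows. This hypothetical Java model (the class `FooterLookupModel`, the `Split` record and `findFooterSplit` are illustrative names, not Impala APIs) assumes that a split holds the footer exactly when it ends at the file length; that is our reading of what `FindFooterSplit` checks, not a quote of its implementation.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the backend footer lookup.
 * Assumption: a split contains the footer iff it ends at the file length.
 */
public class FooterLookupModel {
  /** A split is a byte range [offset, offset + length) of one file. */
  record Split(long offset, long length) {}

  /** Returns the split that ends exactly at fileLength, or null if none does. */
  static Split findFooterSplit(List<Split> splits, long fileLength) {
    for (Split s : splits) {
      if (s.offset() + s.length() == fileLength) return s;
    }
    return null;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    // Assumed example: a 300 MB file stored as three blocks.
    long fileLength = 300 * mb;
    List<Split> allSplits = List.of(
        new Split(0, 128 * mb),
        new Split(128 * mb, 128 * mb),
        new Split(256 * mb, 44 * mb));
    // With the partition key scan optimization, only the first split is planned.
    List<Split> firstOnly = allSplits.subList(0, 1);

    System.out.println(findFooterSplit(allSplits, fileLength) != null); // true
    // No footer split: the footer cannot be parsed and the file is skipped.
    System.out.println(findFooterSplit(firstOnly, fileLength) != null); // false
  }
}
```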
--
This message was sent by Atlassian Jira
(v8.20.1#820001)