Posted to issues-all@impala.apache.org by "carolinchen (Jira)" <ji...@apache.org> on 2022/01/25 03:51:00 UTC

[jira] [Created] (IMPALA-11085) Partition key scan optimization may return incorrect results when partition file have more than one block

carolinchen created IMPALA-11085:
------------------------------------

             Summary: Partition key scan optimization may return incorrect results when partition file have more than one block
                 Key: IMPALA-11085
                 URL: https://issues.apache.org/jira/browse/IMPALA-11085
             Project: IMPALA
          Issue Type: Bug
            Reporter: carolinchen
            Assignee: carolinchen


The optimization added in https://issues.apache.org/jira/browse/IMPALA-8834 generates only one scan range for a partition key scan, which can produce wrong results.

The problem occurs when a partition file has more than one block:
 # The planner transforms only the first block into a TScanRange, and that range does not include the footer.
 # The backend cannot find the split that contains the footer, so it can neither parse the footer nor perform the scan.

As a result, the partition key scan returns incorrect results.

 

See this snippet in HdfsScanNode.java:

 
{code:java}
private Pair<Boolean, Long> transformBlocksToScanRanges(
    FeFsPartition partition, FileDescriptor fileDesc,
    boolean fsHasBlocks, long scanRangeBytesLimit,
    Analyzer analyzer) {
  // ... (excerpt)
  for (int i = 0; i < fileDesc.getNumFileBlocks(); ++i) {
    // ... (scan range generation for block i omitted from this excerpt)
    // Only generate one scan range for partition key scans.
    if (isPartitionKeyScan_) break;
  }
  // ... (excerpt)
}
{code}
In the FE, when a partition key scan reads a file with more than one block, transformBlocksToScanRanges stops after the first block, so no generated scan range covers the footer.
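
The effect of the early break can be reproduced with a small standalone sketch. This is not Impala code: the class PartitionKeyScanSketch, the Range type, the helper names, and the 128 MiB block size are all assumptions made for illustration. It emits one range per block, stops after the first block for a partition key scan, and then checks whether any range reaches the end of the file, which is where the footer sits.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionKeyScanSketch {
  /** Hypothetical stand-in for a scan range: a byte offset and a length. */
  static final class Range {
    final long offset;
    final long length;
    Range(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * Emits one range per block. For a partition key scan it stops after the
   * first block, imitating the early break in the quoted planner excerpt.
   */
  static List<Range> toScanRanges(long fileLength, long blockSize,
      boolean partitionKeyScan) {
    List<Range> ranges = new ArrayList<>();
    for (long offset = 0; offset < fileLength; offset += blockSize) {
      ranges.add(new Range(offset, Math.min(blockSize, fileLength - offset)));
      if (partitionKeyScan) break;
    }
    return ranges;
  }

  /** True if some range ends exactly at the end of the file (the footer). */
  static boolean coversFooter(List<Range> ranges, long fileLength) {
    for (Range r : ranges) {
      if (r.offset + r.length == fileLength) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    long blockSize = 128L << 20;    // assumed block size: 128 MiB
    long oneBlock = 100L << 20;     // file that fits in a single block
    long threeBlocks = 300L << 20;  // file that spans three blocks

    // Single-block file: the only range also contains the footer.
    System.out.println(
        coversFooter(toScanRanges(oneBlock, blockSize, true), oneBlock));
    // Multi-block file, partition key scan: the footer is not covered.
    System.out.println(
        coversFooter(toScanRanges(threeBlocks, blockSize, true), threeBlocks));
    // Multi-block file, regular scan: the last range covers the footer.
    System.out.println(
        coversFooter(toScanRanges(threeBlocks, blockSize, false), threeBlocks));
  }
}
```

Under these assumptions, only the multi-block file combined with the partition key scan ends up without a range that covers the footer, which matches the case described in this issue.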

 

See this snippet in hdfs-scanner.cc:

 
{code:cpp}
/// Issue just the footer range for each file. This function is only used
/// in parquet and orc scanners. We'll then parse the footer and pick out
/// the columns we want.
Status HdfsScanner::IssueFooterRanges(HdfsScanNodeBase* scan_node,
    const THdfsFileFormat::type& file_type,
    const std::vector<HdfsFileDesc*>& files) {
  // ... (excerpt)
  for (int i = 0; i < files.size(); ++i) {
    // Try to find the split with the footer.
    ScanRange* footer_split = FindFooterSplit(files[i]);
    // ... (excerpt)
  }
}
{code}
In the BE, when no footer split is found for a file, no range is added for it, so the file is never scanned.
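
The backend side can be sketched in the same spirit. The Java model below is a simplified illustration, not the C++ implementation: the names FooterSplitSketch, Split, FileDesc, findFooterSplit and issueFooterRanges are hypothetical. It assumes that the footer split is the one ending exactly at the file length, and that a file without such a split contributes no range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FooterSplitSketch {
  /** Hypothetical split handed to the backend: byte offset and length. */
  static final class Split {
    final long offset;
    final long length;
    Split(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Hypothetical file descriptor: total length plus the splits received. */
  static final class FileDesc {
    final long fileLength;
    final List<Split> splits;
    FileDesc(long fileLength, List<Split> splits) {
      this.fileLength = fileLength;
      this.splits = splits;
    }
  }

  /** Returns the split that ends at the end of the file, or null if none. */
  static Split findFooterSplit(FileDesc file) {
    for (Split s : file.splits) {
      if (s.offset + s.length == file.fileLength) return s;
    }
    return null;
  }

  /** Collects one footer split per file; files without one are skipped. */
  static List<Split> issueFooterRanges(List<FileDesc> files) {
    List<Split> issued = new ArrayList<>();
    for (FileDesc file : files) {
      Split footerSplit = findFooterSplit(file);
      if (footerSplit != null) issued.add(footerSplit);
    }
    return issued;
  }

  public static void main(String[] args) {
    // Single-block file: its only split reaches the end of the file.
    FileDesc small = new FileDesc(100, Arrays.asList(new Split(0, 100)));
    // Multi-block file for which only the first block's split was sent.
    FileDesc large = new FileDesc(300, Arrays.asList(new Split(0, 128)));

    List<Split> issued = issueFooterRanges(Arrays.asList(small, large));
    System.out.println(issued.size() + " of 2 files get a footer range");
  }
}
```

In this model the multi-block file is silently dropped from the scan, so its partition key value would be missing from the result.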

 

 


