Posted to commits@impala.apache.org by st...@apache.org on 2022/06/09 07:40:55 UTC

[impala] branch master updated (13bbff4e4 -> 23d09638d)

This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


    from 13bbff4e4 IMPALA-11323: Don't evaluate constants-only inferred predicates
     new 97d3b25be IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT
     new 7273cfdfb IMPALA-5845: Limit the number of non-fatal errors logging to INFO
     new 23d09638d IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/exec/file-metadata-utils.cc                 |  52 ++++--
 be/src/exec/file-metadata-utils.h                  |  10 +-
 be/src/exec/hdfs-orc-scanner.cc                    |   4 +-
 be/src/exec/hdfs-scan-node-base.cc                 |  21 ++-
 be/src/exec/hdfs-scan-node-base.h                  |  12 ++
 be/src/exec/hdfs-scanner.cc                        |  18 +-
 be/src/exec/orc-column-readers.cc                  |   5 +-
 be/src/exec/parquet/hdfs-parquet-scanner.cc        |   2 +-
 be/src/runtime/descriptors.cc                      |  10 +-
 be/src/runtime/descriptors.h                       |   8 +-
 be/src/runtime/query-state.cc                      |   6 -
 be/src/runtime/runtime-state.cc                    |  42 ++++-
 be/src/runtime/runtime-state.h                     |  12 ++
 bin/impala-config.sh                               |   2 +-
 common/thrift/CatalogObjects.thrift                |  59 ++++---
 common/thrift/Descriptors.thrift                   |   2 +
 fe/pom.xml                                         |   2 +-
 .../java/org/apache/impala/analysis/Analyzer.java  |  13 +-
 .../org/apache/impala/analysis/InlineViewRef.java  |  29 +++-
 .../main/java/org/apache/impala/analysis/Path.java |  46 +++++-
 .../org/apache/impala/analysis/SelectListItem.java |  13 +-
 .../org/apache/impala/analysis/SelectStmt.java     |   1 +
 .../org/apache/impala/analysis/SlotDescriptor.java |  13 +-
 .../java/org/apache/impala/catalog/Column.java     |   1 +
 .../java/org/apache/impala/catalog/FeTable.java    |   8 +
 .../java/org/apache/impala/catalog/HdfsTable.java  |   6 +
 .../org/apache/impala/catalog/IcebergTable.java    |   5 +
 .../org/apache/impala/catalog/StructField.java     |  10 +-
 .../main/java/org/apache/impala/catalog/Table.java |  20 +++
 .../org/apache/impala/catalog/VirtualColumn.java   |  68 ++++++++
 .../apache/impala/catalog/local/LocalFsTable.java  |   6 +
 .../impala/catalog/local/LocalIcebergTable.java    |   7 +
 .../apache/impala/catalog/local/LocalTable.java    |  10 ++
 .../org/apache/impala/analysis/AnalyzerTest.java   |  36 ++++
 java/TableFlattener/pom.xml                        |   2 +-
 java/datagenerator/pom.xml                         |   2 +-
 java/executor-deps/pom.xml                         |   2 +-
 java/ext-data-source/api/pom.xml                   |   2 +-
 java/ext-data-source/pom.xml                       |   2 +-
 java/ext-data-source/sample/pom.xml                |   2 +-
 java/ext-data-source/test/pom.xml                  |   2 +-
 java/pom.xml                                       |   2 +-
 java/query-event-hook-api/pom.xml                  |   2 +-
 java/shaded-deps/hive-exec/pom.xml                 |   2 +-
 java/shaded-deps/s3a-aws-sdk/pom.xml               |   2 +-
 java/test-hive-udfs/pom.xml                        |   2 +-
 java/yarn-extras/pom.xml                           |   2 +-
 .../queries/QueryTest/ranger_column_masking.test   | 153 +++++++++++++++++
 ...irtual-column-input-file-name-complextypes.test |  72 ++++++++
 .../virtual-column-input-file-name-in-table.test   |  25 +++
 .../QueryTest/virtual-column-input-file-name.test  | 183 +++++++++++++++++++++
 tests/authorization/test_ranger.py                 |   6 +
 tests/custom_cluster/test_logging.py               |  61 +++++++
 tests/query_test/test_scanners.py                  |  12 ++
 54 files changed, 1006 insertions(+), 91 deletions(-)
 create mode 100644 fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
 create mode 100644 testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
 create mode 100644 testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test
 create mode 100644 testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
 create mode 100644 tests/custom_cluster/test_logging.py


[impala] 03/03: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 23d09638de35dcec6419a5e30df08fd5d8b27e7d
Author: Zoltan Borok-Nagy <bo...@cloudera.com>
AuthorDate: Mon May 2 19:14:49 2022 +0200

    IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
    
    Hive has a virtual column, INPUT__FILE__NAME, which returns the name of
    the data file that stores the actual row. It can be used in several
    ways; see the two Jira tickets above for examples. This virtual column
    is also needed to support position-based delete files in Iceberg V2
    tables.
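
    A usage sketch (these queries are adapted from the analyzer tests added
    by this patch):

        SELECT input__file__name FROM functional.alltypes;
        SELECT input__file__name, id FROM functional_parquet.alltypes;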
    
    This patch also lays the foundations for supporting further table-level
    virtual columns later. Virtual columns are stored at the table level in
    a separate list, apart from the table schema. During path resolution in
    Path.resolve() we also try to resolve the path as a virtual column.
    Slot descriptors also record whether they refer to a virtual column.
    
    Currently we only add the INPUT__FILE__NAME virtual column. Its value
    is populated through the scanners' template tuple.
    
    All the usual operations are possible on this virtual column: users can
    invoke additional functions on it, filter rows by it, group by it, and
    so on.
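
    For example, queries like the following become possible (the table name
    'tbl' is a placeholder):

        SELECT input__file__name, count(*) FROM tbl
        GROUP BY input__file__name;

        SELECT * FROM tbl WHERE input__file__name LIKE '%.parq';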
    
    Special care is needed for virtual columns when column masking or row
    filtering applies to them. They are added as "hidden" select list items
    to the table masking views, which means they are not expanded by '*'
    expressions. They still need to be included in '*' expansions, though,
    when they come from user-written views.
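
    A sketch of the rewrite (mirroring the comment in InlineViewRef.java
    below; 'vc' is a virtual column, MASK() stands for the masking
    expression):

        SELECT vc, * FROM t;
        ===>
        SELECT vc, * FROM (SELECT MASK(vc) AS vc, c1, c2, ... FROM t) v;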
    
    Testing:
     * analyzer tests
     * added e2e tests
    
    Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
    Reviewed-on: http://gerrit.cloudera.org:8080/18514
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exec/file-metadata-utils.cc                 |  52 ++++--
 be/src/exec/file-metadata-utils.h                  |  10 +-
 be/src/exec/hdfs-orc-scanner.cc                    |   4 +-
 be/src/exec/hdfs-scan-node-base.cc                 |  21 ++-
 be/src/exec/hdfs-scan-node-base.h                  |  12 ++
 be/src/exec/hdfs-scanner.cc                        |  18 +-
 be/src/exec/orc-column-readers.cc                  |   5 +-
 be/src/exec/parquet/hdfs-parquet-scanner.cc        |   2 +-
 be/src/runtime/descriptors.cc                      |  10 +-
 be/src/runtime/descriptors.h                       |   8 +-
 common/thrift/CatalogObjects.thrift                |  59 ++++---
 common/thrift/Descriptors.thrift                   |   2 +
 .../java/org/apache/impala/analysis/Analyzer.java  |  13 +-
 .../org/apache/impala/analysis/InlineViewRef.java  |  29 +++-
 .../main/java/org/apache/impala/analysis/Path.java |  46 +++++-
 .../org/apache/impala/analysis/SelectListItem.java |  13 +-
 .../org/apache/impala/analysis/SelectStmt.java     |   1 +
 .../org/apache/impala/analysis/SlotDescriptor.java |  13 +-
 .../java/org/apache/impala/catalog/Column.java     |   1 +
 .../java/org/apache/impala/catalog/FeTable.java    |   8 +
 .../java/org/apache/impala/catalog/HdfsTable.java  |   6 +
 .../org/apache/impala/catalog/IcebergTable.java    |   5 +
 .../org/apache/impala/catalog/StructField.java     |  10 +-
 .../main/java/org/apache/impala/catalog/Table.java |  20 +++
 .../org/apache/impala/catalog/VirtualColumn.java   |  68 ++++++++
 .../apache/impala/catalog/local/LocalFsTable.java  |   6 +
 .../impala/catalog/local/LocalIcebergTable.java    |   7 +
 .../apache/impala/catalog/local/LocalTable.java    |  10 ++
 .../org/apache/impala/analysis/AnalyzerTest.java   |  36 ++++
 .../queries/QueryTest/ranger_column_masking.test   | 153 +++++++++++++++++
 ...irtual-column-input-file-name-complextypes.test |  72 ++++++++
 .../virtual-column-input-file-name-in-table.test   |  25 +++
 .../QueryTest/virtual-column-input-file-name.test  | 183 +++++++++++++++++++++
 tests/authorization/test_ranger.py                 |   6 +
 tests/query_test/test_scanners.py                  |  12 ++
 35 files changed, 882 insertions(+), 64 deletions(-)

diff --git a/be/src/exec/file-metadata-utils.cc b/be/src/exec/file-metadata-utils.cc
index 708b4f5c5..467b522da 100644
--- a/be/src/exec/file-metadata-utils.cc
+++ b/be/src/exec/file-metadata-utils.cc
@@ -47,9 +47,35 @@ Tuple* FileMetadataUtils::CreateTemplateTuple(MemPool* mem_pool) {
     template_tuple =
         template_tuple->DeepCopy(*scan_node_->tuple_desc(), mem_pool);
   }
-  if (!scan_node_->hdfs_table()->IsIcebergTable()) {
-    return template_tuple;
+  if (UNLIKELY(!scan_node_->virtual_column_slots().empty())) {
+    AddFileLevelVirtualColumns(mem_pool, template_tuple);
   }
+  if (scan_node_->hdfs_table()->IsIcebergTable()) {
+    AddIcebergColumns(mem_pool, &template_tuple);
+  }
+  return template_tuple;
+}
+
+void FileMetadataUtils::AddFileLevelVirtualColumns(MemPool* mem_pool,
+    Tuple* template_tuple) {
+  DCHECK(template_tuple != nullptr);
+  for (int i = 0; i < scan_node_->virtual_column_slots().size(); ++i) {
+    const SlotDescriptor* slot_desc = scan_node_->virtual_column_slots()[i];
+    if (slot_desc->virtual_column_type() != TVirtualColumnType::INPUT_FILE_NAME) {
+      continue;
+    }
+    StringValue* slot = template_tuple->GetStringSlot(slot_desc->tuple_offset());
+    const char* filename = context_->GetStream()->filename();
+    int len = strlen(filename);
+    char* filename_copy = reinterpret_cast<char*>(mem_pool->Allocate(len));
+    Ubsan::MemCpy(filename_copy, filename, len);
+    slot->ptr = filename_copy;
+    slot->len = len;
+    template_tuple->SetNotNull(slot_desc->null_indicator_offset());
+  }
+}
+
+void FileMetadataUtils::AddIcebergColumns(MemPool* mem_pool, Tuple** template_tuple) {
   using namespace org::apache::impala::fb;
   TextConverter text_converter(/* escape_char = */ '\\',
       scan_node_->hdfs_table()->null_partition_key_value(),
@@ -57,11 +83,11 @@ Tuple* FileMetadataUtils::CreateTemplateTuple(MemPool* mem_pool) {
   const FbFileMetadata* file_metadata = file_desc_->file_metadata;
   const FbIcebergMetadata* ice_metadata = file_metadata->iceberg_metadata();
   auto transforms = ice_metadata->partition_keys();
-  if (transforms == nullptr) return template_tuple;
+  if (transforms == nullptr) return;
 
   const TupleDescriptor* tuple_desc = scan_node_->tuple_desc();
-  if (template_tuple == nullptr) {
-    template_tuple = Tuple::Create(tuple_desc->byte_size(), mem_pool);
+  if (*template_tuple == nullptr) {
+    *template_tuple = Tuple::Create(tuple_desc->byte_size(), mem_pool);
   }
   for (const SlotDescriptor* slot_desc : scan_node_->tuple_desc()->slots()) {
     const SchemaPath& path = slot_desc->col_path();
@@ -76,7 +102,7 @@ Tuple* FileMetadataUtils::CreateTemplateTuple(MemPool* mem_pool) {
         continue;
       }
       if (field_id != transform->source_id()) continue;
-      if (!text_converter.WriteSlot(slot_desc, template_tuple,
+      if (!text_converter.WriteSlot(slot_desc, *template_tuple,
                                     transform->transform_value()->c_str(),
                                     transform->transform_value()->size(),
                                     true, false,
@@ -91,14 +117,14 @@ Tuple* FileMetadataUtils::CreateTemplateTuple(MemPool* mem_pool) {
         // Dates are stored as INTs in the partition data in Iceberg, so let's try
         // to parse them as INTs.
         if (col_desc.type().type == PrimitiveType::TYPE_DATE) {
-          int32_t* slot = template_tuple->GetIntSlot(slot_desc->tuple_offset());
+          int32_t* slot = (*template_tuple)->GetIntSlot(slot_desc->tuple_offset());
           StringParser::ParseResult parse_result;
           *slot = StringParser::StringToInt<int32_t>(
               transform->transform_value()->c_str(),
               transform->transform_value()->size(),
               &parse_result);
           if (parse_result == StringParser::ParseResult::PARSE_SUCCESS) {
-            template_tuple->SetNotNull(slot_desc->null_indicator_offset());
+            (*template_tuple)->SetNotNull(slot_desc->null_indicator_offset());
           } else {
             state_->LogError(error_msg);
           }
@@ -108,14 +134,14 @@ Tuple* FileMetadataUtils::CreateTemplateTuple(MemPool* mem_pool) {
       }
     }
   }
-  return template_tuple;
 }
 
 bool FileMetadataUtils::IsValuePartitionCol(const SlotDescriptor* slot_desc) {
   DCHECK(context_ != nullptr);
   DCHECK(file_desc_ != nullptr);
   if (slot_desc->parent() != scan_node_->tuple_desc()) return false;
-  if (slot_desc->col_pos() < scan_node_->num_partition_keys()) {
+  if (slot_desc->col_pos() < scan_node_->num_partition_keys() &&
+      !slot_desc->IsVirtual()) {
     return true;
   }
 
@@ -142,4 +168,10 @@ bool FileMetadataUtils::IsValuePartitionCol(const SlotDescriptor* slot_desc) {
   return false;
 }
 
+bool FileMetadataUtils::NeedDataInFile(const SlotDescriptor* slot_desc) {
+  if (IsValuePartitionCol(slot_desc)) return false;
+  if (slot_desc->IsVirtual()) return false;
+  return true;
+}
+
 } // namespace impala
diff --git a/be/src/exec/file-metadata-utils.h b/be/src/exec/file-metadata-utils.h
index baf2ad717..b83e198db 100644
--- a/be/src/exec/file-metadata-utils.h
+++ b/be/src/exec/file-metadata-utils.h
@@ -44,12 +44,20 @@ public:
   /// for transform-based partition columns and non-partition columns.
   bool IsValuePartitionCol(const SlotDescriptor* slot_desc);
 
+  /// Returns true if the file should contain the column described by 'slot_desc'.
+  /// Returns false when the data can be retrieved from other sources, e.g. value-based
+  /// partition columns, virtual columns.
+  bool NeedDataInFile(const SlotDescriptor* slot_desc);
+
 private:
+  void AddFileLevelVirtualColumns(MemPool* mem_pool, Tuple* template_tuple);
+  void AddIcebergColumns(MemPool* mem_pool, Tuple** template_tuple);
+
   HdfsScanNodeBase* scan_node_;
   RuntimeState* state_;
 
   // Members below are set in Open()
-  const ScannerContext* context_ = nullptr;
+  ScannerContext* context_ = nullptr;
   const HdfsFileDesc* file_desc_ = nullptr;
 };
 
diff --git a/be/src/exec/hdfs-orc-scanner.cc b/be/src/exec/hdfs-orc-scanner.cc
index b79e9a584..6fb905230 100644
--- a/be/src/exec/hdfs-orc-scanner.cc
+++ b/be/src/exec/hdfs-orc-scanner.cc
@@ -598,8 +598,8 @@ Status HdfsOrcScanner::ResolveColumns(const TupleDescriptor& tuple_desc,
   // slot.
   SlotDescriptor* pos_slot_desc = nullptr;
   for (SlotDescriptor* slot_desc : tuple_desc.slots()) {
-    // Skip partition columns
-    if (IsPartitionKeySlot(slot_desc)) continue;
+    // Skip columns not (necessarily) stored in the data files.
+    if (!file_metadata_utils_.NeedDataInFile(slot_desc)) continue;
 
     node = nullptr;
     pos_field = false;
diff --git a/be/src/exec/hdfs-scan-node-base.cc b/be/src/exec/hdfs-scan-node-base.cc
index 019467732..a6f115b02 100644
--- a/be/src/exec/hdfs-scan-node-base.cc
+++ b/be/src/exec/hdfs-scan-node-base.cc
@@ -176,7 +176,9 @@ Status HdfsScanPlanNode::Init(const TPlanNode& tnode, FragmentState* state) {
   // Gather materialized partition-key slots and non-partition slots.
   const vector<SlotDescriptor*>& slots = tuple_desc_->slots();
   for (size_t i = 0; i < slots.size(); ++i) {
-    if (hdfs_table_->IsClusteringCol(slots[i])) {
+    if (UNLIKELY(slots[i]->IsVirtual())) {
+      virtual_column_slots_.push_back(slots[i]);
+    } else if (hdfs_table_->IsClusteringCol(slots[i])) {
       partition_key_slots_.push_back(slots[i]);
     } else {
       materialized_slots_.push_back(slots[i]);
@@ -353,7 +355,7 @@ Status HdfsScanPlanNode::ProcessScanRangesAndInitSharedState(FragmentState* stat
 
 Tuple* HdfsScanPlanNode::InitTemplateTuple(
     const std::vector<ScalarExprEvaluator*>& evals, MemPool* pool) const {
-  if (partition_key_slots_.empty()) return nullptr;
+  if (partition_key_slots_.empty() && !HasVirtualColumnInTemplateTuple()) return nullptr;
   Tuple* template_tuple = Tuple::Create(tuple_desc_->byte_size(), pool);
   for (int i = 0; i < partition_key_slots_.size(); ++i) {
     const SlotDescriptor* slot_desc = partition_key_slots_[i];
@@ -423,6 +425,7 @@ HdfsScanNodeBase::HdfsScanNodeBase(ObjectPool* pool, const HdfsScanPlanNode& pno
     is_materialized_col_(pnode.is_materialized_col_),
     materialized_slots_(pnode.materialized_slots_),
     partition_key_slots_(pnode.partition_key_slots_),
+    virtual_column_slots_(pnode.virtual_column_slots_),
     disks_accessed_bitmap_(TUnit::UNIT, 0),
     active_hdfs_read_thread_counter_(TUnit::UNIT, 0),
     shared_state_(const_cast<ScanRangeSharedState*>(&(pnode.shared_state_))) {}
@@ -988,6 +991,20 @@ void HdfsScanPlanNode::ComputeSlotMaterializationOrder(
   }
 }
 
+bool HdfsScanPlanNode::HasVirtualColumnInTemplateTuple() const {
+  for (SlotDescriptor* sd : virtual_column_slots_) {
+    DCHECK(sd->IsVirtual());
+    if (sd->virtual_column_type() == TVirtualColumnType::INPUT_FILE_NAME) {
+      return true;
+    } else {
+      // DCHECK here so we don't forget to update this when adding a new
+      // virtual column.
+      DCHECK(false);
+    }
+  }
+  return false;
+}
+
 void HdfsScanNodeBase::TransferToScanNodePool(MemPool* pool) {
   scan_node_pool_->AcquireData(pool, false);
 }
diff --git a/be/src/exec/hdfs-scan-node-base.h b/be/src/exec/hdfs-scan-node-base.h
index 7f5643756..bb1c72660 100644
--- a/be/src/exec/hdfs-scan-node-base.h
+++ b/be/src/exec/hdfs-scan-node-base.h
@@ -279,6 +279,9 @@ class HdfsScanPlanNode : public ScanPlanNode {
   void ComputeSlotMaterializationOrder(
       const DescriptorTbl& desc_tbl, std::vector<int>* order) const;
 
+  /// Returns true if it has a virtual column that we can materialize in the
+  /// template tuple.
+  bool HasVirtualColumnInTemplateTuple() const;
+
   /// Conjuncts for each materialized tuple (top-level row batch tuples and collection
   /// item tuples). Includes a copy of PlanNode.conjuncts_.
   typedef std::unordered_map<TupleId, std::vector<ScalarExpr*>> ConjunctsMap;
@@ -318,6 +321,9 @@ class HdfsScanPlanNode : public ScanPlanNode {
   /// Vector containing slot descriptors for all partition key slots.
   std::vector<SlotDescriptor*> partition_key_slots_;
 
+  /// Vector containing slot descriptors for virtual columns.
+  std::vector<SlotDescriptor*> virtual_column_slots_;
+
   /// Descriptor for the hdfs table, including partition and format metadata.
   /// Set in Init, owned by QueryState
   const HdfsTableDescriptor* hdfs_table_ = nullptr;
@@ -420,6 +426,10 @@ class HdfsScanNodeBase : public ScanNode {
   const std::vector<SlotDescriptor*>& materialized_slots()
       const { return materialized_slots_; }
 
+  const std::vector<SlotDescriptor*>& virtual_column_slots() const {
+    return virtual_column_slots_;
+  }
+
   /// Returns number of partition keys in the table.
   int num_partition_keys() const { return hdfs_table_->num_clustering_cols(); }
 
@@ -735,6 +745,8 @@ class HdfsScanNodeBase : public ScanNode {
   /// Vector containing slot descriptors for all partition key slots.
   const std::vector<SlotDescriptor*>& partition_key_slots_;
 
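+  /// Vector containing slot descriptors for virtual columns.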
+  const std::vector<SlotDescriptor*>& virtual_column_slots_;
+
   /// Counters which track the number of scanners that have codegen enabled for the
   /// materialize and conjuncts evaluation code paths.
   AtomicInt32 num_scanners_codegen_enabled_;
diff --git a/be/src/exec/hdfs-scanner.cc b/be/src/exec/hdfs-scanner.cc
index fad812d26..75d63030f 100644
--- a/be/src/exec/hdfs-scanner.cc
+++ b/be/src/exec/hdfs-scanner.cc
@@ -597,10 +597,20 @@ Status HdfsScanner::CodegenInitTuple(
   DCHECK(*init_tuple_fn != nullptr);
 
   // Replace all of the constants in InitTuple() to specialize the code.
-  bool materialized_partition_keys_exist = !node->partition_key_slots_.empty();
-  int replaced = codegen->ReplaceCallSitesWithBoolConst(
-      *init_tuple_fn, materialized_partition_keys_exist, "has_template_tuple");
-  DCHECK_REPLACE_COUNT(replaced, 1);
+  bool has_template_tuple = !node->partition_key_slots_.empty();
+  if (!has_template_tuple) {
+    has_template_tuple = node->HasVirtualColumnInTemplateTuple();
+  }
+  int replaced = 0;
+  // If has_template_tuple is true, then we can certainly replace the callsites.
+  // If has_template_tuple is false, then we should only replace the callsites for
+  // non-Iceberg tables, as for Iceberg tables we might still create a template
+  // tuple based on the partitioning in the data files.
+  if (has_template_tuple || !node->hdfs_table_->IsIcebergTable()) {
+    replaced = codegen->ReplaceCallSitesWithBoolConst(
+      *init_tuple_fn, has_template_tuple, "has_template_tuple");
+    DCHECK_REPLACE_COUNT(replaced, 1);
+  }
 
   const TupleDescriptor* tuple_desc = node->tuple_desc_;
   replaced = codegen->ReplaceCallSitesWithValue(*init_tuple_fn,
diff --git a/be/src/exec/orc-column-readers.cc b/be/src/exec/orc-column-readers.cc
index 644ac325f..880f4e421 100644
--- a/be/src/exec/orc-column-readers.cc
+++ b/be/src/exec/orc-column-readers.cc
@@ -439,8 +439,9 @@ OrcStructReader::OrcStructReader(const orc::Type* node,
   if (materialize_tuple_) {
     for (SlotDescriptor* child_slot : tuple_desc_->slots()) {
       // Skip partition columns and missed columns
-      if (scanner->IsPartitionKeySlot(child_slot)
-          || scanner->IsMissingField(child_slot)) {
+      if (scanner->IsPartitionKeySlot(child_slot) ||
+          scanner->IsMissingField(child_slot) ||
+          child_slot->IsVirtual()) {
         continue;
       }
       CreateChildForSlot(node, child_slot);
diff --git a/be/src/exec/parquet/hdfs-parquet-scanner.cc b/be/src/exec/parquet/hdfs-parquet-scanner.cc
index f79e02b91..8718331a9 100644
--- a/be/src/exec/parquet/hdfs-parquet-scanner.cc
+++ b/be/src/exec/parquet/hdfs-parquet-scanner.cc
@@ -2717,7 +2717,7 @@ Status HdfsParquetScanner::CreateColumnReaders(const TupleDescriptor& tuple_desc
 
   for (SlotDescriptor* slot_desc: tuple_desc.slots()) {
     // Skip partition columns
-    if (file_metadata_utils_.IsValuePartitionCol(slot_desc)) continue;
+    if (!file_metadata_utils_.NeedDataInFile(slot_desc)) continue;
 
     SchemaNode* node = nullptr;
     bool pos_field;
diff --git a/be/src/runtime/descriptors.cc b/be/src/runtime/descriptors.cc
index 9fa70744c..2afc59e5d 100644
--- a/be/src/runtime/descriptors.cc
+++ b/be/src/runtime/descriptors.cc
@@ -107,7 +107,8 @@ SlotDescriptor::SlotDescriptor(const TSlotDescriptor& tdesc,
     tuple_offset_(tdesc.byteOffset),
     null_indicator_offset_(tdesc.nullIndicatorByte, tdesc.nullIndicatorBit),
     slot_idx_(tdesc.slotIdx),
-    slot_size_(type_.GetSlotSize()) {
+    slot_size_(type_.GetSlotSize()),
+    virtual_column_type_(tdesc.virtual_col_type) {
   DCHECK(parent_ != nullptr) << tdesc.parent;
   if (type_.IsComplexType()) {
     DCHECK(tdesc.__isset.itemTupleId);
@@ -141,8 +142,11 @@ string SlotDescriptor::DebugString() const {
     out << " children_tuple_id=" << children_tuple_descriptor_->id();
   }
   out << " offset=" << tuple_offset_ << " null=" << null_indicator_offset_.DebugString()
-      << " slot_idx=" << slot_idx_ << " field_idx=" << slot_idx_
-      << ")";
+      << " slot_idx=" << slot_idx_ << " field_idx=" << slot_idx_;
+  if (IsVirtual()) {
+    out << " virtual_column_type=" << virtual_column_type_;
+  }
+  out << ")";
   return out.str();
 }
 
diff --git a/be/src/runtime/descriptors.h b/be/src/runtime/descriptors.h
index e4d5326df..007cdb850 100644
--- a/be/src/runtime/descriptors.h
+++ b/be/src/runtime/descriptors.h
@@ -137,6 +137,9 @@ class SlotDescriptor {
   bool is_nullable() const { return null_indicator_offset_.bit_mask != 0; }
   int slot_size() const { return slot_size_; }
 
+  TVirtualColumnType::type virtual_column_type() const { return virtual_column_type_; }
+  bool IsVirtual() const { return virtual_column_type_ != TVirtualColumnType::NONE; }
+
   /// Comparison function for ordering slot descriptors by their col_path_.
   /// Returns true if 'a' comes before 'b'.
   /// Orders the paths as in a depth-first traversal of the schema tree, as follows:
@@ -191,6 +194,8 @@ class SlotDescriptor {
   /// the byte size of this slot.
   const int slot_size_;
 
+  const TVirtualColumnType::type virtual_column_type_;
+
   /// 'children_tuple_descriptor' should be non-NULL iff this is a complex type slot.
   SlotDescriptor(const TSlotDescriptor& tdesc, const TupleDescriptor* parent,
       const TupleDescriptor* children_tuple_descriptor);
@@ -234,7 +239,8 @@ class TableDescriptor {
   /// columns.
   bool IsClusteringCol(const SlotDescriptor* slot_desc) const {
     return slot_desc->col_path().size() == 1 &&
-        slot_desc->col_path()[0] < num_clustering_cols_;
+        slot_desc->col_path()[0] < num_clustering_cols_ &&
+        !slot_desc->IsVirtual();
   }
 
   const std::string& name() const { return name_; }
diff --git a/common/thrift/CatalogObjects.thrift b/common/thrift/CatalogObjects.thrift
index 8aa1116c6..572818e19 100644
--- a/common/thrift/CatalogObjects.thrift
+++ b/common/thrift/CatalogObjects.thrift
@@ -71,6 +71,11 @@ enum THdfsFileFormat {
   JSON = 9
 }
 
+enum TVirtualColumnType {
+  NONE,
+  INPUT_FILE_NAME
+}
+
 // TODO: Since compression is also enabled for Kudu columns, we should
 // rename this enum to not be Hdfs specific.
 enum THdfsCompression {
@@ -249,31 +254,32 @@ struct TColumn {
   4: optional TColumnStats col_stats
   // Ordinal position in the source table
   5: optional i32 position
+  6: optional TVirtualColumnType virtual_column_type = TVirtualColumnType.NONE
 
   // Indicates whether this is an HBase column. If true, implies
   // all following HBase-specific fields are set.
-  6: optional bool is_hbase_column
-  7: optional string column_family
-  8: optional string column_qualifier
-  9: optional bool is_binary
+  7: optional bool is_hbase_column
+  8: optional string column_family
+  9: optional string column_qualifier
+  10: optional bool is_binary
 
   // The followings are Kudu-specific column properties
-  10: optional bool is_kudu_column
-  11: optional bool is_key
-  12: optional bool is_nullable
-  13: optional TColumnEncoding encoding
-  14: optional THdfsCompression compression
-  15: optional Exprs.TExpr default_value
-  16: optional i32 block_size
+  11: optional bool is_kudu_column
+  12: optional bool is_key
+  13: optional bool is_nullable
+  14: optional TColumnEncoding encoding
+  15: optional THdfsCompression compression
+  16: optional Exprs.TExpr default_value
+  17: optional i32 block_size
   // The column name, in the case that it appears in Kudu.
-  17: optional string kudu_column_name
+  18: optional string kudu_column_name
 
   // Here come the Iceberg-specific fields.
-  18: optional bool is_iceberg_column
-  19: optional i32 iceberg_field_id
+  19: optional bool is_iceberg_column
+  20: optional i32 iceberg_field_id
   // Key and value field id for Iceberg column with Map type.
-  20: optional i32 iceberg_field_map_key_id
-  21: optional i32 iceberg_field_map_value_id
+  21: optional i32 iceberg_field_map_key_id
+  22: optional i32 iceberg_field_map_value_id
 }
 
 // Represents an HDFS file in a partition.
@@ -605,34 +611,37 @@ struct TTable {
   // List of clustering columns (empty list if table has no clustering columns)
   6: optional list<TColumn> clustering_columns
 
+  // List of virtual columns (empty list if table has no virtual columns)
+  7: optional list<TColumn> virtual_columns
+
   // Table stats data for the table.
-  7: optional TTableStats table_stats
+  8: optional TTableStats table_stats
 
   // Determines the table type - either HDFS, HBASE, or VIEW.
-  8: optional TTableType table_type
+  9: optional TTableType table_type
 
   // Set iff this is an HDFS table
-  9: optional THdfsTable hdfs_table
+  10: optional THdfsTable hdfs_table
 
   // Set iff this is an Hbase table
-  10: optional THBaseTable hbase_table
+  11: optional THBaseTable hbase_table
 
   // The Hive Metastore representation of this table. May not be set if there were
   // errors loading the table metadata
-  11: optional hive_metastore.Table metastore_table
+  12: optional hive_metastore.Table metastore_table
 
   // Set iff this is a table from an external data source
-  12: optional TDataSourceTable data_source_table
+  13: optional TDataSourceTable data_source_table
 
   // Set iff this is a kudu table
-  13: optional TKuduTable kudu_table
+  14: optional TKuduTable kudu_table
 
   // Set if this table needs storage access during metadata load.
   // Time used for storage loading in nanoseconds.
-  15: optional i64 storage_metadata_load_time_ns
+  16: optional i64 storage_metadata_load_time_ns
 
   // Set if this is an iceberg table
-  16: optional TIcebergTable iceberg_table
+  17: optional TIcebergTable iceberg_table
 }
 
 // Represents a database.
diff --git a/common/thrift/Descriptors.thrift b/common/thrift/Descriptors.thrift
index 66c9468fb..83bf50d15 100644
--- a/common/thrift/Descriptors.thrift
+++ b/common/thrift/Descriptors.thrift
@@ -49,6 +49,8 @@ struct TSlotDescriptor {
   7: required i32 nullIndicatorByte
   8: required i32 nullIndicatorBit
   9: required i32 slotIdx
+  10: required CatalogObjects.TVirtualColumnType virtual_col_type =
+      CatalogObjects.TVirtualColumnType.NONE
 }
 
 struct TColumnDescriptor {
diff --git a/fe/src/main/java/org/apache/impala/analysis/Analyzer.java b/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
index d9dad77ae..93d3b6edb 100644
--- a/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
+++ b/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
@@ -65,6 +65,7 @@ import org.apache.impala.catalog.StructField;
 import org.apache.impala.catalog.StructType;
 import org.apache.impala.catalog.TableLoadingException;
 import org.apache.impala.catalog.Type;
+import org.apache.impala.catalog.VirtualColumn;
 import org.apache.impala.catalog.local.LocalKuduTable;
 import org.apache.impala.common.AnalysisException;
 import org.apache.impala.common.IdGenerator;
@@ -1709,9 +1710,15 @@ public class Analyzer {
     TupleDescriptor tupleDesc = slotDesc.getParent();
     // Pass the full path for nested types even though these are currently ignored.
     // TODO: RANGER-3525: Clarify handling of column masks on nested types
-    Column column = new Column(
-        String.join(".", slotDesc.getPath().getRawPath()), slotDesc.getType(),
-           /*position*/-1);
+    Column column;
+    if (slotDesc.isVirtualColumn()) {
+      column = VirtualColumn.getVirtualColumn(slotDesc.getVirtualColumnType());
+    }
+    else {
+      column = new Column(
+          String.join(".", slotDesc.getPath().getRawPath()), slotDesc.getType(),
+            /*position*/-1);
+    }
     Analyzer analyzer = this;
     TableRef tblRef;
     do {
diff --git a/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java b/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
index a91070e85..8d7ef97b4 100644
--- a/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
+++ b/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
@@ -160,9 +160,19 @@ public class InlineViewRef extends TableRef {
     List<Column> columns = tableMask.getRequiredColumns();
     List<SelectListItem> items = Lists.newArrayListWithCapacity(columns.size());
     for (Column col: columns) {
-      items.add(new SelectListItem(
-          tableMask.createColumnMask(col.getName(), col.getType(), authzCtx),
-          /*alias*/ col.getName()));
+      Expr maskExpr = tableMask.createColumnMask(col.getName(), col.getType(), authzCtx);
+      // Virtual columns are hidden in the masking view, which means they don't
+      // participate in star expansion.
+      // E.g. during masking the following query is rewritten (where vc is a virtual col):
+      // SELECT vc, * FROM t; ===>
+      //     SELECT vc, * FROM (SELECT MASK(vc) as vc, c1, c2, ... FROM t) v;
+      // In which case the '*' in the outer "SELECT vc, *" shouldn't contain 'v.vc'
+      // because in that case it would be doubled:
+      // SELECT vc, vc, c1, c2, ... FROM (...);
+      // Hence virtual columns are hidden select list items. They are also hidden
+      // when they are not masked, but other columns are.
+      boolean isHidden = col.isVirtual();
+      items.add(new SelectListItem(maskExpr, /*alias*/ col.getName(), isHidden));
     }
     if (tableMask.hasComplexColumnMask()) {
       throw new AnalysisException("Column masking is not supported for complex types");
@@ -446,7 +456,18 @@ public class InlineViewRef extends TableRef {
         throw new AnalysisException("duplicated inline view column alias: '" +
             colAlias + "'" + " in inline view " + "'" + getUniqueAlias() + "'");
       }
-      fields.add(new StructField(colAlias, selectItemExpr.getType(), null));
+      boolean isHidden = false;
+      if (queryStmt_ instanceof SelectStmt) {
+        SelectStmt selectStmt = (SelectStmt)queryStmt_;
+        List<SelectListItem> itemList = selectStmt.getSelectList().getItems();
+        if (itemList.size() == numColLabels) {
+          // 'itemList.size() == numColLabels' is true for table masking views as they
+          // cannot contain '*' (because they need to mask some columns).
+          isHidden = itemList.get(i).isHidden();
+        }
+      }
+      fields.add(new StructField(colAlias, selectItemExpr.getType(), null,
+          isHidden));
     }
 
     // Create the non-materialized tuple and set its type.
diff --git a/fe/src/main/java/org/apache/impala/analysis/Path.java b/fe/src/main/java/org/apache/impala/analysis/Path.java
index 6e3fd2d3b..7cf702b7c 100644
--- a/fe/src/main/java/org/apache/impala/analysis/Path.java
+++ b/fe/src/main/java/org/apache/impala/analysis/Path.java
@@ -28,6 +28,8 @@ import org.apache.impala.catalog.MapType;
 import org.apache.impala.catalog.StructField;
 import org.apache.impala.catalog.StructType;
 import org.apache.impala.catalog.Type;
+import org.apache.impala.catalog.VirtualColumn;
+import org.apache.impala.thrift.TVirtualColumnType;
 import org.apache.impala.util.AcidUtils;
 
 import com.google.common.base.Joiner;
@@ -157,6 +159,8 @@ public class Path {
   // Caches the result of getAbsolutePath() to avoid re-computing it.
   private List<Integer> absolutePath_ = null;
 
+  private TVirtualColumnType virtualColType_ = TVirtualColumnType.NONE;
+
   // Resolved path before we resolved it again inside the table masking view.
   private Path pathBeforeMasking_ = null;
 
@@ -206,11 +210,20 @@ public class Path {
 
   /**
    * Resolves this path in the context of the root tuple descriptor / root table
-   * or continues resolving this relative path from an existing root path.
+   * or continues resolving this relative path from an existing root path. If normal
+   * path resolution fails it tries to resolve the path as a virtual column.
    * Returns true if the path could be fully resolved, false otherwise.
    * A failed resolution leaves this Path in a partially resolved state.
    */
   public boolean resolve() {
+    if (!resolveNonVirtualPath()) {
+      return resolveVirtualColumn();
+    } else {
+      return true;
+    }
+  }
+
+  private boolean resolveNonVirtualPath() {
     if (isResolved_) return true;
     Preconditions.checkState(rootDesc_ != null || rootTable_ != null);
     Type currentType = null;
@@ -269,6 +282,32 @@ public class Path {
     return true;
   }
 
+  private boolean resolveVirtualColumn() {
+    if (isResolved_) return true;
+    if (rootTable_ == null) return false;
+    if (rootDesc_ != null) {
+      if (rootDesc_.getType() != rootTable_.getType().getItemType()) {
+        // 'rootDesc_' describes a collection tuple. Currently we only allow virtual
+        // columns at the table-level.
+        return false;
+      }
+    }
+    if (rawPath_.size() != 1) return false;
+
+    String colName = rawPath_.get(0);
+    List<VirtualColumn> virtualColumns = rootTable_.getVirtualColumns();
+    for (VirtualColumn vCol : virtualColumns) {
+      if (vCol.getName().equalsIgnoreCase(colName)) {
+        virtualColType_ = vCol.getVirtualColumnType();
+        matchedTypes_.add(vCol.getType());
+        matchedPositions_.add(vCol.getPosition());
+        isResolved_ = true;
+        return true;
+      }
+    }
+    return false;
+  }
+
   /**
    * If the given type is a collection, returns a collection struct type representing
    * named fields of its explicit path. Returns the given type itself if it is already
@@ -313,6 +352,10 @@ public class Path {
   public boolean isRootedAtTuple() { return rootDesc_ != null; }
   public List<String> getRawPath() { return rawPath_; }
   public boolean isResolved() { return isResolved_; }
+  public TVirtualColumnType getVirtualColumnType() { return virtualColType_; }
+  public boolean isVirtualColumn() {
+    return virtualColType_ != TVirtualColumnType.NONE;
+  }
   public boolean isMaskedPath() { return pathBeforeMasking_ != null; }
   public Path getPathBeforeMasking() { return pathBeforeMasking_; }
   public void setPathBeforeMasking(Path p) {
@@ -492,6 +535,7 @@ public class Path {
   private void convertToFullAcidFilePath() {
     // For Full ACID tables we need to create a schema path that corresponds to the
     // ACID file schema.
+    if (virtualColType_ != TVirtualColumnType.NONE) return;
     int numPartitions = rootTable_.getNumClusteringCols();
     if (absolutePath_.get(0) == numPartitions) {
       // The path refers to the synthetic "row__id" column.
diff --git a/fe/src/main/java/org/apache/impala/analysis/SelectListItem.java b/fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
index 4849543e2..17884e53d 100644
--- a/fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
+++ b/fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
@@ -31,14 +31,20 @@ public class SelectListItem {
   // for "[path.]*" (excludes trailing '*')
   private final List<String> rawPath_;
   private final boolean isStar_;
+  // True if the item shouldn't be included in star expansion
+  private final boolean isHidden_;
 
-  public SelectListItem(Expr expr, String alias) {
-    super();
+  public SelectListItem(Expr expr, String alias, boolean isHidden) {
     Preconditions.checkNotNull(expr);
     expr_ = expr;
     alias_ = alias;
     isStar_ = false;
     rawPath_ = null;
+    isHidden_ = isHidden;
+  }
+
+  public SelectListItem(Expr expr, String alias) {
+    this(expr, alias, false);
   }
 
   // select list item corresponding to path_to_struct.*
@@ -47,10 +53,10 @@ public class SelectListItem {
   }
 
   private SelectListItem(List<String> path) {
-    super();
     expr_ = null;
     isStar_ = true;
     rawPath_ = path;
+    isHidden_ = false;
   }
 
   public Expr getExpr() { return expr_; }
@@ -58,6 +64,7 @@ public class SelectListItem {
   public boolean isStar() { return isStar_; }
   public String getAlias() { return alias_; }
   public List<String> getRawPath() { return rawPath_; }
+  public boolean isHidden() { return isHidden_; }
 
   @Override
   public String toString() {
diff --git a/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java b/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
index 226c09e6d..e8fbe4821 100644
--- a/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
@@ -775,6 +775,7 @@ public class SelectStmt extends QueryStmt {
         } else {
           // Default star expansion.
           for (StructField f: structType.getFields()) {
+            if (f.isHidden()) continue;
             addStarResultExpr(resolvedPath, f.getName());
           }
         }
diff --git a/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java b/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
index 8cd50b174..898059842 100644
--- a/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
+++ b/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
@@ -28,6 +28,7 @@ import org.apache.impala.catalog.FeKuduTable;
 import org.apache.impala.catalog.KuduColumn;
 import org.apache.impala.catalog.Type;
 import org.apache.impala.thrift.TSlotDescriptor;
+import org.apache.impala.thrift.TVirtualColumnType;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -178,6 +179,16 @@ public class SlotDescriptor {
   public boolean isScanSlot() { return path_ != null && path_.isRootedAtTable(); }
   public Column getColumn() { return !isScanSlot() ? null : path_.destColumn(); }
 
+  public boolean isVirtualColumn() {
+    if (path_ == null) return false;
+    return path_.getVirtualColumnType() != TVirtualColumnType.NONE;
+  }
+
+  public TVirtualColumnType getVirtualColumnType() {
+    if (path_ == null) return TVirtualColumnType.NONE;
+    return path_.getVirtualColumnType();
+  }
+
   public ColumnStats getStats() {
     if (stats_ == null) {
       Column c = getColumn();
@@ -362,7 +373,7 @@ public class SlotDescriptor {
     TSlotDescriptor result = new TSlotDescriptor(
         id_.asInt(), parent_.getId().asInt(), type_.toThrift(),
         materializedPath, byteOffset_, nullIndicatorByte_, nullIndicatorBit_,
-        slotIdx_);
+        slotIdx_, getVirtualColumnType());
     if (itemTupleDesc_ != null) {
       // Check for recursive or otherwise invalid item tuple descriptors. Since we assign
       // tuple ids globally in increasing order, the id of an item tuple descriptor must
diff --git a/fe/src/main/java/org/apache/impala/catalog/Column.java b/fe/src/main/java/org/apache/impala/catalog/Column.java
index 031435733..5f82acc74 100644
--- a/fe/src/main/java/org/apache/impala/catalog/Column.java
+++ b/fe/src/main/java/org/apache/impala/catalog/Column.java
@@ -67,6 +67,7 @@ public class Column {
   public int getPosition() { return position_; }
   public void setPosition(int position) { this.position_ = position; }
   public ColumnStats getStats() { return stats_; }
+  public boolean isVirtual() { return false; }
 
   public boolean updateStats(ColumnStatisticsData statsData) {
     boolean statsDataCompatibleWithColType = stats_.update(type_, statsData);
diff --git a/fe/src/main/java/org/apache/impala/catalog/FeTable.java b/fe/src/main/java/org/apache/impala/catalog/FeTable.java
index fe5f9f1ff..b09661563 100644
--- a/fe/src/main/java/org/apache/impala/catalog/FeTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/FeTable.java
@@ -16,6 +16,7 @@
 // under the License.
 package org.apache.impala.catalog;
 
+import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
 import java.util.Set;
@@ -79,6 +80,13 @@ public interface FeTable {
    */
   List<Column> getColumns();
 
+  /**
+   * @return the virtual columns of this table
+   */
+  default List<VirtualColumn> getVirtualColumns() {
+    return Collections.emptyList();
+  }
+
   /**
    * @return an unmodifiable list of all columns, but with partition columns at the end of
    * the list rather than the beginning. This is equivalent to the order in
diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
index 5867b3262..e880ba935 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
@@ -629,6 +629,10 @@ public class HdfsTable extends Table implements FeFsTable {
     addColumnsFromFieldSchemas(fieldSchemas);
   }
 
+  private void addVirtualColumns() {
+    addVirtualColumn(VirtualColumn.INPUT_FILE_NAME);
+  }
+
   /**
    * Clear the partitions of an HdfsTable and the associated metadata.
    * Declared as protected to allow third party extension visibility.
@@ -1785,6 +1789,7 @@ public class HdfsTable extends Table implements FeFsTable {
         clearColumns();
         addColumnsFromFieldSchemas(msTbl.getPartitionKeys());
         addColumnsFromFieldSchemas(nonPartFieldSchemas_);
+        addVirtualColumns();
         loadAllColumnStats(client);
       }
     }
@@ -1816,6 +1821,7 @@ public class HdfsTable extends Table implements FeFsTable {
     } else {
       addColumnsFromFieldSchemas(nonPartFieldSchemas_);
     }
+    addVirtualColumns();
     isSchemaLoaded_ = true;
   }
 
diff --git a/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
index 8f3cfbf2d..523dcbac8 100644
--- a/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
@@ -377,6 +377,7 @@ public class IcebergTable extends Table implements FeIcebergTable {
    */
   public void loadSchemaFromIceberg() throws TableLoadingException {
     loadSchema();
+    addVirtualColumns();
     partitionSpecs_ = Utils.loadPartitionSpecByIceberg(this);
     defaultPartitionSpecId_ = icebergApiTable_.spec().specId();
   }
@@ -404,6 +405,10 @@ public class IcebergTable extends Table implements FeIcebergTable {
             iCol.getFieldId()));
   }
 
+  private void addVirtualColumns() {
+    addVirtualColumn(VirtualColumn.INPUT_FILE_NAME);
+  }
+
   @Override
   protected void loadFromThrift(TTable thriftTable) throws TableLoadingException {
     super.loadFromThrift(thriftTable);
diff --git a/fe/src/main/java/org/apache/impala/catalog/StructField.java b/fe/src/main/java/org/apache/impala/catalog/StructField.java
index 7f4a8a375..92af9a3d2 100644
--- a/fe/src/main/java/org/apache/impala/catalog/StructField.java
+++ b/fe/src/main/java/org/apache/impala/catalog/StructField.java
@@ -32,13 +32,20 @@ public class StructField {
   protected final Type type_;
   protected final String comment_;
   protected int position_;  // in struct
+  // True if the field shouldn't be included in star expansion.
+  protected boolean isHidden_ = false;
 
-  public StructField(String name, Type type, String comment) {
+  public StructField(String name, Type type, String comment, boolean isHidden) {
     // Impala expects field names to be in lower case, but type strings stored in the HMS
     // are not guaranteed to be lower case.
     name_ = name.toLowerCase();
     type_ = type;
     comment_ = comment;
+    isHidden_ = isHidden;
+  }
+
+  public StructField(String name, Type type, String comment) {
+    this(name, type, comment, false);
   }
 
   public StructField(String name, Type type) {
@@ -50,6 +57,7 @@ public class StructField {
   public Type getType() { return type_; }
   public int getPosition() { return position_; }
   public void setPosition(int position) { position_ = position; }
+  public boolean isHidden() { return isHidden_; }
 
   public String toSql(int depth) {
     String typeSql = (depth < Type.MAX_NESTING_DEPTH) ? type_.toSql(depth) : "...";
diff --git a/fe/src/main/java/org/apache/impala/catalog/Table.java b/fe/src/main/java/org/apache/impala/catalog/Table.java
index e5d3955a5..30f276f1c 100644
--- a/fe/src/main/java/org/apache/impala/catalog/Table.java
+++ b/fe/src/main/java/org/apache/impala/catalog/Table.java
@@ -117,6 +117,9 @@ public abstract class Table extends CatalogObjectImpl implements FeTable {
   // the clustering columns.
   protected final ArrayList<Column> colsByPos_ = new ArrayList<>();
 
+  // Virtual columns of this table.
+  protected final ArrayList<VirtualColumn> virtualCols_ = new ArrayList<>();
+
   // map from lowercase column name to Column object.
   protected final Map<String, Column> colsByName_ = new HashMap<>();
 
@@ -420,6 +423,11 @@ public abstract class Table extends CatalogObjectImpl implements FeTable {
     colsByPos_.clear();
     colsByName_.clear();
     ((StructType) type_.getItemType()).clearFields();
+    virtualCols_.clear();
+  }
+
+  protected void addVirtualColumn(VirtualColumn col) {
+    virtualCols_.add(col);
   }
 
   // Returns a list of all column names for this table which we expect to have column
@@ -540,6 +548,11 @@ public abstract class Table extends CatalogObjectImpl implements FeTable {
         colsByName_.put(col.getName().toLowerCase(), col);
         ((StructType) type_.getItemType()).addField(getStructFieldFromColumn(col));
       }
+      virtualCols_.clear();
+      virtualCols_.ensureCapacity(thriftTable.getVirtual_columns().size());
+      for (TColumn tvCol : thriftTable.getVirtual_columns()) {
+        virtualCols_.add(VirtualColumn.fromThrift(tvCol));
+      }
     } catch (ImpalaRuntimeException e) {
       throw new TableLoadingException(String.format("Error loading schema for " +
           "table '%s'", getName()), e);
@@ -616,6 +629,10 @@ public abstract class Table extends CatalogObjectImpl implements FeTable {
         table.addToColumns(colDesc);
       }
     }
+    table.setVirtual_columns(new ArrayList<>());
+    for (VirtualColumn vCol : getVirtualColumns()) {
+      table.addToVirtual_columns(vCol.toThrift());
+    }
 
     org.apache.hadoop.hive.metastore.api.Table msTable = getMetaStoreTable();
     // IMPALA-10243: We should get our own copy of the metastore table, otherwise other
@@ -769,6 +786,9 @@ public abstract class Table extends CatalogObjectImpl implements FeTable {
   @Override // FeTable
   public List<Column> getColumns() { return colsByPos_; }
 
+  @Override // FeTable
+  public List<VirtualColumn> getVirtualColumns() { return virtualCols_; }
+
   @Override // FeTable
   public SqlConstraints getSqlConstraints()  { return sqlConstraints_; }
 
diff --git a/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java b/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
new file mode 100644
index 000000000..7a3a8314a
--- /dev/null
+++ b/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.impala.catalog;
+
+import org.apache.impala.thrift.TColumn;
+import org.apache.impala.thrift.TVirtualColumnType;
+
+import com.google.common.base.Preconditions;
+
+/**
+ * Virtual columns are columns that are not stored by the table, but they can
+ * reveal internal metadata, e.g. the input file name of a row.
+ * They can be specific to table types, e.g. INPUT__FILE__NAME is specific to
+ * filesystem-based tables.
+ * Virtual columns are hidden in the sense that they are not included in
+ * SELECT * expansions.
+ */
+public class VirtualColumn extends Column {
+  private final TVirtualColumnType virtualColType_;
+
+  public static VirtualColumn INPUT_FILE_NAME = new VirtualColumn("INPUT__FILE__NAME",
+      Type.STRING, TVirtualColumnType.INPUT_FILE_NAME);
+
+  public static VirtualColumn getVirtualColumn(TVirtualColumnType virtColType) {
+    if (virtColType == TVirtualColumnType.INPUT_FILE_NAME) {
+      return INPUT_FILE_NAME;
+    }
+    return null;
+  }
+
+  public TVirtualColumnType getVirtualColumnType() { return virtualColType_; }
+
+  private VirtualColumn(String name, Type type, TVirtualColumnType virtualColType) {
+    super(name.toLowerCase(), type, 0);
+    virtualColType_ = virtualColType;
+  }
+
+  @Override
+  public boolean isVirtual() { return true; }
+
+  public static VirtualColumn fromThrift(TColumn columnDesc) {
+    Preconditions.checkState(columnDesc.isSetVirtual_column_type());
+    return new VirtualColumn(columnDesc.getColumnName(),
+        Type.fromThrift(columnDesc.getColumnType()),
+        columnDesc.getVirtual_column_type());
+  }
+
+  public TColumn toThrift() {
+    TColumn colDesc = super.toThrift();
+    colDesc.setVirtual_column_type(virtualColType_);
+    return colDesc;
+  }
+}
diff --git a/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java b/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
index 8bb4529b6..9a90b32fb 100644
--- a/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
@@ -48,6 +48,7 @@ import org.apache.impala.catalog.HdfsStorageDescriptor;
 import org.apache.impala.catalog.HdfsTable;
 import org.apache.impala.catalog.PrunablePartition;
 import org.apache.impala.catalog.SqlConstraints;
+import org.apache.impala.catalog.VirtualColumn;
 import org.apache.impala.catalog.local.MetaProvider.PartitionMetadata;
 import org.apache.impala.catalog.local.MetaProvider.PartitionRef;
 import org.apache.impala.catalog.local.MetaProvider.TableMetaRef;
@@ -182,6 +183,11 @@ public class LocalFsTable extends LocalTable implements FeFsTable {
 
     avroSchema_ = explicitAvroSchema;
     isMarkedCached_ = (ref != null && ref.isMarkedCached());
+    addVirtualColumns();
+  }
+
+  private void addVirtualColumns() {
+    addVirtualColumn(VirtualColumn.INPUT_FILE_NAME);
   }
 
   private static String loadAvroSchema(Table msTbl) throws AnalysisException {
diff --git a/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java b/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
index 57a6268dc..dd78538bc 100644
--- a/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
@@ -34,6 +34,7 @@ import org.apache.impala.catalog.FeFsTable;
 import org.apache.impala.catalog.FeIcebergTable;
 import org.apache.impala.catalog.HdfsPartition.FileDescriptor;
 import org.apache.impala.catalog.TableLoadingException;
+import org.apache.impala.catalog.VirtualColumn;
 import org.apache.impala.thrift.TCompressionCodec;
 import org.apache.impala.thrift.THdfsPartition;
 import org.apache.impala.thrift.THdfsTable;
@@ -105,6 +106,7 @@ public class LocalIcebergTable extends LocalTable implements FeIcebergTable {
       ColumnMap cmap, TPartialTableInfo tableInfo, TableParams tableParams,
       org.apache.iceberg.Table icebergApiTable) throws TableLoadingException {
     super(db, msTable, ref, cmap);
+
     Preconditions.checkNotNull(tableInfo);
     localFsTable_ = LocalFsTable.load(db, msTable, ref);
     tableParams_ = tableParams;
@@ -121,6 +123,11 @@ public class LocalIcebergTable extends LocalTable implements FeIcebergTable {
     icebergParquetRowGroupSize_ = Utils.getIcebergParquetRowGroupSize(msTable);
     icebergParquetPlainPageSize_ = Utils.getIcebergParquetPlainPageSize(msTable);
     icebergParquetDictPageSize_ = Utils.getIcebergParquetDictPageSize(msTable);
+    addVirtualColumns();
+  }
+
+  private void addVirtualColumns() {
+    addVirtualColumn(VirtualColumn.INPUT_FILE_NAME);
   }
 
   static void validateColumns(List<Column> impalaCols, List<FieldSchema> hmsCols) {
diff --git a/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java b/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
index f056c2be8..60a9979b0 100644
--- a/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
@@ -45,6 +45,7 @@ import org.apache.impala.catalog.SqlConstraints;
 import org.apache.impala.catalog.StructField;
 import org.apache.impala.catalog.StructType;
 import org.apache.impala.catalog.TableLoadingException;
+import org.apache.impala.catalog.VirtualColumn;
 import org.apache.impala.catalog.local.MetaProvider.TableMetaRef;
 import org.apache.impala.common.Pair;
 import org.apache.impala.thrift.TCatalogObjectType;
@@ -77,6 +78,9 @@ abstract class LocalTable implements FeTable {
 
   private final TTableStats tableStats_;
 
+  // Virtual columns of this table.
+  protected final ArrayList<VirtualColumn> virtualCols_ = new ArrayList<>();
+
   /**
    * Table reference as provided by the initial call to the metadata provider.
    * This must be passed back to any further calls to the metadata provider
@@ -168,6 +172,9 @@ abstract class LocalTable implements FeTable {
     this.tableStats_ = null;
   }
 
+  protected void addVirtualColumn(VirtualColumn col) {
+    virtualCols_.add(col);
+  }
 
   @Override
   public boolean isLoaded() {
@@ -243,6 +250,9 @@ abstract class LocalTable implements FeTable {
     return cols_ == null ? Collections.emptyList() : cols_.getNonClusteringColumns();
   }
 
+  @Override
+  public List<VirtualColumn> getVirtualColumns() { return virtualCols_; }
+
   @Override
   public int getNumClusteringCols() {
     return cols_ == null ? 0 : cols_.getNumClusteringCols();
diff --git a/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java b/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
index 832a198cc..7d4b8b002 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
@@ -352,6 +352,38 @@ public class AnalyzerTest extends FrontendTestBase {
     }
   }
 
+  @Test
+  public void TestVirtualColumnInputFileName() {
+    // Select virtual columns.
+    AnalyzesOk("select input__file__name from functional.alltypes");
+    AnalyzesOk("select input__file__name, id from functional_parquet.alltypes");
+    AnalyzesOk("select input__file__name, * from functional_orc_def.alltypes");
+    AnalyzesOk("select input__file__name, * from " +
+        "(select input__file__name, * from functional_avro.alltypes) v");
+    AnalyzesOk(
+        "select id, input__file__name from functional_parquet.iceberg_partitioned");
+    AnalyzesOk(
+            "select input__file__name, * from functional_parquet.complextypestbl c, " +
+            "c.int_array");
+    AnalyzesOk(
+        "select c.input__file__name, c.int_array.* " +
+        "from functional_parquet.complextypestbl c, c.int_array");
+
+    // Error cases:
+    AnalysisError(
+        "select id, nested_struct.input__file__name " +
+        "from functional_parquet.complextypestbl",
+        "Could not resolve column/field reference: 'nested_struct.input__file__name'");
+    AnalysisError(
+        "select c.int_array.input__file__name, c.int_array.* " +
+        "from functional_parquet.complextypestbl c, c.int_array",
+        "Could not resolve column/field reference: 'c.int_array.input__file__name'");
+    AnalysisError("select input__file__name from functional_kudu.alltypes",
+        "Could not resolve column/field reference: 'input__file__name'");
+    AnalysisError("select input__file__name from functional_hbase.alltypes",
+        "Could not resolve column/field reference: 'input__file__name'");
+  }
+
   @Test
   public void TestCopyTestCase() {
     AnalyzesOk("copy testcase to 'hdfs:///tmp' select * from functional.alltypes");
diff --git a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
index c0b4a3853..97c3f8175 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
@@ -432,3 +432,156 @@ compute stats functional.alltypestiny
 ---- CATCH
 AuthorizationException: User '$USER' does not have privileges to execute 'ALTER' on: functional.alltypestiny
 ====
+---- QUERY
+# Select masked INPUT__FILE__NAME plus all cols
+select input__file__name, * from alltypestiny order by id;
+---- RESULTS
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4
+---- TYPES
+STRING, INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+====
+---- QUERY
+# Select masked INPUT__FILE__NAME plus a few cols
+select input__file__name, id, bool_col from alltypestiny order by id;
+---- RESULTS
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',0,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',100,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',200,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',300,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',400,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',500,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',600,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',700,NULL
+---- TYPES
+STRING, INT, BOOLEAN
+====
+---- QUERY
+# Select masked INPUT__FILE__NAME from an inline view
+select input__file__name, id, bool_col from (
+    select input__file__name, * from alltypestiny) v
+order by id;
+---- RESULTS
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',0,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',100,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',200,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',300,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',400,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',500,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',600,NULL
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',700,NULL
+---- TYPES
+STRING, INT, BOOLEAN
+====
+---- QUERY
+# Select masked INPUT__FILE__NAME and * from an inline view.
+# This doubles input__file__name in the output, as the outer '*' includes v.input__file__name.
+select input__file__name, * from (select input__file__name, * from alltypestiny) v order by id;
+---- RESULTS
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4
+---- TYPES
+STRING, STRING, INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+====
+---- QUERY
+# Select unmasked INPUT__FILE__NAME and masked table cols
+select input__file__name, id, string_col from alltypes
+order by id
+limit 10;
+---- RESULTS
+'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt',0,'0ttt'
+'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt',100,'1ttt'
+'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt',200,'2ttt'
+'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt',300,'3ttt'
+'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt',400,'4ttt'
+'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt',500,'5ttt'
+'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt',600,'6ttt'
+'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt',700,'7ttt'
+'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt',800,'8ttt'
+'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt',900,'9ttt'
+---- TYPES
+STRING, INT, STRING
+====
+---- QUERY
+# Select unmasked INPUT__FILE__NAME and all table cols
+select *, input__file__name from alltypes
+order by id
+limit 10;
+---- RESULTS
+0,true,0,0,0,0,0,0,'01/01/09','0ttt',2009-01-01 00:00:00,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+100,false,1,1,1,10,1.100000023841858,10.1,'01/01/09','1ttt',2009-01-01 00:01:00,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+200,true,2,2,2,20,2.200000047683716,20.2,'01/01/09','2ttt',2009-01-01 00:02:00.100000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+300,false,3,3,3,30,3.299999952316284,30.3,'01/01/09','3ttt',2009-01-01 00:03:00.300000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+400,true,4,4,4,40,4.400000095367432,40.4,'01/01/09','4ttt',2009-01-01 00:04:00.600000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+500,false,5,5,5,50,5.5,50.5,'01/01/09','5ttt',2009-01-01 00:05:00.100000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+600,true,6,6,6,60,6.599999904632568,60.59999999999999,'01/01/09','6ttt',2009-01-01 00:06:00.150000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+700,false,7,7,7,70,7.699999809265137,70.7,'01/01/09','7ttt',2009-01-01 00:07:00.210000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+800,true,8,8,8,80,8.800000190734863,80.8,'01/01/09','8ttt',2009-01-01 00:08:00.280000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+900,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/01/09','9ttt',2009-01-01 00:09:00.360000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT, STRING
+====
+---- QUERY
+# Select unmasked INPUT__FILE__NAME and all table cols from an inline view.
+# The outer '*' must include input__file__name in this case.
+select * from (select *, input__file__name from alltypes) v
+order by id
+limit 10;
+---- RESULTS
+0,true,0,0,0,0,0,0,'01/01/09','0ttt',2009-01-01 00:00:00,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+100,false,1,1,1,10,1.100000023841858,10.1,'01/01/09','1ttt',2009-01-01 00:01:00,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+200,true,2,2,2,20,2.200000047683716,20.2,'01/01/09','2ttt',2009-01-01 00:02:00.100000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+300,false,3,3,3,30,3.299999952316284,30.3,'01/01/09','3ttt',2009-01-01 00:03:00.300000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+400,true,4,4,4,40,4.400000095367432,40.4,'01/01/09','4ttt',2009-01-01 00:04:00.600000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+500,false,5,5,5,50,5.5,50.5,'01/01/09','5ttt',2009-01-01 00:05:00.100000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+600,true,6,6,6,60,6.599999904632568,60.59999999999999,'01/01/09','6ttt',2009-01-01 00:06:00.150000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+700,false,7,7,7,70,7.699999809265137,70.7,'01/01/09','7ttt',2009-01-01 00:07:00.210000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+800,true,8,8,8,80,8.800000190734863,80.8,'01/01/09','8ttt',2009-01-01 00:08:00.280000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+900,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/01/09','9ttt',2009-01-01 00:09:00.360000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT, STRING
+====
+---- QUERY
+# Do a join between masked tables
+select att.input__file__name, aty.input__file__name from alltypestiny att, alltypes aty
+where att.id=aty.id;
+---- RESULTS
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt','$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt','$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt','$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt','$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt','$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt','$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt','$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt','$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+---- TYPES
+STRING, STRING
+====
+---- QUERY
+# Do a join between masked tables, and select everything with *.
+select att.input__file__name, *, aty.input__file__name from alltypestiny att, alltypes aty
+where att.id=aty.id;
+---- RESULTS
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1,0,true,0,0,0,0,0,0,'01/01/09','0ttt',2009-01-01 00:00:00,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt',100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1,100,false,1,1,1,10,1.100000023841858,10.1,'01/01/09','1ttt',2009-01-01 00:01:00,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2,200,true,2,2,2,20,2.200000047683716,20.2,'01/01/09','2ttt',2009-01-01 00:02:00.100000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090201.txt',300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2,300,false,3,3,3,30,3.299999952316284,30.3,'01/01/09','3ttt',2009-01-01 00:03:00.300000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3,400,true,4,4,4,40,4.400000095367432,40.4,'01/01/09','4ttt',2009-01-01 00:04:00.600000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090301.txt',500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3,500,false,5,5,5,50,5.5,50.5,'01/01/09','5ttt',2009-01-01 00:05:00.100000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4,600,true,6,6,6,60,6.599999904632568,60.59999999999999,'01/01/09','6ttt',2009-01-01 00:06:00.150000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+regex:'.*/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090401.txt',700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4,700,false,7,7,7,70,7.699999809265137,70.7,'01/01/09','7ttt',2009-01-01 00:07:00.210000000,2009,1,'$NAMENODE/test-warehouse/alltypes/year=2009/month=1/090101.txt'
+---- TYPES
+STRING, INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT, INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT, STRING
+====
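
The masked paths in the expected results above come from Ranger's
mask_show_last_n transform: it reveals the last 10 characters and replaces
upper-case, lower-case and digit characters in the rest with 'x', leaving
other characters ('/', '=', '-', '.') untouched. A minimal Python sketch of
that behavior, inferred from the expected results rather than taken from
Ranger's implementation:

    # Illustrative model of mask_show_last_n({col}, 10, 'x', 'x', 'x', -1, '1')
    # as observed in the expected results above; not Ranger's actual code.
    def mask_show_last_n(value, n, upper_char, lower_char, digit_char):
        head, tail = value[:-n], value[-n:]
        out = []
        for ch in head:
            if ch.isupper():
                out.append(upper_char)
            elif ch.islower():
                out.append(lower_char)
            elif ch.isdigit():
                out.append(digit_char)
            else:
                out.append(ch)  # otherChar == -1 leaves the character unchanged
        return ''.join(out) + tail

    masked = mask_show_last_n(
        '/test-warehouse/alltypestiny/year=2009/month=1/090101.txt',
        10, 'x', 'x', 'x')
    assert masked == '/xxxx-xxxxxxxxx/xxxxxxxxxxxx/xxxx=xxxx/xxxxx=x/090101.txt'
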
diff --git a/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test b/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
new file mode 100644
index 000000000..e66eb5ae8
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
@@ -0,0 +1,72 @@
+====
+---- QUERY
+select input__file__name, id from complextypestbl
+order by id;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',4
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',5
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',6
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',7
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nonnullable.parq|bucket_.*)',8
+---- TYPES
+STRING, BIGINT
+====
+---- QUERY
+select input__file__name, item from complextypestbl c, c.int_array
+order by item;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nonnullable.parq|bucket_.*)',-1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',NULL
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',NULL
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',NULL
+---- TYPES
+STRING, INT
+====
+---- QUERY
+select c.input__file__name, item from complextypestbl c, c.int_array
+order by item;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nonnullable.parq|bucket_.*)',-1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',NULL
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',NULL
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',NULL
+---- TYPES
+STRING, INT
+====
+---- QUERY
+select input__file__name, item from complextypestbl c, c.int_array_array.item
+order by item;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nonnullable.parq|bucket_.*)',-2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nonnullable.parq|bucket_.*)',-1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',4
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',4
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',5
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',6
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',NULL
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',NULL
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/(base_\d*/)?(nullable.parq|bucket_.*)',NULL
+---- TYPES
+STRING, INT
+====
diff --git a/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test b/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test
new file mode 100644
index 000000000..2eb25b1c9
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test
@@ -0,0 +1,25 @@
+====
+---- QUERY
+# A user column named INPUT__FILE__NAME hides the virtual column INPUT__FILE__NAME
+create table i_n_f (input__file__name string);
+insert into table i_n_f values ('impala');
+select input__file__name from i_n_f;
+---- RESULTS
+'impala'
+---- TYPES
+STRING
+====
+---- QUERY
+select * from i_n_f;
+---- RESULTS
+'impala'
+---- TYPES
+STRING
+====
+---- QUERY
+select input__file__name, * from i_n_f;
+---- RESULTS
+'impala','impala'
+---- TYPES
+STRING, STRING
+====
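
The test above pins down the resolution order: a user-defined column named
input__file__name shadows the virtual column of the same name. A minimal
Python sketch of that precedence (illustrative only; the frontend's actual
resolution lives in Path.java and is not reproduced here):

    # Hypothetical model of the shadowing behavior exercised above; the names
    # are illustrative and do not mirror Impala's frontend classes.
    def resolve(name, user_columns, virtual_columns):
        target = name.lower()
        if target in (c.lower() for c in user_columns):
            return 'user'      # user-defined columns are checked first
        if target in (c.lower() for c in virtual_columns):
            return 'virtual'
        raise LookupError(
            "Could not resolve column/field reference: '%s'" % name)

    # For table i_n_f above, the user column wins:
    assert resolve('input__file__name',
                   ['input__file__name'], ['input__file__name']) == 'user'
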
diff --git a/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test b/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
new file mode 100644
index 000000000..e97445814
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
@@ -0,0 +1,183 @@
+====
+---- QUERY
+# Select INPUT__FILE__NAME plus all cols
+select input__file__name, * from alltypestiny;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*',0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*',1,false,1,1,1,10,1.100000023841858,10.1,'01/01/09','1',2009-01-01 00:01:00,2009,1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*',2,true,0,0,0,0,0,0,'02/01/09','0',2009-02-01 00:00:00,2009,2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*',3,false,1,1,1,10,1.100000023841858,10.1,'02/01/09','1',2009-02-01 00:01:00,2009,2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*',4,true,0,0,0,0,0,0,'03/01/09','0',2009-03-01 00:00:00,2009,3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*',5,false,1,1,1,10,1.100000023841858,10.1,'03/01/09','1',2009-03-01 00:01:00,2009,3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*',6,true,0,0,0,0,0,0,'04/01/09','0',2009-04-01 00:00:00,2009,4
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*',7,false,1,1,1,10,1.100000023841858,10.1,'04/01/09','1',2009-04-01 00:01:00,2009,4
+---- TYPES
+STRING, INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+====
+---- QUERY
+# Select INPUT__FILE__NAME plus non-clustering col
+select input__file__name, id from alltypestiny;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*',0
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*',1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*',2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*',3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*',4
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*',5
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*',6
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*',7
+---- TYPES
+STRING, INT
+====
+---- QUERY
+# Select INPUT__FILE__NAME plus clustering col
+select input__file__name, month from alltypestiny;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*',1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*',1
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*',2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*',2
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*',3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*',3
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*',4
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*',4
+---- TYPES
+STRING, INT
+====
+---- QUERY
+# Select INPUT__FILE__NAME only
+select input__file__name from alltypestiny;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*'
+---- TYPES
+STRING
+====
+---- QUERY
+# Select INPUT__FILE__NAME multiple times
+select input__file__name, input__file__name from alltypestiny;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*',regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*',regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*',regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*',regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*',regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*',regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*',regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*',regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*'
+---- TYPES
+STRING, STRING
+====
+---- QUERY
+# Select INPUT__FILE__NAME from two tables
+select att.input__file__name, att.id, ats.input__file__name from alltypestiny att join alltypessmall ats on att.id=ats.id;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypestiny[^/]*/year=2009/month=1(/base_\d*/|/).*',0,regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypessmall[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypestiny[^/]*/year=2009/month=1(/base_\d*/|/).*',1,regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypessmall[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypestiny[^/]*/year=2009/month=2(/base_\d*/|/).*',2,regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypessmall[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypestiny[^/]*/year=2009/month=2(/base_\d*/|/).*',3,regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypessmall[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypestiny[^/]*/year=2009/month=3(/base_\d*/|/).*',4,regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypessmall[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypestiny[^/]*/year=2009/month=3(/base_\d*/|/).*',5,regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypessmall[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypestiny[^/]*/year=2009/month=4(/base_\d*/|/).*',6,regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypessmall[^/]*/year=2009/month=1(/base_\d*/|/).*'
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypestiny[^/]*/year=2009/month=4(/base_\d*/|/).*',7,regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*alltypessmall[^/]*/year=2009/month=1(/base_\d*/|/).*'
+---- TYPES
+STRING, INT, STRING
+====
+---- QUERY
+# Group by INPUT__FILE__NAME
+select input__file__name, count(*) from alltypes
+group by input__file__name
+order by input__file__name;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=10(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=11(/base_\d*/|/).*',300
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=12(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=2(/base_\d*/|/).*',280
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=3(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=4(/base_\d*/|/).*',300
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=5(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=6(/base_\d*/|/).*',300
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=7(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=8(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=9(/base_\d*/|/).*',300
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=1(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=10(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=11(/base_\d*/|/).*',300
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=12(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=2(/base_\d*/|/).*',280
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=3(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=4(/base_\d*/|/).*',300
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=5(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=6(/base_\d*/|/).*',300
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=7(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=8(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=9(/base_\d*/|/).*',300
+---- TYPES
+STRING, BIGINT
+====
+---- QUERY
+# Filter results by LIKE
+select input__file__name, count(*) from alltypes
+where input__file__name like '%year=2009/month=1%'
+group by input__file__name
+order by input__file__name;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=1(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=10(/base_\d*/|/).*',310
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=11(/base_\d*/|/).*',300
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2009/month=12(/base_\d*/|/).*',310
+---- TYPES
+STRING, BIGINT
+====
+---- QUERY
+# REGEXP_LIKE
+select input__file__name, count(*) from alltypes
+where regexp_like(input__file__name, 'year=2010/month=2')
+group by input__file__name
+order by input__file__name;
+---- RESULTS
+regex:'$NAMENODE/test-warehouse(/managed/functional[^/]*)?/[^/]*/year=2010/month=2(/base_\d*/|/).*',280
+---- TYPES
+STRING, BIGINT
+====
+---- QUERY
+# REGEXP_EXTRACT
+select regexp_extract(input__file__name, 'year=\\d+/month=\\d+', 0)
+from alltypestiny;
+---- RESULTS
+'year=2009/month=1'
+'year=2009/month=1'
+'year=2009/month=2'
+'year=2009/month=2'
+'year=2009/month=3'
+'year=2009/month=3'
+'year=2009/month=4'
+'year=2009/month=4'
+---- TYPES
+STRING
+====
+---- QUERY
+# REGEXP_REPLACE
+select regexp_replace(regexp_extract(input__file__name, 'year=\\d+/month=\\d+', 0),
+                      'year=(\\d+)/month=(\\d+)',
+                      '\\2/\\1')
+from alltypestiny;
+---- RESULTS
+'1/2009'
+'1/2009'
+'2/2009'
+'2/2009'
+'3/2009'
+'3/2009'
+'4/2009'
+'4/2009'
+---- TYPES
+STRING
+====
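
The same queries work from any client, not just the .test harness. A minimal
usage sketch with the Impyla DB-API client (the package, host and port are
assumptions for a local development cluster):

    # Assumes the impyla package and a running impalad at localhost:21050.
    from impala.dbapi import connect

    conn = connect(host='localhost', port=21050)
    cur = conn.cursor()
    # Count rows per backing file via the virtual column.
    cur.execute(
        "select input__file__name, count(*) from functional.alltypes "
        "group by input__file__name order by input__file__name")
    for file_name, num_rows in cur.fetchall():
        print(file_name, num_rows)
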
diff --git a/tests/authorization/test_ranger.py b/tests/authorization/test_ranger.py
index d0caa7b17..c538a6a16 100644
--- a/tests/authorization/test_ranger.py
+++ b/tests/authorization/test_ranger.py
@@ -1043,6 +1043,12 @@ class TestRanger(CustomClusterTestSuite):
         unique_name + str(policy_cnt), user, "functional", "alltypesagg", "bigint_col",
         "CUSTOM", "(select count(*) from functional.alltypestiny)")
       policy_cnt += 1
+      # Add a column masking policy for the virtual column INPUT__FILE__NAME
+      TestRanger._add_column_masking_policy(
+        unique_name + str(policy_cnt), user, "functional", "alltypestiny",
+        "input__file__name",
+        "CUSTOM", "mask_show_last_n({col}, 10, 'x', 'x', 'x', -1, '1')")
+      policy_cnt += 1
       self.execute_query_expect_success(admin_client, "refresh authorization",
                                         user=ADMIN)
       self.run_test_case("QueryTest/ranger_column_masking", vector,
diff --git a/tests/query_test/test_scanners.py b/tests/query_test/test_scanners.py
index 112b53dc0..a57f38e55 100644
--- a/tests/query_test/test_scanners.py
+++ b/tests/query_test/test_scanners.py
@@ -123,6 +123,18 @@ class TestScannersAllTableFormats(ImpalaTestSuite):
     else:
       self.run_test_case('QueryTest/string-escaping', vector)
 
+  def test_virtual_column_input_file_name(self, vector, unique_database):
+    file_format = vector.get_value('table_format').file_format
+    if file_format in ['hbase', 'kudu']:
+      # Virtual column INPUT__FILE__NAME is only supported for filesystem-based tables.
+      pytest.skip()
+    self.run_test_case('QueryTest/virtual-column-input-file-name', vector)
+    if file_format in ['orc', 'parquet']:
+      self.run_test_case('QueryTest/virtual-column-input-file-name-complextypes', vector)
+    if file_format == 'text':
+      self.run_test_case('QueryTest/virtual-column-input-file-name-in-table', vector,
+          use_db=unique_database)
+
 # Test all the scanners with a simple limit clause. The limit clause triggers
 # cancellation in the scanner code paths.
 class TestScannersAllTableFormatsWithLimit(ImpalaTestSuite):


[impala] 01/03: IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 97d3b25be3d32c5b3b10e5785cfb32351c4065b0
Author: Tamas Mate <tm...@apache.org>
AuthorDate: Tue Jun 7 10:36:39 2022 +0200

    IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT
    
    As 4.1.0 has been released, this commit updates master to 4.2.0.
    This step needs to happen on each release; related changes are
    IMPALA-10198 and IMPALA-10057.
    
    Testing:
     - Ran a build
    
    Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70
    Reviewed-on: http://gerrit.cloudera.org:8080/18595
    Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 bin/impala-config.sh                 | 2 +-
 fe/pom.xml                           | 2 +-
 java/TableFlattener/pom.xml          | 2 +-
 java/datagenerator/pom.xml           | 2 +-
 java/executor-deps/pom.xml           | 2 +-
 java/ext-data-source/api/pom.xml     | 2 +-
 java/ext-data-source/pom.xml         | 2 +-
 java/ext-data-source/sample/pom.xml  | 2 +-
 java/ext-data-source/test/pom.xml    | 2 +-
 java/pom.xml                         | 2 +-
 java/query-event-hook-api/pom.xml    | 2 +-
 java/shaded-deps/hive-exec/pom.xml   | 2 +-
 java/shaded-deps/s3a-aws-sdk/pom.xml | 2 +-
 java/test-hive-udfs/pom.xml          | 2 +-
 java/yarn-extras/pom.xml             | 2 +-
 15 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/bin/impala-config.sh b/bin/impala-config.sh
index 8c48ffc55..07bb3856e 100755
--- a/bin/impala-config.sh
+++ b/bin/impala-config.sh
@@ -70,7 +70,7 @@ fi
 # WARNING: If changing this value, also run these commands:
 # cd ${IMPALA_HOME}/java
 # mvn versions:set -DnewVersion=YOUR_NEW_VERSION
-export IMPALA_VERSION=4.1.0-SNAPSHOT
+export IMPALA_VERSION=4.2.0-SNAPSHOT
 
 # The unique build id of the toolchain to use if bootstrapping. This is generated by the
 # native-toolchain build when publishing its build artifacts. This should be changed when
diff --git a/fe/pom.xml b/fe/pom.xml
index e3a28cfc8..3409151fa 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -23,7 +23,7 @@ under the License.
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
     <relativePath>../java/pom.xml</relativePath>
   </parent>
   <modelVersion>4.0.0</modelVersion>
diff --git a/java/TableFlattener/pom.xml b/java/TableFlattener/pom.xml
index 789fd09c8..439ce6bef 100644
--- a/java/TableFlattener/pom.xml
+++ b/java/TableFlattener/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <artifactId>nested-table-flattener</artifactId>
diff --git a/java/datagenerator/pom.xml b/java/datagenerator/pom.xml
index 89e64ca1b..9b58b5051 100644
--- a/java/datagenerator/pom.xml
+++ b/java/datagenerator/pom.xml
@@ -23,7 +23,7 @@ under the License.
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
 
diff --git a/java/executor-deps/pom.xml b/java/executor-deps/pom.xml
index 05d4aeb1c..17ce82097 100644
--- a/java/executor-deps/pom.xml
+++ b/java/executor-deps/pom.xml
@@ -34,7 +34,7 @@ under the License.
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <groupId>org.apache.impala</groupId>
diff --git a/java/ext-data-source/api/pom.xml b/java/ext-data-source/api/pom.xml
index 11ef1ba71..635dae872 100644
--- a/java/ext-data-source/api/pom.xml
+++ b/java/ext-data-source/api/pom.xml
@@ -23,7 +23,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-data-source</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
   </parent>
   <artifactId>impala-data-source-api</artifactId>
   <name>Apache Impala External Data Source API</name>
diff --git a/java/ext-data-source/pom.xml b/java/ext-data-source/pom.xml
index 89a514e29..1d95c06bd 100644
--- a/java/ext-data-source/pom.xml
+++ b/java/ext-data-source/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <artifactId>impala-data-source</artifactId>
diff --git a/java/ext-data-source/sample/pom.xml b/java/ext-data-source/sample/pom.xml
index aa12f0d1f..8e1ec9744 100644
--- a/java/ext-data-source/sample/pom.xml
+++ b/java/ext-data-source/sample/pom.xml
@@ -23,7 +23,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-data-source</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
   </parent>
   <artifactId>impala-data-source-sample</artifactId>
   <name>Apache Impala External Data Source Sample</name>
diff --git a/java/ext-data-source/test/pom.xml b/java/ext-data-source/test/pom.xml
index fd16e7892..2e1f9fe33 100644
--- a/java/ext-data-source/test/pom.xml
+++ b/java/ext-data-source/test/pom.xml
@@ -23,7 +23,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-data-source</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
   </parent>
   <artifactId>impala-data-source-test</artifactId>
   <name>Apache Impala External Data Source Test Library</name>
diff --git a/java/pom.xml b/java/pom.xml
index dbb931325..051f5f100 100644
--- a/java/pom.xml
+++ b/java/pom.xml
@@ -21,7 +21,7 @@ under the License.
   <modelVersion>4.0.0</modelVersion>
   <groupId>org.apache.impala</groupId>
   <artifactId>impala-parent</artifactId>
-  <version>4.1.0-SNAPSHOT</version>
+  <version>4.2.0-SNAPSHOT</version>
   <packaging>pom</packaging>
   <name>Apache Impala Parent POM</name>
 
diff --git a/java/query-event-hook-api/pom.xml b/java/query-event-hook-api/pom.xml
index eeae99595..9a41f76d9 100644
--- a/java/query-event-hook-api/pom.xml
+++ b/java/query-event-hook-api/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <artifactId>query-event-hook-api</artifactId>
diff --git a/java/shaded-deps/hive-exec/pom.xml b/java/shaded-deps/hive-exec/pom.xml
index e0a577f63..417a087ae 100644
--- a/java/shaded-deps/hive-exec/pom.xml
+++ b/java/shaded-deps/hive-exec/pom.xml
@@ -27,7 +27,7 @@ the same dependencies
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
   <modelVersion>4.0.0</modelVersion>
diff --git a/java/shaded-deps/s3a-aws-sdk/pom.xml b/java/shaded-deps/s3a-aws-sdk/pom.xml
index 9040a4635..13978a242 100644
--- a/java/shaded-deps/s3a-aws-sdk/pom.xml
+++ b/java/shaded-deps/s3a-aws-sdk/pom.xml
@@ -25,7 +25,7 @@ though some of them might not be necessary. The exclusions are sorted alphabetic
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
   <modelVersion>4.0.0</modelVersion>
diff --git a/java/test-hive-udfs/pom.xml b/java/test-hive-udfs/pom.xml
index cd241bdae..c319acbfa 100644
--- a/java/test-hive-udfs/pom.xml
+++ b/java/test-hive-udfs/pom.xml
@@ -22,7 +22,7 @@ under the License.
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
 
diff --git a/java/yarn-extras/pom.xml b/java/yarn-extras/pom.xml
index d45fcd3e6..b44049479 100644
--- a/java/yarn-extras/pom.xml
+++ b/java/yarn-extras/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.0-SNAPSHOT</version>
+    <version>4.2.0-SNAPSHOT</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <artifactId>yarn-extras</artifactId>


[impala] 02/03: IMPALA-5845: Limit the number of non-fatal errors logging to INFO

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 7273cfdfb901b9ef564c2737cf00c7a8abb57f07
Author: Riza Suminto <ri...@cloudera.com>
AuthorDate: Wed May 25 23:51:58 2022 -0700

    IMPALA-5845: Limit the number of non-fatal errors logging to INFO
    
    RuntimeState::LogError() does both error aggregation to the coordinator
    and logging of the error to the log file, depending on the vlog_level.
    This can flood the INFO log when the specified vlog_level is 1, making
    it difficult to analyze other, more significant log lines. This patch
    limits the number of errors logged to INFO based on the
    max_error_logs_per_instance flag (default 2000). Once this number is
    exceeded, vlog_level=1 is downgraded to vlog_level=2.

    To allow easy debugging in the future, this flag is ignored if the user
    sets the query option max_errors < 0, in which case all errors targeting
    vlog_level 1 will be logged.

    This patch also fixes a bug where the error count was not increased for
    a non-general error code that is already in the 'error_log_' map.
    
    Testing:
    - Add test_logging.py::TestLoggingCore
    
    Change-Id: I924768ec461735c172fbf75d6415033bbdb77f9b
    Reviewed-on: http://gerrit.cloudera.org:8080/18565
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/runtime/query-state.cc        |  6 ----
 be/src/runtime/runtime-state.cc      | 42 +++++++++++++++++++++----
 be/src/runtime/runtime-state.h       | 12 +++++++
 tests/custom_cluster/test_logging.py | 61 ++++++++++++++++++++++++++++++++++++
 4 files changed, 109 insertions(+), 12 deletions(-)
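
Restated compactly before the diffs: an error aimed at vlog level 1 is demoted
to level 2 once max_error_logs_per_instance level-1 errors have already been
printed, unless the user set max_errors < 0, which disables the cap. An
illustrative Python restatement of that decision (the authoritative logic is
the RuntimeState::LogError() change below):

    # Illustrative restatement of the new downgrade policy; not part of the
    # patch itself.
    MAX_ERROR_LOGS_PER_INSTANCE = 2000  # default of the new flag

    def effective_vlog_level(vlog_level, user_max_errors, vlog1_errors_so_far):
        if (vlog_level == 1 and user_max_errors >= 0
                and vlog1_errors_so_far >= MAX_ERROR_LOGS_PER_INSTANCE):
            return 2  # downgrade INFO-bound errors to DEBUG
        return vlog_level

    assert effective_vlog_level(1, 100, 2000) == 2   # cap reached: downgraded
    assert effective_vlog_level(1, -1, 10**6) == 1   # max_errors < 0: cap off
    assert effective_vlog_level(2, 100, 0) == 2      # other levels unaffected
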

diff --git a/be/src/runtime/query-state.cc b/be/src/runtime/query-state.cc
index 548396732..733a90b51 100644
--- a/be/src/runtime/query-state.cc
+++ b/be/src/runtime/query-state.cc
@@ -118,12 +118,6 @@ QueryState::QueryState(
   }
   TQueryOptions& query_options =
       const_cast<TQueryOptions&>(query_ctx_.client_request.query_options);
-  // max_errors does not indicate how many errors in total have been recorded, but rather
-  // how many are distinct. It is defined as the sum of the number of generic errors and
-  // the number of distinct other errors.
-  if (query_options.max_errors <= 0) {
-    query_options.max_errors = 100;
-  }
   if (query_options.batch_size <= 0) {
     query_options.__set_batch_size(DEFAULT_BATCH_SIZE);
   }
diff --git a/be/src/runtime/runtime-state.cc b/be/src/runtime/runtime-state.cc
index 8a994b35e..d9c83e624 100644
--- a/be/src/runtime/runtime-state.cc
+++ b/be/src/runtime/runtime-state.cc
@@ -59,7 +59,11 @@
 
 using strings::Substitute;
 
-DECLARE_int32(max_errors);
+DEFINE_int32(max_error_logs_per_instance, 2000,
+    "Maximum number of non-fatal errors to be logged at log level 1 (INFO). "
+    "Once this number is exceeded, further non-fatal errors will be logged at log "
+    "level 2 (DEBUG) severity. This flag is ignored if the user sets a negative "
+    "max_errors query option. Defaults to 2000.");
 
 namespace impala {
 
@@ -190,11 +194,37 @@ string RuntimeState::ErrorLog() {
 
 bool RuntimeState::LogError(const ErrorMsg& message, int vlog_level) {
   lock_guard<SpinLock> l(error_log_lock_);
-  // All errors go to the log, unreported_error_count_ is counted independently of the
-  // size of the error_log to account for errors that were already reported to the
-  // coordinator
-  VLOG(vlog_level) << "Error from query " << PrintId(query_id()) << ": " << message.msg();
-  if (ErrorCount(error_log_) < query_options().max_errors) {
+  // All errors go to the log. Once the number of errors logged at vlog level 1
+  // reaches max_error_logs_per_instance, further level-1 errors are downgraded to
+  // vlog level 2.
+  int user_max_errors = query_options().max_errors;
+  if (vlog_level == 1 && user_max_errors >= 0
+      && vlog_1_errors >= FLAGS_max_error_logs_per_instance) {
+    vlog_level = 2;
+  }
+
+  if (VLOG_IS_ON(vlog_level)) {
+    VLOG(vlog_level) << "Error from query " << PrintId(query_id()) << ": "
+                     << message.msg();
+  }
+
+  if (vlog_level == 1 && user_max_errors >= 0) {
+    vlog_1_errors++;
+    DCHECK_LE(vlog_1_errors, FLAGS_max_error_logs_per_instance);
+    if (vlog_1_errors == FLAGS_max_error_logs_per_instance) {
+      VLOG(vlog_level) << "Query " << PrintId(query_id()) << " printed "
+                       << FLAGS_max_error_logs_per_instance
+                       << " non-fatal error to log level 1 (INFO). Further non-fatal "
+                       << "error will be downgraded to log level 2 (DEBUG).";
+    }
+  }
+
+  TErrorCode::type code = message.error();
+  if (ErrorCount(error_log_) < max_errors()
+      || (code != TErrorCode::GENERAL && error_log_.find(code) != error_log_.end())) {
+    // Appending a general error is expensive since it writes the entire message to
+    // the error_log_ map. Appending a non-general (specific) error that already
+    // exists in error_log_ is cheap since it only increments a count.
     AppendError(&error_log_, message);
     return true;
   }
diff --git a/be/src/runtime/runtime-state.h b/be/src/runtime/runtime-state.h
index 7d2d57bb3..5d00df7ae 100644
--- a/be/src/runtime/runtime-state.h
+++ b/be/src/runtime/runtime-state.h
@@ -166,6 +166,15 @@ class RuntimeState {
     return Status::OK();
   }
 
+  /// Returns the maximum number of non-fatal errors to report to the client through
+  /// the coordinator. max_errors does not indicate how many errors in total have been
+  /// recorded, but rather how many are distinct. It is defined as the sum of the
+  /// number of generic errors and the number of distinct other errors. Defaults to
+  /// 100 if a non-positive number is specified in the max_errors query option.
+  inline int max_errors() const {
+    return query_options().max_errors <= 0 ? 100 : query_options().max_errors;
+  }
+
   /// Log an error that will be sent back to the coordinator based on an instance of the
   /// ErrorMsg class. The runtime state aggregates log messages based on type with one
   /// exception: messages with the GENERAL type are not aggregated but are kept
@@ -318,6 +327,9 @@ class RuntimeState {
   /// Logs error messages.
   ErrorLogMap error_log_;
 
+  /// Tracks how many errors have been printed at VLOG(1).
+  int64_t vlog_1_errors = 0;
+
   /// Global QueryState and original thrift descriptors for this fragment instance.
   QueryState* const query_state_;
   const TPlanFragment* const fragment_;
diff --git a/tests/custom_cluster/test_logging.py b/tests/custom_cluster/test_logging.py
new file mode 100644
index 000000000..34cf2578e
--- /dev/null
+++ b/tests/custom_cluster/test_logging.py
@@ -0,0 +1,61 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pytest
+
+from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
+
+
+class TestLoggingCore(CustomClusterTestSuite):
+  """Test existence of certain log lines under some scenario."""
+
+  @classmethod
+  def get_workload(cls):
+    return 'functional-query'
+
+  def _test_max_errors(self, max_error_logs_per_instance, max_errors, expect_downgraded):
+    """Test that number of non-fatal error printed to INFO log is limited by
+    max_errors and max_error_logs_per_instance."""
+
+    query = ("select id, bool_col, tinyint_col, smallint_col "
+        "from functional.alltypeserror order by id")
+    client = self.create_impala_client()
+
+    self.execute_query_expect_success(client, query, {'max_errors': max_errors})
+    self.assert_impalad_log_contains("INFO", "Error parsing row",
+        max_error_logs_per_instance if expect_downgraded else 8)
+    self.assert_impalad_log_contains("INFO",
+        "printed {0} non-fatal error to log level 1".format(max_error_logs_per_instance),
+        1 if expect_downgraded else 0)
+
+  @pytest.mark.execute_serially
+  @CustomClusterTestSuite.with_args(cluster_size=1,
+      impalad_args="--max_error_logs_per_instance=2")
+  def test_max_errors(self):
+    self._test_max_errors(2, 4, True)
+
+  @pytest.mark.execute_serially
+  @CustomClusterTestSuite.with_args(cluster_size=1,
+      impalad_args="--max_error_logs_per_instance=3")
+  def test_max_errors_0(self):
+    self._test_max_errors(3, 0, True)
+
+  @pytest.mark.execute_serially
+  @CustomClusterTestSuite.with_args(cluster_size=1,
+      impalad_args="--max_error_logs_per_instance=2")
+  def test_max_errors_no_downgrade(self):
+    self._test_max_errors(2, -1, False)