Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2022/08/02 17:03:19 UTC

[Impala-ASF-CR] IMPALA-11350: Add virtual column FILE POSITION for Parquet tables

Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18704

to look at the new patch set (#5).

Change subject: IMPALA-11350: Add virtual column FILE__POSITION for Parquet tables
......................................................................

IMPALA-11350: Add virtual column FILE__POSITION for Parquet tables

Virtual column FILE__POSITION returns the ordinal position of the row
in the data file. It will be useful for adding support for Iceberg's
position-based delete files.

This patch only adds FILE__POSITION to Parquet tables. It works
similarly to the handling of collection position slots, i.e. an
existing column reader takes on the responsibility of filling the
file position slot. Because of page filtering and late
materialization, the file position was already tracked in member
'current_row_' during scanning.
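
As a rough illustration of the idea above (not the code in this patch),
the sketch below shows how a reader that already tracks a row counter
can fill a file position slot even when some rows are skipped. The class
SketchColumnReader, the function ScanPositions, and the zero-based
"last processed row" meaning given to 'current_row_' here are
assumptions made for the example only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a Parquet column reader.
// It is NOT Impala's actual class; it only mirrors the idea of reusing
// an already-maintained row counter to fill the file position slot.
class SketchColumnReader {
 public:
  // 'first_row' is the file-level ordinal of the first row this reader
  // will see (assumed zero-based for this sketch).
  explicit SketchColumnReader(int64_t first_row)
      : current_row_(first_row - 1) {}

  // Skips 'num_rows' rows without materializing them, as page filtering
  // or late materialization might do. The counter still advances.
  void SkipRows(int64_t num_rows) {
    assert(num_rows >= 0);
    current_row_ += num_rows;
  }

  // Advances to the next row. If a file position slot was requested,
  // writes the row's ordinal position in the file into it.
  void ReadRow(int64_t* file_pos_slot) {
    ++current_row_;
    if (file_pos_slot != nullptr) *file_pos_slot = current_row_;
  }

 private:
  // Ordinal of the last row processed, relative to the start of the file.
  int64_t current_row_;
};

// Skips 'num_skipped' rows, then reads 'num_read' rows and returns the
// file positions written for them.
std::vector<int64_t> ScanPositions(int64_t first_row, int64_t num_skipped,
                                   int64_t num_read) {
  SketchColumnReader reader(first_row);
  reader.SkipRows(num_skipped);
  std::vector<int64_t> positions;
  for (int64_t i = 0; i < num_read; ++i) {
    int64_t pos = -1;
    reader.ReadRow(&pos);
    positions.push_back(pos);
  }
  return positions;
}
```

In this sketch the skipped rows never reach the output, yet the rows
that are read still get positions relative to the whole file, which is
the property a position-based delete would rely on.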

Querying the FILE__POSITION in other file formats raises an error.

Testing:
 * added e2e tests

Change-Id: I4ef72c683d0d5ae2898bca36fa87e74b663671f7
---
M be/src/exec/base-sequence-scanner.h
M be/src/exec/file-metadata-utils.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-collection-column-reader.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/mixing-virtual-columns.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-parquet.test
M tests/query_test/test_scanners.py
30 files changed, 661 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/18704/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18704
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4ef72c683d0d5ae2898bca36fa87e74b663671f7
Gerrit-Change-Number: 18704
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>