Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/03/18 09:02:00 UTC
[jira] [Updated] (ARROW-15658) [C++] Parquet pushdown filtering fails if the filter expression uses numeric field references
[ https://issues.apache.org/jira/browse/ARROW-15658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-15658:
-----------------------------------
Labels: pull-request-available (was: )
> [C++] Parquet pushdown filtering fails if the filter expression uses numeric field references
> ---------------------------------------------------------------------------------------------
>
> Key: ARROW-15658
> URL: https://issues.apache.org/jira/browse/ARROW-15658
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 7.0.0
> Reporter: Weston Pace
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We can refer to a field by name (e.g. {{compute::field_ref("foo")}}) or by index (e.g. {{compute::field_ref(0)}}).
> The latter is not supported when doing Parquet projection. A test demonstrating this can be found here: https://github.com/westonpace/arrow/commit/2f92ed0764cf2e1388dac053aeb4e1b923c6872e
> Copied here for posterity (this would go in the dataset fixture mixin):
> {code}
> void TestScanWithFieldPathFilter() {
>   auto i32 = field("i32", int32());
>   auto i64 = field("i64", int64());
>   this->opts_->dataset_schema = schema({i32, i64});
>   this->Project({"i64"});
>   // This should be the column i32
>   this->SetFilter(equal(field_ref(0), literal(0)));
>   auto expected_schema = schema({i64});
>   auto reader = this->GetRecordBatchReader(opts_->dataset_schema);
>   auto source = this->GetFileSource(reader.get());
>   auto fragment = this->MakeFragment(*source);
>   int64_t row_count = 0;
>   for (auto maybe_batch : PhysicalBatches(fragment)) {
>     ASSERT_OK_AND_ASSIGN(auto batch, maybe_batch);
>     row_count += batch->num_rows();
>     AssertSchemaEqual(*batch->schema(), *expected_schema,
>                       /*check_metadata=*/false);
>   }
>   ASSERT_EQ(row_count, expected_rows());
> }
> {code}
> I would expect this to work. Instead I get the error:
> {noformat}
> /home/pace/dev/arrow/cpp/src/arrow/dataset/test_util.h:840: Failure
> Failed
> '_error_or_value83.status()' failed with NotImplemented: Inferring column projection from FieldRef FieldRef.FieldPath(0)
> /home/pace/dev/arrow/cpp/src/arrow/dataset/file_parquet.cc:262 ResolveOneFieldRef(manifest, ref, field_lookup, duplicate_fields, &columns_selection)
> /home/pace/dev/arrow/cpp/src/arrow/dataset/file_parquet.cc:437 InferColumnProjection(*reader, *options)
> /home/pace/dev/arrow/cpp/src/arrow/util/iterator.h:152 value_.status()
> {noformat}
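The error above suggests that column-projection inference has no way to map an index-based reference such as FieldPath(0) to a column. As a rough illustration only, the standalone sketch below resolves a reference given either by name or by top-level index to a column name. It does not use Arrow; the names FieldRefSketch, ResolveColumnName and RunSketch are hypothetical, and this is not the fix applied in Arrow.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a field reference: either a column name
// or a top-level column index. This is NOT arrow::FieldRef.
using FieldRefSketch = std::variant<std::string, int>;

// Hypothetical helper: map a reference to a column name in the dataset schema.
// Returns std::nullopt if the name is absent or the index is out of range.
std::optional<std::string> ResolveColumnName(
    const std::vector<std::string>& schema_names, const FieldRefSketch& ref) {
  if (const auto* name = std::get_if<std::string>(&ref)) {
    for (const auto& candidate : schema_names) {
      if (candidate == *name) return candidate;
    }
    return std::nullopt;
  }
  const int index = std::get<int>(ref);
  if (index < 0 || static_cast<std::size_t>(index) >= schema_names.size()) {
    return std::nullopt;
  }
  return schema_names[static_cast<std::size_t>(index)];
}

// Mirrors the schema from the test in the description: {i32, i64}.
void RunSketch() {
  const std::vector<std::string> names = {"i32", "i64"};
  // An index-based reference to column 0 resolves to "i32".
  assert(ResolveColumnName(names, FieldRefSketch{0}).value() == "i32");
  // A name-based reference resolves to the same-named column.
  assert(ResolveColumnName(names, FieldRefSketch{std::string("i64")}).value() == "i64");
}
```

In this sketch both kinds of reference end up as a column name, so a name-keyed lookup could serve either one. Whether the Parquet projection code can take the same approach is an assumption, not something the report states.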
--
This message was sent by Atlassian Jira
(v8.20.1#820001)