Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/12 14:05:03 UTC

[jira] [Assigned] (ARROW-15658) [C++] Parquet pushdown filtering fails if the filter expression uses numeric field references

     [ https://issues.apache.org/jira/browse/ARROW-15658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer reassigned ARROW-15658:
-----------------------------------

    Assignee:     (was: Vibhatha Lakmal Abeykoon)

This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

> [C++] Parquet pushdown filtering fails if the filter expression uses numeric field references
> ---------------------------------------------------------------------------------------------
>
>                 Key: ARROW-15658
>                 URL: https://issues.apache.org/jira/browse/ARROW-15658
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 7.0.0
>            Reporter: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We can refer to a field by name (e.g. {{compute::field_ref("foo")}}) or by index (e.g. {{compute::field_ref(0)}}).
> The latter is not supported when doing Parquet projection.  A test demonstrating this can be found here: https://github.com/westonpace/arrow/commit/2f92ed0764cf2e1388dac053aeb4e1b923c6872e
> Copied here for posterity (this would go in the dataset fixture mixin):
> {code}
>   void TestScanWithFieldPathFilter() {
>     auto i32 = field("i32", int32());
>     auto i64 = field("i64", int64());
>     this->opts_->dataset_schema = schema({i32, i64});
>     this->Project({"i64"});
>     // This should be the column i32
>     this->SetFilter(equal(field_ref(0), literal(0)));
>     auto expected_schema = schema({i64});
>     auto reader = this->GetRecordBatchReader(opts_->dataset_schema);
>     auto source = this->GetFileSource(reader.get());
>     auto fragment = this->MakeFragment(*source);
>     int64_t row_count = 0;
>     for (auto maybe_batch : PhysicalBatches(fragment)) {
>       ASSERT_OK_AND_ASSIGN(auto batch, maybe_batch);
>       row_count += batch->num_rows();
>       AssertSchemaEqual(*batch->schema(), *expected_schema,
>                         /*check_metadata=*/false);
>     }
>     ASSERT_EQ(row_count, expected_rows());
>   }
> {code}
> I would expect this to work.  Instead I get the error:
> {noformat}
> /home/pace/dev/arrow/cpp/src/arrow/dataset/test_util.h:840: Failure
> Failed
> '_error_or_value83.status()' failed with NotImplemented: Inferring column projection from FieldRef FieldRef.FieldPath(0)
> /home/pace/dev/arrow/cpp/src/arrow/dataset/file_parquet.cc:262  ResolveOneFieldRef(manifest, ref, field_lookup, duplicate_fields, &columns_selection)
> /home/pace/dev/arrow/cpp/src/arrow/dataset/file_parquet.cc:437  InferColumnProjection(*reader, *options)
> /home/pace/dev/arrow/cpp/src/arrow/util/iterator.h:152  value_.status()
> {noformat}
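
The stack trace points at the step that maps each field reference in the filter to a Parquet column, which rejects the index-based form. As a rough illustration of what handling both forms could look like, here is a minimal, standalone sketch. It does not use Arrow: SimpleFieldRef, ResolveColumn and FieldRefExample are hypothetical names invented for this example, and the sketch assumes top-level columns only (no nested paths, no duplicate names).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a field reference: either a column name or a
// top-level column index. This is NOT Arrow's FieldRef class.
using SimpleFieldRef = std::variant<std::string, std::size_t>;

// Resolve a reference against the dataset schema's column names.
// Returns the column index, or std::nullopt if nothing matches.
std::optional<std::size_t> ResolveColumn(
    const std::vector<std::string>& schema_names, const SimpleFieldRef& ref) {
  if (const auto* index = std::get_if<std::size_t>(&ref)) {
    // Index-based reference: valid only if it is inside the schema bounds.
    if (*index < schema_names.size()) return *index;
    return std::nullopt;
  }
  // Name-based reference: linear search for the first matching name.
  const auto& name = std::get<std::string>(ref);
  for (std::size_t i = 0; i < schema_names.size(); ++i) {
    if (schema_names[i] == name) return i;
  }
  return std::nullopt;
}

// Mirrors the schema from the test in the issue: {i32, i64}.
// Index 0 and the name "i32" should identify the same column.
void FieldRefExample() {
  const std::vector<std::string> names = {"i32", "i64"};
  const auto by_index = ResolveColumn(names, SimpleFieldRef(std::size_t{0}));
  const auto by_name = ResolveColumn(names, SimpleFieldRef(std::string("i32")));
  assert(by_index.has_value() && by_name.has_value());
  assert(*by_index == *by_name);
}
```

The point of the sketch is only that an index can be resolved against the dataset schema just as a name can, so the filter's column ends up in the set read from the file even when the projection (here, i64 only) does not include it. How this should be done inside Arrow's own projection inference is not something this sketch claims to show.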



--
This message was sent by Atlassian Jira
(v8.20.10#820010)