Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/09/14 19:49:00 UTC
[jira] [Resolved] (ARROW-17521) [Python] Add python bindings for NamedTableProvider for Substrait consumer
[ https://issues.apache.org/jira/browse/ARROW-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weston Pace resolved ARROW-17521.
---------------------------------
Fix Version/s: 10.0.0
Resolution: Fixed
Issue resolved by pull request 14024
https://github.com/apache/arrow/pull/14024
> [Python] Add python bindings for NamedTableProvider for Substrait consumer
> --------------------------------------------------------------------------
>
> Key: ARROW-17521
> URL: https://issues.apache.org/jira/browse/ARROW-17521
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Weston Pace
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
> Labels: pull-request-available
> Fix For: 10.0.0
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> The C++ Substrait consumer currently supports a named table provider to handle the NamedTable relation:
> {noformat}
> using NamedTableProvider =
>     std::function<Result<compute::Declaration>(const std::vector<std::string>&)>;
> static NamedTableProvider kDefaultNamedTableProvider;
> /// Options that control the conversion between Substrait and Acero representations of a
> /// plan.
> struct ConversionOptions {
>   /// \brief How strictly the converter should adhere to the structure of the input.
>   ConversionStrictness strictness = ConversionStrictness::BEST_EFFORT;
>   /// \brief A custom strategy to be used for providing named tables
>   ///
>   /// The default behavior will return an invalid status if the plan has any
>   /// named table relations.
>   NamedTableProvider named_table_provider = kDefaultNamedTableProvider;
> };
> {noformat}
> This is very useful for testing and experimenting, as it allows you to provide tables from memory (using a table_source node, for example). We should add pyarrow bindings. I don't think they need to expose the full compute::Declaration range of table sources. A simple approach might be a function that, given a list of names, returns either a table, an iterable of batches, or a record batch reader.
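
The "simple approach" described above can be illustrated with a minimal, library-free sketch of what such a callback might look like on the Python side. Everything here is an assumption made for illustration: `make_table_provider`, the dotted-key lookup, and the use of plain lists of row dicts as a stand-in for a pyarrow table are hypothetical, and the binding actually merged in the pull request may differ.

```python
from typing import Callable, Dict, List, Sequence

# Stand-in for an in-memory table: a list of row dicts. In pyarrow the
# provider would instead return a table, an iterable of record batches,
# or a record batch reader, as the issue suggests.
Rows = List[dict]


def make_table_provider(tables: Dict[str, Rows]) -> Callable[[Sequence[str]], Rows]:
    """Build a hypothetical provider that maps NamedTable names to in-memory data."""

    def provider(names: Sequence[str]) -> Rows:
        # The provider receives a list of name parts; joining them into a
        # single dotted key is a choice made for this sketch only.
        key = ".".join(names)
        if key not in tables:
            # Unknown names are an error, analogous to the invalid status
            # returned by the default C++ provider.
            raise KeyError(f"no table registered for {list(names)!r}")
        return tables[key]

    return provider


# Usage: register one in-memory table and resolve it by its name parts.
provider = make_table_provider({"db.orders": [{"id": 1, "total": 9.5}]})
rows = provider(["db", "orders"])
```

Keeping the callback's input to just the list of names mirrors the C++ NamedTableProvider signature quoted above, while returning plain in-memory data avoids exposing the full range of declarations to Python.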
--
This message was sent by Atlassian Jira
(v8.20.10#820010)