Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/03/19 01:56:00 UTC
[jira] [Resolved] (ARROW-15820) [C++][Doc] Add table_source to streaming_execution.rst & clarify parameter name
[ https://issues.apache.org/jira/browse/ARROW-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weston Pace resolved ARROW-15820.
---------------------------------
Fix Version/s: 8.0.0
Resolution: Fixed
Issue resolved by pull request 12555
[https://github.com/apache/arrow/pull/12555]
> [C++][Doc] Add table_source to streaming_execution.rst & clarify parameter name
> -------------------------------------------------------------------------------
>
> Key: ARROW-15820
> URL: https://issues.apache.org/jira/browse/ARROW-15820
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
> Labels: pull-request-available
> Fix For: 8.0.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Currently the table_source node does not appear in our documentation.
> Also, in {{TableSourceNodeOptions}} we have:
> {noformat}
> // Size of batches to emit from this node
> // If the table is larger the node will emit multiple batches from the
> // table to be processed in parallel.
> int64_t batch_size;
> {noformat}
> However, when looking into a performance issue today, I realized this description is incomplete. In reality we should probably call this parameter {{max_batch_size}}.
> Furthermore, we should make it clear that a table with smaller batches will emit smaller batches directly (this is a good thing in my case) and will not concatenate small batches together into a larger batch.
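The behavior described above can be sketched as a small standalone model. This is not Arrow code: `EmittedBatchSizes` is a hypothetical helper, and it assumes the table's chunks are represented only by their row counts. It treats `batch_size` as an upper bound (the proposed `max_batch_size`): larger chunks are split, smaller chunks pass through unchanged and are not concatenated.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model, not part of Arrow: given the row counts of a table's
// existing chunks, return the row counts of the batches a source node would
// emit if batch_size is treated as an upper bound (max_batch_size).
std::vector<int64_t> EmittedBatchSizes(const std::vector<int64_t>& chunk_rows,
                                       int64_t max_batch_size) {
  assert(max_batch_size > 0);
  std::vector<int64_t> emitted;
  for (int64_t rows : chunk_rows) {
    // A chunk larger than the limit is split into pieces of at most
    // max_batch_size rows.
    while (rows > max_batch_size) {
      emitted.push_back(max_batch_size);
      rows -= max_batch_size;
    }
    // A small chunk (or the remainder of a split) is emitted as-is; it is
    // never merged with the next chunk to fill up a batch.
    if (rows > 0) emitted.push_back(rows);
  }
  return emitted;
}
```

Under this model, a table with chunks of 10, 3 and 25 rows and a limit of 10 would emit batches of 10, 3, 10, 10 and 5 rows, while a table whose chunks are all 4 rows would keep emitting 4-row batches.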
--
This message was sent by Atlassian Jira
(v8.20.1#820001)