Posted to jira@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2021/09/07 21:53:00 UTC

[jira] [Commented] (ARROW-3998) Support TPC-H dbgen in Arrow

    [ https://issues.apache.org/jira/browse/ARROW-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411555#comment-17411555 ] 

Wes McKinney commented on ARROW-3998:
-------------------------------------

DuckDB provides TPC-H dataset generation as an extension and can generate the datasets at different scale factors. Since DuckDB can return result sets in Arrow format from both Python and R, we could use it as a utility to generate test files.
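
As a rough illustration of the idea, here is a minimal Python sketch. It assumes the duckdb and pyarrow packages are installed and that the tpch extension can be installed or is already bundled with the build. The helper name generate_tpch_arrow and the default scale factor are made up for this example and are not part of DuckDB or Arrow.

```python
# The eight tables defined by the TPC-H specification.
TPCH_TABLES = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
]


def generate_tpch_arrow(scale_factor=0.01):
    """Generate TPC-H data with DuckDB and return it as Arrow tables.

    Hypothetical helper: returns a dict mapping table name to a
    pyarrow.Table. Requires the duckdb and pyarrow packages.
    """
    import duckdb  # imported lazily so the module loads without duckdb

    con = duckdb.connect()      # in-memory database
    con.execute("INSTALL tpch")  # may need network access on some builds
    con.execute("LOAD tpch")
    # dbgen populates the TPC-H tables at the requested scale factor.
    con.execute(f"CALL dbgen(sf={float(scale_factor)})")

    tables = {}
    for name in TPCH_TABLES:
        # fetch_arrow_table() materializes the result as a pyarrow.Table.
        tables[name] = con.execute(f"SELECT * FROM {name}").fetch_arrow_table()
    con.close()
    return tables
```

Calling generate_tpch_arrow(0.01) should give a small dataset suitable for quick tests; each returned table could then be written out with pyarrow.parquet.write_table or pyarrow.feather.write_feather to produce the testing files.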

> Support TPC-H dbgen in Arrow
> ----------------------------
>
>                 Key: ARROW-3998
>                 URL: https://issues.apache.org/jira/browse/ARROW-3998
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Benchmarking, Integration
>            Reporter: Francois Saint-Jacques
>            Priority: Minor
>
> Integration tests and benchmarks should read TPC-H data. This is going to be useful for future query execution engine benchmarking.
> It could also attract researchers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)