Posted to jira@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2020/12/22 05:32:00 UTC

[jira] [Resolved] (ARROW-10999) [Rust] TPC-H parquet files cannot be read by Apache Spark

     [ https://issues.apache.org/jira/browse/ARROW-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jorge Leitão resolved ARROW-10999.
----------------------------------
    Resolution: Fixed

Issue resolved by pull request 8980
[https://github.com/apache/arrow/pull/8980]

> [Rust] TPC-H parquet files cannot be read by Apache Spark
> ---------------------------------------------------------
>
>                 Key: ARROW-10999
>                 URL: https://issues.apache.org/jira/browse/ARROW-10999
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The TPC-H Parquet files generated by the benchmark crate cannot be read by Apache Spark because they use unsigned integer types, which Spark cannot read (I am guessing because Java only has signed ints).
> I would like to use the same data sets for benchmarking DataFusion, Apache Spark, and other tools.
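
To illustrate the incompatibility described above, here is a minimal sketch of one plausible remedy: map each unsigned integer type to a signed type wide enough to hold its values before writing the files. It is not necessarily what pull request 8980 does. The `IntType` enum and the functions `spark_friendly` and `widen_u32_column` are hypothetical names invented for this illustration and are not part of the Arrow or Parquet crates.

```rust
/// Hypothetical, simplified stand-in for a column's integer type.
/// This is NOT the Arrow `DataType` enum; it exists only for this sketch.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum IntType {
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    Int8,
    Int16,
    Int32,
    Int64,
}

/// Map an unsigned type to a signed type that a signed-only reader can handle.
/// Signed types are returned unchanged.
fn spark_friendly(t: IntType) -> IntType {
    match t {
        IntType::UInt8 => IntType::Int16,
        IntType::UInt16 => IntType::Int32,
        IntType::UInt32 => IntType::Int64,
        // There is no wider signed type in this sketch, so values above
        // i64::MAX would not fit; a real schema would need to rule them out.
        IntType::UInt64 => IntType::Int64,
        signed => signed,
    }
}

/// Losslessly widen a column of u32 values to i64.
fn widen_u32_column(values: &[u32]) -> Vec<i64> {
    values.iter().map(|&v| i64::from(v)).collect()
}

fn main() {
    let column = [0u32, 42, u32::MAX];
    let target = spark_friendly(IntType::UInt32);
    let widened = widen_u32_column(&column);
    println!("{:?} -> {:?}", IntType::UInt32, target);
    println!("{:?}", widened);
}
```

Widening keeps every value intact at the cost of larger columns. When the data is known to fit, declaring the columns with a signed type of the same width from the start avoids that cost.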



--
This message was sent by Atlassian Jira
(v8.3.4#803005)