Posted to jira@arrow.apache.org by "Neville Dipale (Jira)" <ji...@apache.org> on 2020/09/29 23:54:00 UTC
[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types
[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neville Dipale updated ARROW-8258:
----------------------------------
Fix Version/s: 3.0.0 (was: 2.0.0)
> [Rust] [Parquet] ArrowReader fails on some timestamp types
> ----------------------------------------------------------
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Andy Grove
> Assignee: Renjie Liu
> Priority: Major
> Fix For: 3.0.0
>
>
> I discovered this bug with the following query:
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code}
> The parquet reader detects this schema when reading from the file:
> {code:java}
> Schema {
> fields: [
> Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false }
> ],
> metadata: {}
> } {code}
> The struct array read from the file contains:
> {code:java}
> [PrimitiveArray<UInt64>
> [
> 1567318008000000,
> 1567319357000000,
> 1567320092000000,
> 1567321151000000, {code}
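> When the Parquet Arrow reader creates the record batch, the following validation logic fails because the column's actual type (UInt64) does not match the schema's type (Timestamp(Microsecond, None)):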
> {code:java}
> for i in 0..columns.len() {
>     if columns[i].len() != len {
>         return Err(ArrowError::InvalidArgumentError(
>             "all columns in a record batch must have the same length".to_string(),
>         ));
>     }
>     if columns[i].data_type() != schema.field(i).data_type() {
>         return Err(ArrowError::InvalidArgumentError(format!(
>             "column types must match schema types, expected {:?} but found {:?} at column index {}",
>             schema.field(i).data_type(),
>             columns[i].data_type(),
>             i)));
>     }
> }
> {code}
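
The failure described above can be reproduced in isolation with a minimal sketch. It assumes simplified, hypothetical stand-ins (`DataType`, `TimeUnit`, `Column`, and `validate` are defined here for illustration and are not the real `arrow` crate API). It only mirrors the quoted validation logic: a column whose physical type is UInt64 is checked against a schema field declared as Timestamp(Microsecond, None), and the check rejects it.

```rust
// Hypothetical, simplified stand-ins for Arrow's types.
// These are NOT the real `arrow` crate definitions.
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Microsecond,
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    UInt64,
    Timestamp(TimeUnit, Option<String>),
}

// A column reduced to the two properties the check inspects.
struct Column {
    data_type: DataType,
    len: usize,
}

// Mirrors the quoted validation: every column must have the expected
// length and exactly the data type declared by the schema.
fn validate(schema: &[DataType], columns: &[Column], len: usize) -> Result<(), String> {
    for (i, (expected, column)) in schema.iter().zip(columns.iter()).enumerate() {
        if column.len != len {
            return Err("all columns in a record batch must have the same length".to_string());
        }
        if &column.data_type != expected {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                expected, column.data_type, i
            ));
        }
    }
    Ok(())
}

fn main() {
    // Schema as detected from the file: a microsecond timestamp.
    let schema = vec![DataType::Timestamp(TimeUnit::Microsecond, None)];
    // Array as actually produced by the reader: raw UInt64 values.
    let columns = vec![Column { data_type: DataType::UInt64, len: 4 }];

    match validate(&schema, &columns, 4) {
        Ok(()) => println!("record batch accepted"),
        Err(e) => println!("{}", e),
    }
}
```

Because the comparison is strict equality on the data type, the check fails even though the UInt64 values look like microsecond timestamps. In this sketch, the check passes only if the column carries the same logical type as the schema field.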