Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/06/24 16:55:00 UTC
[jira] [Commented] (ARROW-5712) [C++][Parquet] Arrow time32/time64/timestamp ConvertedType not being restored properly
[ https://issues.apache.org/jira/browse/ARROW-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871515#comment-16871515 ]
Wes McKinney commented on ARROW-5712:
-------------------------------------
I have to dig into this in the course of the other statistics issues I'm working on (ARROW-4139, ARROW-5166).
Note that the converted type is being restored correctly for UINT64 and UTF8:
{code}
(Pdb) stats
<pyarrow._parquet.Statistics object at 0x7f9b0360cf08>
has_min_max: True
min: 10
max: 11164359321221007157
null_count: 0
distinct_count: 0
num_values: 2
converted_type: UINT_64
physical_type: INT64
(Pdb) c
(Pdb) stats
<pyarrow._parquet.Statistics object at 0x7f9b0360cdc8>
has_min_max: True
min: ähnlich
max: öffentlich
null_count: 0
distinct_count: 0
num_values: 2
converted_type: UTF8
physical_type: BYTE_ARRAY
(Pdb) c
{code}
But it's missing for time32/time64/timestamp:
{code}
(Pdb) stats
<pyarrow._parquet.Statistics object at 0x7f9b0360cf58>
has_min_max: True
min: 37800001
max: 55800001
null_count: 0
distinct_count: 0
num_values: 2
converted_type: NONE
physical_type: INT32
(Pdb) c
{code}
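To make concrete what is lost when the converted type comes back as NONE, here is a minimal sketch using only the Python standard library. It assumes the raw INT32 statistics above are milliseconds since midnight, as Parquet's TIME_MILLIS converted type defines for a {{time32('ms')}} column. The helper names {{decode_time_millis}} and {{render_stat}} are hypothetical and are not part of pyarrow or parquet-cpp.

```python
import datetime

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def decode_time_millis(raw: int) -> datetime.time:
    """Interpret a raw INT32 as milliseconds since midnight (TIME_MILLIS)."""
    if not 0 <= raw < MILLIS_PER_DAY:
        raise ValueError(f"{raw} is not a valid millisecond time of day")
    seconds, millis = divmod(raw, 1000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return datetime.time(hour, minute, second, millis * 1000)


def render_stat(raw: int, converted_type: str) -> str:
    """Hypothetical helper: render a min/max statistic for display.

    Without a converted type, a reader can only show the physical integer.
    """
    if converted_type == "TIME_MILLIS":
        return decode_time_millis(raw).isoformat(timespec="milliseconds")
    return str(raw)


# The min/max values from the statistics shown above.
print(render_stat(37800001, "NONE"))         # 37800001
print(render_stat(37800001, "TIME_MILLIS"))  # 10:30:00.001
print(render_stat(55800001, "TIME_MILLIS"))  # 15:30:00.001
```

With NONE, the statistics reader has no way to tell that 37800001 is a time of day rather than a plain integer, which is why the converted type needs to be restored alongside the physical type.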
> [C++][Parquet] Arrow time32/time64/timestamp ConvertedType not being restored properly
> --------------------------------------------------------------------------------------
>
> Key: ARROW-5712
> URL: https://issues.apache.org/jira/browse/ARROW-5712
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Fix For: 1.0.0
>
>
> These values are currently being written as raw INT32 without a logical or converted type set.
> Example statistics for {{time32('ms')}} with {{version='2.0'}} set:
> {code}
> (Pdb) stats
> <pyarrow._parquet.Statistics object at 0x7f6a9dca9f30>
> has_min_max: True
> min: 37800001
> max: 55800001
> null_count: 0
> distinct_count: 0
> num_values: 2
> converted_type: NONE
> physical_type: INT32
> {code}