Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/06/24 16:55:00 UTC

[jira] [Commented] (ARROW-5712) [C++][Parquet] Arrow time32/time64/timestamp ConvertedType not being restored properly

    [ https://issues.apache.org/jira/browse/ARROW-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871515#comment-16871515 ] 

Wes McKinney commented on ARROW-5712:
-------------------------------------

I have to dig into this in the course of the other statistics issues I'm working on (ARROW-4139, ARROW-5166).

Note that the converted type is being restored correctly for UINT64 and UTF8:


{code}
(Pdb) stats
<pyarrow._parquet.Statistics object at 0x7f9b0360cf08>
  has_min_max: True
  min: 10
  max: 11164359321221007157
  null_count: 0
  distinct_count: 0
  num_values: 2
  converted_type: UINT_64
  physical_type: INT64
(Pdb) c

(Pdb) stats
<pyarrow._parquet.Statistics object at 0x7f9b0360cdc8>
  has_min_max: True
  min: ähnlich
  max: öffentlich
  null_count: 0
  distinct_count: 0
  num_values: 2
  converted_type: UTF8
  physical_type: BYTE_ARRAY
(Pdb) c
{code}

But it's missing for time32/time64/timestamp:

{code}
(Pdb) stats
<pyarrow._parquet.Statistics object at 0x7f9b0360cf58>
  has_min_max: True
  min: 37800001
  max: 55800001
  null_count: 0
  distinct_count: 0
  num_values: 2
  converted_type: NONE
  physical_type: INT32
(Pdb) c
{code}
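
To illustrate why the missing converted type matters, here is a minimal sketch using only the Python standard library. It assumes the INT32 min/max above are milliseconds since midnight, which is what {{time32('ms')}} stores and what a TIME_MILLIS annotation would signal. The helper name decode_time_millis is hypothetical and is not part of Arrow or parquet-cpp.

```python
from datetime import time

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def decode_time_millis(value: int) -> time:
    """Interpret a raw INT32 as milliseconds since midnight.

    Hypothetical helper for illustration only; a reader can apply this
    interpretation only if the column carries a time annotation.
    """
    if not 0 <= value < MILLIS_PER_DAY:
        raise ValueError(f"not a valid time-of-day in milliseconds: {value}")
    seconds, millis = divmod(value, 1000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    # datetime.time takes microseconds, so scale the millisecond remainder.
    return time(hour, minute, second, millis * 1000)


# The raw min/max values from the statistics shown above.
stats_min = decode_time_millis(37800001)
stats_max = decode_time_millis(55800001)

print(stats_min.isoformat(timespec="milliseconds"))  # 10:30:00.001
print(stats_max.isoformat(timespec="milliseconds"))  # 15:30:00.001
```

With converted_type reported as NONE, a consumer of the statistics has no way to know that this decoding applies and sees only the two integers.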

> [C++][Parquet] Arrow time32/time64/timestamp ConvertedType not being restored properly
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-5712
>                 URL: https://issues.apache.org/jira/browse/ARROW-5712
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>            Priority: Major
>             Fix For: 1.0.0
>
>
> These values are currently being written as raw INT32 without a logical or converted type set.
> Example statistics for {{time32('ms')}} with {{version='2.0'}} set:
> {code}
> (Pdb) stats
> <pyarrow._parquet.Statistics object at 0x7f6a9dca9f30>
>   has_min_max: True
>   min: 37800001
>   max: 55800001
>   null_count: 0
>   distinct_count: 0
>   num_values: 2
>   converted_type: NONE
>   physical_type: INT32
> {code}


