Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/05/02 19:37:00 UTC

[jira] [Updated] (PARQUET-1546) [C++] page level min / max written by parquet-cpp is not recognized by parquet-tools

     [ https://issues.apache.org/jira/browse/PARQUET-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated PARQUET-1546:
----------------------------------
    Summary: [C++] page level min / max written by parquet-cpp is not recognized by parquet-tools  (was: page level min / max written by parquet-cpp is not recognized by parquet-tools)

> [C++] page level min / max written by parquet-cpp is not recognized by parquet-tools
> ------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1546
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1546
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: colin fang
>            Priority: Minor
>
> The test Parquet file is created by:
> {code}
> import numpy as np
> import pandas as pd
>
> n = 1000000
> x = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, None] * n
> y = [u'é', u'é', u'é', u'é'] * n + [u'a', None, u'a'] * n
> z = np.random.rand(len(x)).tolist()
> df = pd.DataFrame({'x': x, 'y': y, 'z': z})
> # Dictionary encoding is disabled so the pages are PLAIN-encoded.
> df.to_parquet('test_arrow.parquet', use_dictionary=False, row_group_size=1900100)
> {code}
>  
> Output from parquet-tools:
>  
> {code:java}
>     y TV=1900100 RL=0 DL=1
>     ----------------------------------------------------------------------------
>     page 0:   DLE:RLE RLE:RLE VLE:PLAIN ST:[min: é, max: é, num_nulls: 0] SZ:1050632 VC:175104
>     page 1:   DLE:RLE RLE:RLE VLE:PLAIN ST:[num_nulls: 90072, min/max not defined] SZ:1083218 VC:294912
>     page 2:   DLE:RLE RLE:RLE VLE:PLAIN ST:[min: a, max: a, num_nulls: 105131] SZ:1091359 VC:315392
>     page 3:   DLE:RLE RLE:RLE VLE:PLAIN ST:[min: a, max: a, num_nulls: 105130] SZ:1091364 VC:315392
> {code}
>  
> In the output above, page 1 reports "min/max not defined".
> A Parquet file generated by `parquet-mr` has the correct page-level min/max.
>  
> It would be nice if page 1 were shown as {code}ST:[min: a, max: é, num_nulls: 90072]{code}
>  
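
As a sanity check on the requested values, here is a minimal sketch (not part of the original report) of what the statistics for a page straddling the 'é'/'a' boundary would look like. It assumes string values are ordered by unsigned byte-wise comparison of their UTF-8 encoding, which is the ordering the Parquet format defines for string columns. The helper name `expected_stats`, the reduced `n`, and the slice bounds are hypothetical and do not correspond to the real page boundaries in the file.

```python
def expected_stats(values):
    """Return (min, max, num_nulls) for a list of str-or-None values.

    Non-null values are compared by their UTF-8 bytes; Python compares
    bytes objects element-wise as unsigned integers.
    """
    non_null = [v for v in values if v is not None]
    num_nulls = len(values) - len(non_null)
    if not non_null:
        # Nothing to compare: min/max are genuinely undefined.
        return None, None, num_nulls
    lo = min(non_null, key=lambda s: s.encode("utf-8"))
    hi = max(non_null, key=lambda s: s.encode("utf-8"))
    return lo, hi, num_nulls


# Small stand-in for the 'y' column from the report (n reduced for speed).
n = 10
y = ["é", "é", "é", "é"] * n + ["a", None, "a"] * n

# Hypothetical slice spanning the boundary between the two runs.
mixed = y[4 * n - 5 : 4 * n + 6]
print(expected_stats(mixed))  # ('a', 'é', 2)
```

Under this ordering 'a' (0x61) sorts below 'é' (0xC3 0xA9), so a page containing both values has min 'a' and max 'é', matching the output requested in the report.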



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)