Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2016/12/22 21:38:58 UTC

[jira] [Updated] (PARQUET-816) [C++] Failure decoding sample dict-encoded file from parquet-compatibility project

     [ https://issues.apache.org/jira/browse/PARQUET-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated PARQUET-816:
---------------------------------
    Description: 
See attached. This throws an exception when read:

{code}
$ debug/parquet_reader nation.dict.parquet 
File statistics:
Version: 1
Created By: parquet-mr
Total rows: 25
Number of RowGroups: 1
Number of Real Columns: 4
Number of Columns: 4
Number of Selected Columns: 4
Column 0: nation_key (INT32)
Column 1: name (BYTE_ARRAY)
Column 2: region_key (INT32)
Column 3: comment_col (BYTE_ARRAY)
--- Row Group 0 ---
--- Total Bytes 0 ---
  rows: 25---
Column 0
, values: 25  Statistics Not Set
  compression: UNCOMPRESSED, encodings: 
  uncompressed size: 125, compressed size: 125
Column 1
, values: 25  Statistics Not Set
  compression: UNCOMPRESSED, encodings: 
  uncompressed size: 322, compressed size: 322
Column 2
, values: 25  Statistics Not Set
  compression: UNCOMPRESSED, encodings: 
  uncompressed size: 125, compressed size: 125
Column 3
, values: 25  Statistics Not Set
  compression: UNCOMPRESSED, encodings: 
  uncompressed size: 2002, compressed size: 2002
nation_key              name                    region_key              comment_col             
0                       Parquet error: Unexpected end of stream.
{code}

However, I checked that I can read this file with Impala:

{code}
In [13]: hdfs.put('/tmp/nation-dict-test/test.parq', 'nation.dict.parquet')
Out[13]: '/tmp/nation-dict-test/test.parq'

In [14]: pf = con.parquet_file('/tmp/nation-dict-test')

In [15]: pf.execute()
Out[15]: 
    nation_key            name  region_key  \
0            0         ALGERIA           0   
1            1       ARGENTINA           1   
2            2          BRAZIL           1   
3            3          CANADA           1   
4            4           EGYPT           4   
5            5        ETHIOPIA           0   
6            6          FRANCE           3   
7            7         GERMANY           3   
8            8           INDIA           2   
9            9       INDONESIA           2   
10          10            IRAN           4   
11          11            IRAQ           4   
12          12           JAPAN           2   
13          13          JORDAN           4   
14          14           KENYA           0   
15          15         MOROCCO           0   
16          16      MOZAMBIQUE           0   
17          17            PERU           1   
18          18           CHINA           2   
19          19         ROMANIA           3   
20          20    SAUDI ARABIA           4   
21          21         VIETNAM           2   
22          22          RUSSIA           3   
23          23  UNITED KINGDOM           3   
24          24   UNITED STATES           1   

                                          comment_col  
0    haggle. carefully final deposits detect slyly...  
1   al foxes promise slyly according to the regula...  
2   y alongside of the pending deposits. carefully...  
3   eas hang ironic, silent packages. slyly regula...  
4   y above the carefully unusual theodolites. fin...  
5                     ven packages wake quickly. regu  
6              refully final requests. regular, ironi  
7   l platelets. regular accounts x-ray: unusual, ...  
8   ss excuses cajole slyly across the packages. d...  
9    slyly express asymptotes. regular deposits ha...  
10  efully alongside of the slyly final dependenci...  
11  nic deposits boost atop the quickly final requ...  
12               ously. final, express gifts cajole a  
13  ic deposits are blithely about the carefully r...  
14   pending excuses haggle furiously deposits. pe...  
15  rns. blithely bold courts among the closely re...  
16      s. ironic, unusual asymptotes wake blithely r  
17  platelets. blithely pending dependencies use f...  
18  c dependencies. furiously express notornis sle...  
19  ular asymptotes are about the furious multipli...  
20  ts. silent requests haggle. closely express pa...  
21     hely enticingly express accounts. even, final   
22   requests against the platelets use never acco...  
23  eans boost carefully special requests. account...  
24  y final packages. slow foxes cajole quickly. q...  
{code}
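
A side observation on the parquet_reader output above: the row group header reports "Total Bytes 0", while the four column chunks report compressed sizes that sum to 2574 bytes. Whether that mismatch is related to the decoding failure is not established here. The following is a minimal Python sketch that tallies the printed sizes; it only parses the text shown above, and the function name and regular expressions are illustrative rather than part of parquet-cpp.

```python
import re

# Size-related lines copied from the parquet_reader output above.
READER_OUTPUT = """\
--- Total Bytes 0 ---
  uncompressed size: 125, compressed size: 125
  uncompressed size: 322, compressed size: 322
  uncompressed size: 125, compressed size: 125
  uncompressed size: 2002, compressed size: 2002
"""


def summarize_sizes(text):
    """Return (row group total bytes, sum of column compressed sizes)."""
    total = int(re.search(r"Total Bytes (\d+)", text).group(1))
    # \b keeps "uncompressed size" from matching as "compressed size".
    compressed = re.findall(r"\bcompressed size: (\d+)", text)
    return total, sum(int(size) for size in compressed)


if __name__ == "__main__":
    total, column_sum = summarize_sizes(READER_OUTPUT)
    print(f"row group total bytes: {total}")
    print(f"sum of column compressed sizes: {column_sum}")
```

Running the same function over the full reader output should give the same two numbers, since only the "Total Bytes" and "compressed size" lines are matched.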

  was:
Identical to the description above, except that the reader was invoked with a local path:

{code}
$ debug/parquet_reader ~/code/fastparquet/test-data/nation.dict.parquet 
{code}


> [C++] Failure decoding sample dict-encoded file from parquet-compatibility project
> ----------------------------------------------------------------------------------
>
>                 Key: PARQUET-816
>                 URL: https://issues.apache.org/jira/browse/PARQUET-816
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>         Attachments: nation.dict.parquet



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)