Posted to jira@arrow.apache.org by "Neville Dipale (Jira)" <ji...@apache.org> on 2021/03/29 15:11:00 UTC

[jira] [Commented] (ARROW-12124) [Rust] Parquet writer creates invalid parquet files

    [ https://issues.apache.org/jira/browse/ARROW-12124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310716#comment-17310716 ] 

Neville Dipale commented on ARROW-12124:
----------------------------------------

[~domoritz] I've commented on https://github.com/domoritz/csv2parquet/issues/2 with the solution to the issue.

> [Rust] Parquet writer creates invalid parquet files
> ---------------------------------------------------
>
>                 Key: ARROW-12124
>                 URL: https://issues.apache.org/jira/browse/ARROW-12124
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust
>            Reporter: Dominik Moritz
>            Priority: Major
>
> I wrote a simple CSV to Parquet converter at https://github.com/domoritz/csv2parquet/blob/f53feb5bd995eab41dee09f2c4d722512052d7ca/src/main.rs. 
> Running it (`csv2parquet test.txt test.parquet`) on a simple file such as
> ```
> a,b,c
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> ```
> and then trying to read the resulting file in Python with
> ```
> import pandas as pd
> df = pd.read_parquet('test.parquet')
> df.to_csv('test2.csv')
> ```
> results in this error:
> ```
> OSError: Could not open parquet input source '<Buffer>': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
> ```
> The schema seems to be inferred correctly:
> ```
> Inferred Schema:
> {
>   "fields": [
>     {
>       "name": "a",
>       "nullable": false,
>       "type": {
>         "name": "int",
>         "bitWidth": 64,
>         "isSigned": true
>       },
>       "children": []
>     },
>     {
>       "name": "b",
>       "nullable": false,
>       "type": {
>         "name": "int",
>         "bitWidth": 64,
>         "isSigned": true
>       },
>       "children": []
>     },
>     {
>       "name": "c",
>       "nullable": false,
>       "type": {
>         "name": "utf8"
>       },
>       "children": []
>     }
>   ],
>   "metadata": {}
> }
> ```
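
The "Parquet magic bytes not found in footer" error quoted above refers to the 4-byte marker `PAR1` that a Parquet file with an unencrypted footer carries at both its start and its end. As a rough way to confirm what the reader is complaining about, the marker can be checked directly, without pandas. This is only a minimal diagnostic sketch: the helper name `has_parquet_magic` is made up for illustration and is not part of pandas, pyarrow, or the Rust crate, and it says nothing about why the footer is missing.

```python
import os

# Marker found at the start and end of a Parquet file with an unencrypted footer.
PARQUET_MAGIC = b"PAR1"


def has_parquet_magic(path):
    """Hypothetical helper: return (head_ok, tail_ok) for the file at `path`.

    head_ok is True if the first 4 bytes are b"PAR1",
    tail_ok is True if the last 4 bytes are b"PAR1".
    """
    n = len(PARQUET_MAGIC)
    # A file shorter than the marker cannot contain it at either end.
    if os.path.getsize(path) < n:
        return (False, False)
    with open(path, "rb") as f:
        head = f.read(n)
        f.seek(-n, os.SEEK_END)  # jump to the last 4 bytes
        tail = f.read(n)
    return (head == PARQUET_MAGIC, tail == PARQUET_MAGIC)
```

Calling `has_parquet_magic("test.parquet")` on the file from the report and getting `False` in the second position would be consistent with the error above, since that is the position the reader inspects first.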



--
This message was sent by Atlassian Jira
(v8.3.4#803005)