Posted to jira@arrow.apache.org by "Dominik Moritz (Jira)" <ji...@apache.org> on 2021/03/28 19:19:00 UTC
[jira] [Created] (ARROW-12124) [Rust] Parquet writer creates invalid parquet files
Dominik Moritz created ARROW-12124:
--------------------------------------
Summary: [Rust] Parquet writer creates invalid parquet files
Key: ARROW-12124
URL: https://issues.apache.org/jira/browse/ARROW-12124
Project: Apache Arrow
Issue Type: Bug
Components: Rust
Reporter: Dominik Moritz
I wrote a simple CSV to Parquet converter at https://github.com/domoritz/csv2parquet/blob/f53feb5bd995eab41dee09f2c4d722512052d7ca/src/main.rs.
Running it (`csv2parquet test.txt test.parquet`) on a simple file such as
```
a,b,c
0,1,hello world
0,1,hello world
0,1,hello world
0,1,hello world
0,1,hello world
0,1,hello world
0,1,hello world
```
and then trying to read the output in Python with
```
import pandas as pd

# pandas hands the file to a Parquet engine (pyarrow, judging by the error below)
df = pd.read_parquet('test.parquet')
df.to_csv('test2.csv')
```
results in this error:
```
OSError: Could not open parquet input source '<Buffer>': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
```
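For context, the Parquet format frames every file with the 4-byte magic `PAR1` at both the start and the end, and the 4 bytes just before the trailing magic hold the little-endian length of the footer metadata. The error above says the trailing magic is missing, which usually means the footer was never written or the file was truncated. The sketch below is a minimal, stdlib-only way to inspect an output file's framing; `check_parquet_framing` is a hypothetical helper (not part of pyarrow or the Rust `parquet` crate), and it checks only the framing, not the Thrift-encoded metadata itself.

```python
import struct
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def check_parquet_framing(path):
    """Return a list of framing problems found in a Parquet file (empty if none)."""
    data = Path(path).read_bytes()
    problems = []

    # Smallest possible framing: leading magic (4) + footer length (4) + trailing magic (4).
    if len(data) < 12:
        problems.append(f"file too short ({len(data)} bytes)")
        return problems

    if data[:4] != PARQUET_MAGIC:
        problems.append("missing leading magic bytes")

    if data[-4:] != PARQUET_MAGIC:
        # This is the condition pyarrow reports as "magic bytes not found in footer".
        problems.append("missing trailing magic bytes (footer possibly never written)")
        return problems

    # 4-byte little-endian length of the footer metadata, just before the trailing magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len + 12 > len(data):
        problems.append(f"footer length {footer_len} exceeds file size {len(data)}")

    return problems
```

For example, `print(check_parquet_framing("test.parquet"))` on the converter's output would show whether only the trailing framing is missing, which points at the writer not finalizing the file rather than at the data pages.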
The schema appears to be inferred correctly:
```
Inferred Schema:
{
"fields": [
{
"name": "a",
"nullable": false,
"type": {
"name": "int",
"bitWidth": 64,
"isSigned": true
},
"children": []
},
{
"name": "b",
"nullable": false,
"type": {
"name": "int",
"bitWidth": 64,
"isSigned": true
},
"children": []
},
{
"name": "c",
"nullable": false,
"type": {
"name": "utf8"
},
"children": []
}
],
"metadata": {}
}
```
--
This message was sent by Atlassian Jira
(v8.3.4#803005)