Posted to jira@arrow.apache.org by "Neville Dipale (Jira)" <ji...@apache.org> on 2021/03/29 15:11:00 UTC
[jira] [Commented] (ARROW-12124) [Rust] Parquet writer creates invalid parquet files
[ https://issues.apache.org/jira/browse/ARROW-12124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310716#comment-17310716 ]
Neville Dipale commented on ARROW-12124:
----------------------------------------
[~domoritz] I've posted the solution to this issue in a comment on https://github.com/domoritz/csv2parquet/issues/2.
> [Rust] Parquet writer creates invalid parquet files
> ---------------------------------------------------
>
> Key: ARROW-12124
> URL: https://issues.apache.org/jira/browse/ARROW-12124
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Dominik Moritz
> Priority: Major
>
> I wrote a simple CSV to Parquet converter at https://github.com/domoritz/csv2parquet/blob/f53feb5bd995eab41dee09f2c4d722512052d7ca/src/main.rs.
> Running it (`csv2parquet test.txt test.parquet`) with a simple file such as
> ```
> a,b,c
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> ```
> and then trying to read the result in Python with
> ```
> import pandas as pd
> df = pd.read_parquet('test.parquet')
> df.to_csv('test2.csv')
> ```
> results in this error:
> ```
> OSError: Could not open parquet input source '<Buffer>': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
> ```
> The schema seems to be inferred correctly:
> ```
> Inferred Schema:
> {
>   "fields": [
>     {
>       "name": "a",
>       "nullable": false,
>       "type": {
>         "name": "int",
>         "bitWidth": 64,
>         "isSigned": true
>       },
>       "children": []
>     },
>     {
>       "name": "b",
>       "nullable": false,
>       "type": {
>         "name": "int",
>         "bitWidth": 64,
>         "isSigned": true
>       },
>       "children": []
>     },
>     {
>       "name": "c",
>       "nullable": false,
>       "type": {
>         "name": "utf8"
>       },
>       "children": []
>     }
>   ],
>   "metadata": {}
> }
> ```
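
The error quoted above refers to the 4-byte `PAR1` marker that a standard (unencrypted) Parquet file carries at both its start and its end. As a rough diagnostic, the sketch below checks both ends of a file for that marker. The helper name `has_parquet_magic` is hypothetical and not part of pandas, pyarrow, or the Rust crate; it only inspects raw bytes and does not validate the rest of the file.

```python
import os

# 4-byte marker found at the start and end of an unencrypted Parquet file.
PARQUET_MAGIC = b"PAR1"


def has_parquet_magic(path):
    """Return (header_ok, footer_ok) for the file at `path`.

    Hypothetical helper for illustration: it only compares the first and
    last four bytes against the marker and does not parse any metadata.
    """
    size = os.path.getsize(path)
    if size < 2 * len(PARQUET_MAGIC):
        # Too short to hold both markers.
        return (False, False)
    with open(path, "rb") as f:
        header = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        footer = f.read(len(PARQUET_MAGIC))
    return (header == PARQUET_MAGIC, footer == PARQUET_MAGIC)


if __name__ == "__main__":
    # 'test.parquet' is the output file name used in the report above.
    if os.path.exists("test.parquet"):
        print(has_parquet_magic("test.parquet"))
```

If the first value is true and the second is false, the file was started but its footer is missing or incomplete, which matches the wording of the error; the check says nothing about why the footer is missing.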
--
This message was sent by Atlassian Jira
(v8.3.4#803005)