Posted to jira@arrow.apache.org by "Nithin Kumara Narayanaswamy Teekaramanaa (Jira)" <ji...@apache.org> on 2021/03/17 13:55:00 UTC
[jira] [Created] (ARROW-12001) pyarrow.lib.ArrowInvalid: CSV parse error: Expected 4 columns, got 6
Nithin Kumara Narayanaswamy Teekaramanaa created ARROW-12001:
----------------------------------------------------------------
Summary: pyarrow.lib.ArrowInvalid: CSV parse error: Expected 4 columns, got 6
Key: ARROW-12001
URL: https://issues.apache.org/jira/browse/ARROW-12001
Project: Apache Arrow
Issue Type: Bug
Reporter: Nithin Kumara Narayanaswamy Teekaramanaa
Attachments: test.csv
Test scenario:
I read the same attached CSV file with pandas and with pyarrow to compare the results.
1. pandas reads it into a DataFrame without problems, and the result is as follows:
{code:java}
import pandas as pd
df = pd.read_csv('test.csv', names=['col1', 'col2', 'col3', 'col4', 'col5','col6'])
>>> df
       col1   col2    col3  col4  col5  col6
0  20210317  julie   23434  test  data   1.0
1  20210316   adam  232423  test   NaN   NaN
{code}
2. With pyarrow csv, I get a parse error:
{code:java}
from pyarrow import csv
import pyarrow as pa
read_options = csv.ReadOptions(column_names=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'])
convert_options = csv.ConvertOptions(column_types=pa.schema(fields))
table = csv.read_csv('test.csv', read_options=read_options, convert_options=convert_options)
ERROR:
Traceback (most recent call last):
  File ".../test_pyarr.py", line 71, in <module>
    table = csv.read_csv('test.csv',
  File "pyarrow/_csv.pyx", line 714, in pyarrow._csv.read_csv
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: CSV parse error: Expected 6 columns, got 4
{code}
Is there a parameter that can be set to fill null values in case the column values are missing for the specified schema?
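Whether or not pyarrow offers such a parameter, one possible workaround is to pad the short rows before parsing. The following is only a minimal sketch using the Python standard library; the helper name pad_short_rows, the comma delimiter, and the sample rows are assumptions for illustration (the sample is modelled on the pandas output above, and the real test.csv may differ).

```python
import csv
import io


def pad_short_rows(text, n_columns, delimiter=","):
    """Return CSV bytes in which every row has exactly n_columns fields.

    Hypothetical helper: short rows are padded with empty fields,
    rows with too many fields raise ValueError.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if len(row) > n_columns:
            raise ValueError(
                f"expected at most {n_columns} fields, got {len(row)}"
            )
        # Append one empty field per missing column.
        writer.writerow(row + [""] * (n_columns - len(row)))
    return out.getvalue().encode("utf-8")


# Assumed sample data, not the actual attachment.
sample = "20210317,julie,23434,test,data,1.0\n20210316,adam,232423,test\n"
padded = pad_short_rows(sample, n_columns=6)
print(padded.decode("utf-8"))
```

The padded bytes could then be wrapped in io.BytesIO and passed to csv.read_csv from pyarrow in place of the file name, together with the same read_options. Whether the empty fields end up as nulls depends on the column types and on ConvertOptions settings such as null_values and strings_can_be_null.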
--
This message was sent by Atlassian Jira
(v8.3.4#803005)