
Posted to jira@arrow.apache.org by "Nithin Kumara Narayanaswamy Teekaramanaa (Jira)" <ji...@apache.org> on 2021/03/17 13:55:00 UTC

[jira] [Created] (ARROW-12001) pyarrow.lib.ArrowInvalid: CSV parse error: Expected 4 columns, got 6

Nithin Kumara Narayanaswamy Teekaramanaa created ARROW-12001:
----------------------------------------------------------------

             Summary: pyarrow.lib.ArrowInvalid: CSV parse error: Expected 4 columns, got 6
                 Key: ARROW-12001
                 URL: https://issues.apache.org/jira/browse/ARROW-12001
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Nithin Kumara Narayanaswamy Teekaramanaa
         Attachments: test.csv

Test scenario:

I read the same attached CSV file with pandas and with pyarrow to compare them:
 1. With pandas, the file is read into a DataFrame without problems. The result is:
{code:python}
import pandas as pd

df = pd.read_csv('test.csv', names=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'])

>>> df
       col1   col2    col3  col4  col5  col6
0  20210317  julie   23434  test  data   1.0
1  20210316   adam  232423  test   NaN   NaN
{code}

 2. With pyarrow's CSV reader, I get a parse error:
{code:python}
from pyarrow import csv
import pyarrow as pa

column_names = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']
# `fields` was not shown in the original snippet; these types are an
# assumption inferred from the pandas output above.
fields = [
    ('col1', pa.int64()),
    ('col2', pa.string()),
    ('col3', pa.int64()),
    ('col4', pa.string()),
    ('col5', pa.string()),
    ('col6', pa.float64()),
]

read_options = csv.ReadOptions(column_names=column_names)
convert_options = csv.ConvertOptions(column_types=pa.schema(fields))
table = csv.read_csv('test.csv', read_options=read_options,
                     convert_options=convert_options)
{code}
ERROR:
{code}
Traceback (most recent call last):
  File ".../test_pyarr.py", line 71, in <module>
    table = csv.read_csv('test.csv',
  File "pyarrow/_csv.pyx", line 714, in pyarrow._csv.read_csv
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: CSV parse error: Expected 6 columns, got 4
{code}
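The error means that a row had fewer fields (4) than the number of column names given (6); pandas pads such short rows with NaN, while pyarrow rejects them. A quick way to find the offending rows is to count the fields per row. The sketch below uses only Python's standard `csv` module; the helper name `ragged_rows` and the sample text are hypothetical (the sample is shaped like the pandas output above, not copied from the attachment).

{code:python}
import csv
import io


def ragged_rows(text: str, expected: int, delimiter: str = ",") -> list:
    """Return (line_number, field_count) for every non-blank row whose
    number of fields differs from `expected`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        # reader.line_num is the number of source lines read so far,
        # i.e. the line the current row ended on.
        (reader.line_num, len(row))
        for row in reader
        if row and len(row) != expected  # csv.reader yields [] for blank lines
    ]


# Hypothetical two-row input shaped like the pandas output above.
sample = "20210317,julie,23434,test,data,1\n20210316,adam,232423,test\n"
print(ragged_rows(sample, 6))
{code}

Here only line 2 is reported, with 4 fields, which matches the "Expected 6 columns, got 4" message.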
Is there an option that fills the missing values with nulls when a row has fewer columns than the specified schema?
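One possible workaround, assuming the short rows are only missing trailing values, is to pad them with empty fields before handing the data to pyarrow. This is a preprocessing sketch, not a pyarrow feature; `pad_csv_rows` is a hypothetical helper built on Python's standard `csv` module.

{code:python}
import csv
import io


def pad_csv_rows(text: str, n_columns: int, delimiter: str = ",") -> bytes:
    """Pad every short row with empty fields so it has exactly n_columns.

    Rows wider than n_columns are left untouched, so the CSV reader will
    still report them as errors instead of silently losing data.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # drop blank lines
        if len(row) < n_columns:
            row = row + [""] * (n_columns - len(row))
        writer.writerow(row)
    return out.getvalue().encode("utf-8")
{code}

The padded bytes can then be read with the same options as in the pyarrow snippet, e.g. {{csv.read_csv(io.BytesIO(pad_csv_rows(text, 6)), read_options=read_options, convert_options=convert_options)}}. By default, empty fields become nulls in numeric columns; for string columns such as {{col5}}, pass {{strings_can_be_null=True}} to {{csv.ConvertOptions}} if they should be null rather than empty strings.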


