Posted to user@arrow.apache.org by Gary Clark <gc...@gmail.com> on 2020/09/03 15:23:02 UTC

[Python] Failure parsing string into timestamp

Hi,

Not sure if I am missing something, but I am unable to get pyarrow to parse
my datetimes that are being inferred as strings, to be timestamps.

My strings are arriving in CSVs with this format: '2015-01-09 00:00:00.000'

I have tried creating:
from pyarrow import csv
convert_opts = csv.ConvertOptions(timestamp_parsers=['%Y-%m-%d %H:%M:%S.%f'])
df = csv.read_csv('path_to_csv', convert_options=convert_opts)
print(df.schema)

This yields no change and has my columns with these formatted timestamps
still showing as strings.

Additionally, I have tried casting as well:

import pyarrow as pa
from pyarrow import csv
dfschema = pa.schema([
('date_column', pa.timestamp('ms'))
])
df = csv.read_csv('path_to_csv')
df = df.cast(target_schema=dfschema)

This way yields the error: "pyarrow.lib.ArrowInvalid: Failed to parse
string: 2015-01-09 00:00:00.000"

I am using pyarrow 1.0.1 in a Linux Docker container.

Thanks,

-- 
Gary Clark
*Data Scientist & Data Engineer*
*B.S. Mechanical Engineering, Howard University '13*
+1 (717) 798-6916
gclarkjr5@gmail.com

Re: [Python] Failure parsing string into timestamp

Posted by Micah Kornfield <em...@gmail.com>.
Hi Gary,
According to ARROW-9561 [1], you need to pass in the type for the column
explicitly, i.e.:

csv.ConvertOptions(column_types={'t': pa.timestamp('us')})


Hope this helps.

-Micah

[1] https://issues.apache.org/jira/browse/ARROW-9561
