Posted to user@spark.apache.org by be...@datalab.run on 2022/05/09 13:43:29 UTC

How do I read parquet with python object

# python:

import pandas as pd

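# Column 'b' holds a Python list, so pandas stores it with dtype object.
# to_parquet needs the pyarrow or fastparquet engine to be installed.
a = pd.DataFrame([[1, [2.3, 1.2]]], columns=['a', 'b'])
a.to_parquet('a.parquet')

# pyspark:

from pyspark.sql import SparkSession

# In the pyspark shell or a notebook, `spark` already exists.
spark = SparkSession.builder.getOrCreate()
d2 = spark.read.parquet('a.parquet')
d2.show()  # the error below is raised from showString, which show() calls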

fails with this error:

An error was encountered: An error occurred while calling o277.showString. :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 14
in stage 9.0 failed 4 times, most recent failure: Lost task 14.2 in stage
9.0 (TID 63, 10.169.0.196, executor 15): java.lang.IllegalArgumentException:
Illegal Capacity: -221

How can I fix it?

Thanks.

 


Re: How do I read parquet with python object

Posted by Sean Owen <sr...@gmail.com>.
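That's an error from the Parquet library. It might be this bug:
https://issues.apache.org/jira/browse/PARQUET-1633, which is fixed in recent
versions of Parquet. You didn't say which library versions you are
using, but try the latest Spark; Spark ships its own copy of the Parquet
library, so upgrading Spark is how you pick up a newer Parquet reader.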
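When reporting versions, a small sketch like the one below lists what is installed on the Python side. The helper name `library_versions` and the package list are hypothetical choices for illustration. Note that the reader that failed here is the Java Parquet library bundled with Spark, so the version that matters most is Spark's own, which an active session reports as `spark.version`; pandas and pyarrow only affect how the file was written.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical default list: the libraries involved in writing and reading the file.
DEFAULT_PACKAGES = ("pandas", "pyarrow", "fastparquet", "pyspark")


def library_versions(packages=DEFAULT_PACKAGES):
    """Return {package name: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in library_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

In a running PySpark session, printing `spark.version` alongside this output shows which Spark build (and therefore which bundled Parquet library) the executors are using.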

