Posted to issues@arrow.apache.org by "Dave Challis (JIRA)" <ji...@apache.org> on 2018/04/06 11:42:00 UTC
[jira] [Updated] (ARROW-2406) [Python] Segfault when creating
PyArrow table from Pandas for empty string column when schema provided
[ https://issues.apache.org/jira/browse/ARROW-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dave Challis updated ARROW-2406:
--------------------------------
Description:
Minimal example to recreate:
{code}
import pandas as pd
import pyarrow as pa
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}
This causes the Python interpreter to exit with "Segmentation fault: 11".
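Because the crash kills the interpreter, it may be easier to confirm by running the snippet in a child process and inspecting its exit status. The following is only a sketch and not part of the original report: the helper name `run_isolated` and the `REPRO` constant are hypothetical, and the signal detection relies on POSIX semantics (a negative return code from `subprocess` means the child was killed by that signal).

```python
import signal
import subprocess
import sys

# The reproduction from this report, kept as a string so that it runs
# in a separate interpreter and cannot crash the calling process.
REPRO = """
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
"""


def run_isolated(source):
    """Run `source` in a fresh interpreter (hypothetical helper).

    Returns (returncode, signal_name); signal_name is None unless the
    child was terminated by a signal.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    sig_name = None
    if proc.returncode < 0:
        # POSIX only: -N means the child was killed by signal N.
        sig_name = signal.Signals(-proc.returncode).name
    return proc.returncode, sig_name


if __name__ == "__main__":
    code, sig = run_isolated(REPRO)
    if sig is not None:
        print("child killed by", sig)
    else:
        print("child exited with code", code)
```

On an affected installation the child would be expected to die from SIGSEGV, while the two working variants should exit with code 0. If pandas or pyarrow is not installed, the child simply exits with a non-zero code from the import error.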
The following examples all work without any issue:
{code}
# column 'a' is no longer empty
df = pd.DataFrame({'a': ['foo']})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}
{code}
# column 'a' is empty, but no schema is specified
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
pa.Table.from_pandas(df)
{code}
was:
Minimal example to recreate:
{code:python}
import pandas as pd
import pyarrow as pa
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}
This causes the python interpreter to exit with "Segmentation fault: 11".
The following examples all work without any issue:
{code:python}
# column 'a' is no longer empty
df = pd.DataFrame({'a': ['foo']})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}
{code:python}
# column 'a' is empty, but no schema is specified
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
pa.Table.from_pandas(df)
{code}
> [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided
> ------------------------------------------------------------------------------------------------------
>
> Key: ARROW-2406
> URL: https://issues.apache.org/jira/browse/ARROW-2406
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6.3
> Reporter: Dave Challis
> Priority: Major
>
> Minimal example to recreate:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>
> This causes the Python interpreter to exit with "Segmentation fault: 11".
> The following examples all work without any issue:
> {code}
> # column 'a' is no longer empty
> df = pd.DataFrame({'a': ['foo']})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> {code}
> # column 'a' is empty, but no schema is specified
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> pa.Table.from_pandas(df)
> {code}
>