Posted to issues@arrow.apache.org by "Henrik Anker Rasmussen (Jira)" <ji...@apache.org> on 2021/02/03 10:37:00 UTC
[jira] [Created] (ARROW-11480) [Python]Segmentation fault with date filter
Henrik Anker Rasmussen created ARROW-11480:
----------------------------------------------
Summary: [Python]Segmentation fault with date filter
Key: ARROW-11480
URL: https://issues.apache.org/jira/browse/ARROW-11480
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Henrik Anker Rasmussen
Attachments: timestamp.parquet
If I read a Parquet file (see attachment) with timestamps generated in Spark and apply a filter on a date column, I get a segmentation fault:
{code:java}
import datetime
import pyarrow.parquet as pq

now = datetime.datetime.now()
# Segfaults with pyarrow 3.0.0; works with pyarrow 2.0.0
table = pq.read_table("timestamp.parquet", filters=[("date", "<=", now)])
{code}
The attached Parquet file was generated with this code in Spark:
{code:java}
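import datetime
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
spark = SparkSession.builder.getOrCreate()  # assumed; the original snippet relied on an existing `spark` session
now = datetime.datetime.now()
data = {"date": [now - datetime.timedelta(days=i) for i in range(100)]}
schema = {"type": "struct", "fields": [{"name": "date", "type": "timestamp", "nullable": True, "metadata": {}}]}
spf = spark.createDataFrame(pd.DataFrame(data), schema=StructType.fromJson(schema))
spf.write.format("parquet").mode("overwrite").save("timestamp.parquet")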
{code}
If I downgrade pyarrow to 2.0.0 it works fine.
Python version 3.7.7
pyarrow version 3.0.0
--
This message was sent by Atlassian Jira
(v8.3.4#803005)