Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/02/04 09:58:00 UTC
[jira] [Commented] (ARROW-11480) [Python] Segmentation fault with date filter
[ https://issues.apache.org/jira/browse/ARROW-11480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278729#comment-17278729 ]
Joris Van den Bossche commented on ARROW-11480:
-----------------------------------------------
[~henrikrasmussen] thanks for the report!
I _suppose_ this is a duplicate of ARROW-11379, although I didn't realize that it could be a regression compared to 2.0.0.
> [Python]Segmentation fault with date filter
> -------------------------------------------
>
> Key: ARROW-11480
> URL: https://issues.apache.org/jira/browse/ARROW-11480
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Henrik Anker Rasmussen
> Priority: Major
> Attachments: timestamp.parquet
>
>
> If I read a Parquet file (see attachment) with timestamps generated in Spark and apply a filter on a date column, I get a segmentation fault:
>
> {code:python}
> import datetime
> import pyarrow.parquet as pq
> now = datetime.datetime.now()
> # Reading with a filter on the timestamp column segfaults on pyarrow 3.0.0
> table = pq.read_table("timestamp.parquet", filters=[("date", "<=", now)])
> {code}
>
> The attached parquet file is generated with this code in spark:
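> {code:python}
> import datetime
> import pandas as pd
> from pyspark.sql import SparkSession
> from pyspark.sql.types import StructType
> spark = SparkSession.builder.getOrCreate()  # reuse the active session or create one
> now = datetime.datetime.now()
> data = {"date": [ now - datetime.timedelta(days=i) for i in range(100)]}
> schema = { "type": "struct", "fields": [{"name": "date", "type": "timestamp", "nullable": True, "metadata": {}}, ], }
> spf = spark.createDataFrame(pd.DataFrame(data), schema=StructType.fromJson(schema))
> spf.write.format("parquet").mode("overwrite").save("timestamp.parquet")
> {code}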
> If I downgrade pyarrow to 2.0.0 it works fine.
> Python version 3.7.7
> pyarrow version 3.0.0
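
For reference, the result the filter in the report should produce (instead of a crash) can be sketched in plain Python. This is only an illustration of the expected semantics, assuming a list of (column, op, value) tuples is combined with AND; the helper name apply_filters and the _OPS table are hypothetical and are not part of pyarrow. It uses in-memory rows built the same way as the Spark data above rather than the attached file.

```python
import datetime
import operator

# Hypothetical helper, NOT pyarrow code: maps the comparison strings used in
# filter tuples to plain Python comparison functions.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
}


def apply_filters(rows, filters):
    """Keep the rows (dicts) that satisfy every (column, op, value) tuple."""
    return [
        row
        for row in rows
        if all(_OPS[op](row[col], value) for col, op, value in filters)
    ]


# Same shape as the data in the report: 100 timestamps, one per day going back.
now = datetime.datetime.now()
rows = [{"date": now - datetime.timedelta(days=i)} for i in range(100)]

# Every generated timestamp is at or before `now`, so no row is dropped.
kept = apply_filters(rows, [("date", "<=", now)])
print(len(kept))
```

Under these assumptions all 100 rows pass the "<=" filter, which is the outcome the reporter would expect from read_table on the attached file.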