Posted to jira@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2021/02/16 15:00:01 UTC
[jira] [Resolved] (ARROW-11480) [Python] Segmentation fault reading parquet with date filter with INT96 column
[ https://issues.apache.org/jira/browse/ARROW-11480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Kietzman resolved ARROW-11480.
----------------------------------
Resolution: Fixed
Issue resolved by pull request 9470
[https://github.com/apache/arrow/pull/9470]
> [Python] Segmentation fault reading parquet with date filter with INT96 column
> ------------------------------------------------------------------------------
>
> Key: ARROW-11480
> URL: https://issues.apache.org/jira/browse/ARROW-11480
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 3.0.0
> Reporter: Henrik Anker Rasmussen
> Assignee: Ben Kietzman
> Priority: Major
> Labels: dataset, pull-request-available
> Fix For: 3.0.1, 4.0.0
>
> Attachments: timestamp.parquet
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> If I read a Parquet file (see attachment) with timestamps generated in Spark and apply a filter on a date column, I get a segmentation fault:
>
> {code:python}
> import datetime
> import pyarrow.parquet as pq
> # Keep only rows whose "date" is not later than the current time.
> now = datetime.datetime.now()
> table = pq.read_table("timestamp.parquet", filters=[("date", "<=", now)])
> {code}
>
> The attached Parquet file was generated with this code in Spark:
> {code:python}
> import datetime
> import pandas as pd
> from pyspark.sql import SparkSession
> from pyspark.sql.types import StructType
> spark = SparkSession.builder.getOrCreate()
> now = datetime.datetime.now()
> data = {"date": [now - datetime.timedelta(days=i) for i in range(100)]}
> schema = {"type": "struct", "fields": [{"name": "date", "type": "timestamp", "nullable": True, "metadata": {}}]}
> spf = spark.createDataFrame(pd.DataFrame(data), schema=StructType.fromJson(schema))
> spf.write.format("parquet").mode("overwrite").save("timestamp.parquet")
> {code}
> If I downgrade pyarrow to 2.0.0 it works fine.
> Python version 3.7.7
> pyarrow version 3.0.0
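
For context on the INT96 column named in the title: by default Spark writes Parquet timestamps with the legacy INT96 physical type, a 12-byte little-endian value holding 8 bytes of nanoseconds within the day followed by 4 bytes of Julian day number. The sketch below only illustrates that layout using the standard library. The function names are hypothetical, and this is not how pyarrow reads the column nor what the fix in the pull request does.

```python
import datetime
import struct

# Julian day number of the Unix epoch (1970-01-01).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
UNIX_EPOCH = datetime.datetime(1970, 1, 1)


def decode_int96_timestamp(raw: bytes) -> datetime.datetime:
    """Decode a 12-byte Parquet INT96 timestamp into a naive datetime."""
    if len(raw) != 12:
        raise ValueError("an INT96 value is exactly 12 bytes")
    # Little-endian: int64 nanoseconds within the day, then int32 Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # datetime has microsecond resolution, so sub-microsecond digits are dropped.
    return UNIX_EPOCH + datetime.timedelta(
        days=days_since_epoch, microseconds=nanos_of_day // 1000
    )


def encode_int96_timestamp(value: datetime.datetime) -> bytes:
    """Encode a naive datetime into the 12-byte INT96 layout (inverse of the above)."""
    delta = value - UNIX_EPOCH
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1000
    return struct.pack("<qi", nanos_of_day, delta.days + JULIAN_DAY_OF_UNIX_EPOCH)
```

Round-tripping a value through `encode_int96_timestamp` and `decode_int96_timestamp` should return the original datetime, which is a quick way to check the byte order assumed here.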
--
This message was sent by Atlassian Jira
(v8.3.4#803005)