Posted to jira@arrow.apache.org by "Diana Clarke (Jira)" <ji...@apache.org> on 2021/03/27 00:38:00 UTC
[jira] [Created] (ARROW-12114) Dataset to table filter expression API change
Diana Clarke created ARROW-12114:
------------------------------------
Summary: Dataset to table filter expression API change
Key: ARROW-12114
URL: https://issues.apache.org/jira/browse/ARROW-12114
Project: Apache Arrow
Issue Type: Bug
Reporter: Diana Clarke
Ben:
Can you please confirm that we're aware and okay with the following API change? Thanks!
{code}
import pyarrow.dataset
import pyarrow.fs

path_prefix = "ursa-labs-taxi-data-repartitioned-10k/"
paths = [
    f"ursa-labs-taxi-data-repartitioned-10k/{year}/{month:02}/{part:04}/data.parquet"
    for year in range(2009, 2020)
    for month in range(1, 13)
    for part in range(101)
    if not (year == 2019 and month > 6)  # Data ends in 2019/06
    and not (year == 2010 and month == 3)  # Data is missing in 2010/03
]
partitioning = pyarrow.dataset.DirectoryPartitioning.discover(
    field_names=["year", "month", "part"],
    infer_dictionary=True,
)
s3 = pyarrow.fs.S3FileSystem(region="us-east-2")
dataset = pyarrow.dataset.dataset(
    paths,
    format="parquet",
    filesystem=s3,
    partitioning=partitioning,
    partition_base_dir=path_prefix,
)
year = pyarrow.dataset.field("year")
month = pyarrow.dataset.field("month")
part = pyarrow.dataset.field("part")
# Note the string literal "2011": this is the comparison in question.
filter_expr = (year == "2011") & (month == 1) & (part == 2)
dataset.to_table(filter=filter_expr)
{code}
In Arrow 3.0, the above code executes without error.
On HEAD, the comparison {{year == "2011"}} (which should have been {{year == 2011}}, without quotes) raises the following exception.
{code}
pyarrow.lib.ArrowNotImplementedError: Function equal has no kernel matching input types (array[int32], scalar[string])
{code}
This API change appears to have been introduced in ARROW-8919. Perhaps it was intentional, but I figured we should double-check. Thanks again!
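To illustrate the mismatch reported above, here is a rough pure-Python analogy. It is not Arrow's implementation. The helpers {{parse_partition}} and {{matches}} are hypothetical names, and the "all-digit segments become integers" rule is an assumption chosen only to mirror the {{int32}} type shown in the error message. The sketch shows how a directory segment such as {{2011}} ends up as an integer, and why a strict comparison against the string {{"2011"}} is then rejected rather than silently evaluated.

```python
def parse_partition(path, field_names):
    """Map leading directory segments to field values, inferring ints where possible."""
    segments = path.strip("/").split("/")
    values = {}
    for name, segment in zip(field_names, segments):
        # Assumed inference rule (illustrative only): all-digit segments become integers.
        values[name] = int(segment) if segment.isdigit() else segment
    return values


def matches(row, field, literal):
    """Strict equality: refuse mismatched types instead of silently returning False."""
    value = row[field]
    if type(value) is not type(literal):
        raise TypeError(
            f"no comparison for ({type(value).__name__}, {type(literal).__name__})"
        )
    return value == literal


# Path relative to the partition base directory, as in the report above.
row = parse_partition("2011/01/0002/data.parquet", ["year", "month", "part"])
print(row)
print(matches(row, "year", 2011))
```

Calling {{matches(row, "year", "2011")}} raises a {{TypeError}} in this sketch, which is analogous to the stricter behavior seen on HEAD; using the integer literal {{2011}} avoids the problem.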