Posted to jira@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2021/02/16 15:00:16 UTC

[jira] [Resolved] (ARROW-11379) [C++][Dataset] Reading dataset with filtering on timestamp partition field crashes

     [ https://issues.apache.org/jira/browse/ARROW-11379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Kietzman resolved ARROW-11379.
----------------------------------
    Resolution: Fixed

Issue resolved by pull request 9466
[https://github.com/apache/arrow/pull/9466]

> [C++][Dataset] Reading dataset with filtering on timestamp partition field crashes
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-11379
>                 URL: https://issues.apache.org/jira/browse/ARROW-11379
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 2.0.0
>            Reporter: Joris Van den Bossche
>            Assignee: Ben Kietzman
>            Priority: Major
>              Labels: dataset, pull-request-available
>             Fix For: 3.0.1, 4.0.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code}
> In [1]: df = pd.DataFrame({"dates": list(pd.date_range("2012-01-01", periods=2, freq="D")) * 5, "col": range(10)})
> In [2]: df.to_parquet("test_partition_timestamps", partition_cols=["dates"])
> In [3]: !ls test_partition_timestamps/
> 'dates=2012-01-01 00:00:00'  'dates=2012-01-02 00:00:00'
> In [4]: import pyarrow.dataset as ds
> In [5]: import pyarrow as pa
> In [6]: part = ds.partitioning(pa.schema([("dates", pa.timestamp("s"))]), flavor="hive")
> In [7]: dataset = ds.dataset("test_partition_timestamps/", format="parquet", partitioning=part)
> {code}
> Reading the dataset is fine and gives the correct types:
> {code}
> In [10]: dataset.to_table()
> Out[10]: 
> pyarrow.Table
> col: int64
> dates: timestamp[s]
> {code}
> but filtering on the timestamp column crashes with a failed check:
> {code}
> In [11]: dataset.to_table(filter=ds.field("dates") > pd.Timestamp("2012-01-01"))
> ../src/arrow/compute/kernels/scalar_cast_temporal.cc:129:  Check failed: (batch[0].kind()) == (Datum::ARRAY) 
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xc2224a)[0x7f68d2ccf24a]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xc221c8)[0x7f68d2ccf1c8]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xc221ea)[0x7f68d2ccf1ea]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7f68d2ccf549]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xf0252a)[0x7f68d2faf52a]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNSt17_Function_handlerIFvPN5arrow7compute13KernelContextERKNS1_9ExecBatchEPNS0_5DatumEEPS9_E9_M_invokeERKSt9_Any_dataOS3_S6_OS8_+0x69)[0x7f68d2e8ab86]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNKSt8functionIFvPN5arrow7compute13KernelContextERKNS1_9ExecBatchEPNS0_5DatumEEEclES3_S6_S8_+0x7a)[0x7f68d2deec04]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xd3d6f9)[0x7f68d2dea6f9]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xd3cd5b)[0x7f68d2de9d5b]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNK5arrow7compute8Function7ExecuteERKSt6vectorINS_5DatumESaIS3_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x8c7)[0x7f68d2df9963]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xd2eed2)[0x7f68d2ddbed2]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNK5arrow7compute12MetaFunction7ExecuteERKSt6vectorINS_5DatumESaIS3_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x15d)[0x7f68d2dfac8f]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute12CallFunctionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorINS_5DatumESaISA_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x26c)[0x7f68d2dedc6f]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute12CallFunctionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorINS_5DatumESaISA_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x93)[0x7f68d2deda96]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute4CastERKNS_5DatumERKNS0_11CastOptionsEPNS0_11ExecContextE+0xf7)[0x7f68d2ddd493]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute4CastERKNS_5DatumESt10shared_ptrINS_8DataTypeEERKNS0_11CastOptionsEPNS0_11ExecContextE+0x77)[0x7f68d2ddd6e2]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(+0x1c5c21)[0x7f68b30cfc21]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(+0x1c6789)[0x7f68b30d0789]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(+0x1c5097)[0x7f68b30cf097]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(_ZNK5arrow7dataset10Expression4BindENS_10ValueDescrEPNS_7compute11ExecContextE+0x732)[0x7f68b30d22e8]
> ...
> {code}
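
For reference, the filter in the report would be expected to keep only the second partition directory rather than crash. Below is a minimal pure-Python sketch of that expected selection, assuming hive-style `key=value` directory names like those in the `ls` output above. `parse_hive_segment` and `select_partitions` are hypothetical helpers written for illustration; they are not Arrow APIs and do not reflect how Arrow binds or evaluates the filter expression internally.

```python
from datetime import datetime


def parse_hive_segment(segment, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a hive-style 'key=value' directory name and parse the value as a timestamp.

    Hypothetical helper for illustration only; the format string is an
    assumption based on the directory names shown in the report.
    """
    key, _, raw = segment.partition("=")
    return key, datetime.strptime(raw, fmt)


def select_partitions(segments, field, lower_bound):
    """Keep the directories whose `field` value is strictly greater than `lower_bound`."""
    kept = []
    for segment in segments:
        key, value = parse_hive_segment(segment)
        if key == field and value > lower_bound:
            kept.append(segment)
    return kept


# Directory names as listed in the report.
segments = ["dates=2012-01-01 00:00:00", "dates=2012-01-02 00:00:00"]

# Mirrors the intent of ds.field("dates") > pd.Timestamp("2012-01-01").
selected = select_partitions(segments, "dates", datetime(2012, 1, 1))
print(selected)  # ['dates=2012-01-02 00:00:00']
```

Under these assumptions only the `2012-01-02` partition satisfies the strict comparison, so a filtered read should return the five rows stored under that directory.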



--
This message was sent by Atlassian Jira
(v8.3.4#803005)