Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/15 20:27:13 UTC
[GitHub] [arrow-datafusion] alamb commented on issue #1433: Query failing to return any results when filter is an equality check on strings
alamb commented on issue #1433:
URL: https://github.com/apache/arrow-datafusion/issues/1433#issuecomment-995187994
This sounds similar to something we hit in IOx (https://github.com/influxdata/influxdb_iox/issues/2153), which I ultimately tracked down to a bug in the parquet statistics generation: https://github.com/apache/arrow-rs/issues/641
So in this case, the statistics embedded in the parquet file for the `direction` column are `ST:[min: Merged, max: Outgoing, num_nulls not defined]`. That is, the minimum value is `"Merged"` and the maximum value is `"Outgoing"`, which I do not think is correct.
```shell
$ parquet-tools meta test.parquet
file: file:/Users/alamb/Downloads/test.parquet
creator: UrbanLogiq
extra: ARROW:schema = /////+gAAAAQAAAAAAAKAA4ADAALAAQACgAAABQAAAAAAAABBAAKAAwAAAAIAAQACgAAAAgAAAAIAAAAAAAAAAMAAAB8AAAAPAAAAAQAAACg////GAAAACAAAAAAAAACHAAAAAgADAAEAAsACAAAACAAAAAAAAABAAAAAAMAAABhZHQA1P///xQAAAAMAAAAAAAABQwAAAAAAAAAxP///wkAAABkaXJlY3Rpb24AAAAQABQAEAAAAA8ABAAAAAgAEAAAABgAAAAMAAAAAAAABRAAAAAAAAAABAAEAAQAAAAKAAAAdWxfbm9kZV9pZAAA
file schema: arrow_schema
--------------------------------------------------------------------------------
ul_node_id: REQUIRED BINARY L:STRING R:0 D:0
direction: REQUIRED BINARY L:STRING R:0 D:0
adt: REQUIRED INT32 R:0 D:0
row group 1: RC:301 TS:3384 OFFSET:4
--------------------------------------------------------------------------------
ul_node_id: BINARY ZSTD DO:4 FPO:1796 SZ:2143/3187/1.49 VC:301 ENC:RLE_DICTIONARY,PLAIN,RLE ST:[min: /ehIvdei+UGfkQ4Gy5fr1w==, max: zThqpswvY6fa3VHF4BKWfw==, num_nulls not defined]
direction: BINARY ZSTD DO:2243 FPO:2311 SZ:195/177/0.91 VC:301 ENC:RLE_DICTIONARY,PLAIN,RLE ST:[min: Merged, max: Outgoing, num_nulls not defined]
adt: INT32 ZSTD DO:2500 FPO:3159 SZ:1046/1503/1.44 VC:301 ENC:RLE_DICTIONARY,PLAIN,RLE ST:[min: 15, max: 23116, num_nulls not defined]
```
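To see why statistics like these could make an equality filter on strings return no results, here is a minimal sketch of min/max row-group pruning. It is not DataFusion's actual pruning code. `RowGroupStats` and `may_contain` are hypothetical names, and the sketch assumes string statistics are compared byte-wise, which is what Rust's `str` ordering does.

```rust
/// Hypothetical min/max statistics for one string column in one row group.
struct RowGroupStats<'a> {
    min: &'a str,
    max: &'a str,
}

/// Returns false only when the statistics prove that no row in the row group
/// can satisfy `column = value`, meaning a reader may skip the row group.
/// Rust's `str` comparison is byte-wise lexicographic.
fn may_contain(stats: &RowGroupStats<'_>, value: &str) -> bool {
    stats.min <= value && value <= stats.max
}

fn main() {
    // Statistics as reported by parquet-tools for the `direction` column.
    let direction = RowGroupStats { min: "Merged", max: "Outgoing" };

    for value in ["Incoming", "Merged", "Outgoing", "Two Way"] {
        let decision = if may_contain(&direction, value) { "scan" } else { "skip" };
        println!("direction = '{value}': {decision}");
    }
}
```

With the stored `min: Merged, max: Outgoing`, the check rules out `'Incoming'` and `'Two Way'`, so a row group that really does contain those values would be skipped and the query would return nothing for them.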
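This appears to be incorrect for the data in test.parquet: compared byte-wise, the true minimum of `direction` is `"Incoming"` and the true maximum is `"Two Way"`, as the distinct values show: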
```
❯ select distinct direction from t order by direction;
+-----------+
| direction |
+-----------+
| Incoming  |
| Merged    |
| Outgoing  |
| Two Way   |
+-----------+
```
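As a quick cross-check, the following sketch computes what the statistics should have been from the distinct values above, using byte-wise comparison (the sort order the Parquet format specifies for STRING columns). `min_max` is a hypothetical helper written for illustration, not a parquet crate API.

```rust
/// Computes the byte-wise minimum and maximum of a set of strings,
/// returning None for empty input.
fn min_max<'a, I>(values: I) -> Option<(&'a str, &'a str)>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut iter = values.into_iter();
    let first = iter.next()?;
    // Fold over the remaining values, tracking the running min and max.
    Some(iter.fold((first, first), |(lo, hi), v| (lo.min(v), hi.max(v))))
}

fn main() {
    // Distinct `direction` values returned by the query above.
    let distinct = ["Incoming", "Merged", "Outgoing", "Two Way"];
    if let Some((min, max)) = min_max(distinct) {
        println!("expected statistics: min: {min}, max: {max}");
    }
}
```

Running it prints `expected statistics: min: Incoming, max: Two Way`, which differs from the `min: Merged, max: Outgoing` written to the file.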