Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/22 05:28:00 UTC
[GitHub] [iceberg] arunb2w opened a new issue, #6244: Iceberg metadata not stored properly
arunb2w opened a new issue, #6244:
URL: https://github.com/apache/iceberg/issues/6244
I tried creating a sample Iceberg table with the schema below:
```
CREATE TABLE glue_dev.db.datatype_test (
id bigint,
data string,
category string
)
USING iceberg
TBLPROPERTIES ('read.split.target-size'='134217728', "write.metadata.metrics.default"="full")
```
I then inserted around 100 records and ran a rewrite of the data files, so that all the inserted data would be compacted into a single file.
```
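# Assumes an active SparkSession named `spark` with the glue_dev catalog configured.
for num in range(1, 100):
    spark.sql(f"INSERT INTO glue_dev.db.datatype_test VALUES ({num}, 'data{num}', 'catagory{num}')")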
```
After that, I queried the data_files metadata table:
```
select * from glue_dev.db.datatype_test.data_files limit 10;
content file_path file_format spec_id record_count file_size_in_bytes column_sizes value_counts null_value_counts nan_value_counts lower_bounds upper_bounds key_metadata split_offsets equality_ids sort_order_id
0 s3://bucket/folder/db.db/datatype_test/data/00000-0-4bb9ce80-c7a7-4192-98c6-ed6e7289a981-00001.parquet PARQUET 0 559 4234 {1:991,2:1188,3:1258} {1:559,2:559,3:559} {1:0,2:0,3:0} {} {1:,2:data1,3:catagory1} {1:/,2:data99,3:catagory99} NULL [4] NULL 0
Time taken: 15.614 seconds, Fetched 1 row(s)
```
In this metadata, the lower_bounds and upper_bounds values for the id column, which is of type bigint, don't show the correct values.
Does that mean Iceberg is not storing the metadata correctly?
In that case, if I join on the id column, how will Iceberg properly scan or skip files when the lower bound and upper bound metadata is not correct?
This is test data, but I have actual data with around 10000 files, and it shows the same behaviour for integer columns.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] github-actions[bot] commented on issue #6244: Iceberg metadata not stored properly
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6244:
URL: https://github.com/apache/iceberg/issues/6244#issuecomment-1560279324
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
[GitHub] [iceberg] hililiwei commented on issue #6244: Iceberg metadata not stored properly
Posted by GitBox <gi...@apache.org>.
hililiwei commented on issue #6244:
URL: https://github.com/apache/iceberg/issues/6244#issuecomment-1325832061
Agree with @ajantha-bhat.
> As a workaround, I believe https://github.com/hililiwei/iceberg-tools#manifest2json can convert them and show them.
>
> cc: @hililiwei
This little toolbox was an experiment from a long time ago. I have uploaded a snapshot version that you can try.
[GitHub] [iceberg] ajantha-bhat commented on issue #6244: Iceberg metadata not stored properly
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on issue #6244:
URL: https://github.com/apache/iceberg/issues/6244#issuecomment-1323230663
> Does that mean iceberg is not storing the metadata correctly?
Lower bounds and upper bounds are stored as byte arrays in the manifest files. What you are seeing is therefore the string representation of the byte array, which is not human-readable. During file pruning, however, these values are converted back to their proper data types and compared.
We can enhance these metadata tables to show the actual values. I believe @szehon-ho has an open PR for this.
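
To illustrate the point about byte arrays, here is a minimal Python sketch of how such a bound could be decoded and used for pruning. It relies on the Iceberg spec's single-value serialization, where a long is stored as 8 little-endian bytes and a string as UTF-8. The helper names (`decode_long_bound`, `decode_string_bound`, `may_contain_id`) are hypothetical and not part of any Iceberg API, and the values 1 and 559 are assumptions chosen to match the record_count in the output above, not values read from the actual manifest.

```python
import struct


def decode_long_bound(raw: bytes) -> int:
    """Decode a long bound: 8 bytes, little-endian, signed."""
    return struct.unpack("<q", raw)[0]


def decode_string_bound(raw: bytes) -> str:
    """Decode a string bound: UTF-8 bytes."""
    return raw.decode("utf-8")


def may_contain_id(lower_raw: bytes, upper_raw: bytes, wanted: int) -> bool:
    """Simplified pruning check: can a file with these bounds hold `wanted`?"""
    return decode_long_bound(lower_raw) <= wanted <= decode_long_bound(upper_raw)


# Assumed bounds for illustration only (not taken from the real manifest).
lower_raw = struct.pack("<q", 1)
upper_raw = struct.pack("<q", 559)

print(upper_raw)                              # b'/\x02\x00\x00\x00\x00\x00\x00'
print(decode_long_bound(upper_raw))           # 559
print(decode_string_bound(b"data99"))         # data99
print(may_contain_id(lower_raw, upper_raw, 42))    # True
print(may_contain_id(lower_raw, upper_raw, 1000))  # False
```

Under these assumptions, the first byte of the encoded upper bound is 0x2F, which prints as `/`, and the encoded lower bound starts with the non-printable byte 0x01. That would be consistent with the `{1:/,...}` and `{1:,...}` values shown in the query output, while the string columns look normal because their bytes are already readable text.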
[GitHub] [iceberg] szehon-ho closed issue #6244: Iceberg metadata not stored properly
Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho closed issue #6244: Iceberg metadata not stored properly
URL: https://github.com/apache/iceberg/issues/6244
[GitHub] [iceberg] ajantha-bhat commented on issue #6244: Iceberg metadata not stored properly
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on issue #6244:
URL: https://github.com/apache/iceberg/issues/6244#issuecomment-1323232753
As a workaround, I believe https://github.com/hililiwei/iceberg-tools#manifest2json can convert them and show them.
cc: @hililiwei
[GitHub] [iceberg] szehon-ho commented on issue #6244: Iceberg metadata not stored properly
Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on issue #6244:
URL: https://github.com/apache/iceberg/issues/6244#issuecomment-1561998472
Yeah, this (at least the display) should be fixed by #5376.