Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/22 12:48:06 UTC
[GitHub] [iceberg] Fokko opened a new pull request, #5609: Python: Add additional information to the describe command
Fokko opened a new pull request, #5609:
URL: https://github.com/apache/iceberg/pull/5609
This was very useful when debugging https://github.com/apache/iceberg/issues/5591
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] szehon-ho commented on pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
szehon-ho commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1223048873
Yeah, I had thought a simple print representation of the metadata would be very useful. There's a PR and some discussion at: https://github.com/apache/iceberg/pull/4142. It left off with DOT format being suggested as easier to read for large tables, and I didn't have time to do it.
[GitHub] [iceberg] rdblue commented on pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1223220144
Thanks, @szehon-ho! I didn't realize that there was a PR out for it. What do you think about using the Python CLI instead of making it a Java utility? The CLI already does a great job showing tree data, so I think we could build it quickly and make it easier to call by bundling it in the CLI.
[GitHub] [iceberg] Fokko commented on pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1226160921
@rdblue Thanks for merging so quickly. I've removed the parts that fetch the manifests, and I'll do that in a separate PR, including the mocking of the file-io, which is quite a hassle.
[GitHub] [iceberg] rdblue commented on a diff in pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r951840954
##########
python/tests/cli/test_console.py:
##########
@@ -189,13 +189,17 @@ def test_describe_table(_):
2 ASC NULLS FIRST
bucket[4](3) DESC NULLS LAST
]
-Schema Schema
+Current schema Schema, id=1
├── 1: x: required long
├── 2: y: required long (comment)
└── 3: z: required long
+Current snapshot id=3055729675574597004, parent_id=3051729675574597004,
+ schema_id=1
Review Comment:
What caused this line wrap? I don't see it in the code above.
[GitHub] [iceberg] rdblue merged pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5609:
URL: https://github.com/apache/iceberg/pull/5609
[GitHub] [iceberg] rdblue commented on a diff in pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r951839034
##########
python/pyiceberg/cli/output.py:
##########
@@ -97,13 +97,13 @@ def describe_table(self, table: Table):
for key, value in metadata.properties.items():
table_properties.add_row(key, value)
- schema_tree = Tree("Schema")
+ schema_tree = Tree(f"Schema, id={table.metadata.current_schema_id}")
for field in table.schema().fields:
schema_tree.add(str(field))
snapshot_tree = Tree("Snapshots")
for snapshot in metadata.snapshots:
- snapshot_tree.add(f"Snapshot {snapshot.schema_id}: {snapshot.manifest_list}")
+ snapshot_tree.add(f"Snapshot {snapshot.snapshot_id}, schema {snapshot.schema_id}: {snapshot.manifest_list}")
Review Comment:
Is the extra space needed?
[GitHub] [iceberg] rdblue commented on pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1222831212
It may also be helpful to have a full tree view of a snapshot. I think @szehon-ho had suggested that a while ago.
[GitHub] [iceberg] Fokko commented on a diff in pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r953823673
##########
python/pyiceberg/table/snapshots.py:
##########
@@ -96,6 +96,12 @@ class Snapshot(IcebergBaseModel):
summary: Optional[Summary] = Field()
schema_id: Optional[int] = Field(alias="schema-id", default=None)
+ def __str__(self) -> str:
Review Comment:
Good one, added it! 👍🏻
[GitHub] [iceberg] rdblue commented on pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1226058105
@Fokko, I agree we don't want to add that feature in this PR. I think it would actually be a separate `tree` command rather than adding more to `describe`. The description should be metadata only. It will be really expensive to run `tree` on a real table so we should only do it if asked specifically.
[GitHub] [iceberg] rdblue commented on a diff in pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r951840433
##########
python/pyiceberg/table/snapshots.py:
##########
@@ -96,6 +96,12 @@ class Snapshot(IcebergBaseModel):
summary: Optional[Summary] = Field()
schema_id: Optional[int] = Field(alias="schema-id", default=None)
+ def __str__(self) -> str:
Review Comment:
It would be great to have operation in here as well.
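For illustration only, here is a minimal sketch of a `__str__` that leads with the operation. It is not the code merged in this PR: the real `Snapshot` is a pydantic model, while this stand-in uses a plain dataclass, and the flat `operation` field and the single-member `Operation` enum are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Operation(Enum):
    # Only the value that appears in this thread's output is modeled here.
    APPEND = "append"


@dataclass
class Snapshot:
    # Simplified, hypothetical stand-in for pyiceberg's pydantic-based Snapshot.
    snapshot_id: int
    parent_snapshot_id: Optional[int] = None
    operation: Optional[Operation] = None
    schema_id: Optional[int] = None

    def __str__(self) -> str:
        # Lead with the operation when it is known, and skip unset fields.
        prefix = f"{str(self.operation)}: " if self.operation is not None else ""
        parts = [f"id={self.snapshot_id}"]
        if self.parent_snapshot_id is not None:
            parts.append(f"parent_id={self.parent_snapshot_id}")
        if self.schema_id is not None:
            parts.append(f"schema_id={self.schema_id}")
        return prefix + ", ".join(parts)


snapshot = Snapshot(snapshot_id=7992925991545429343, operation=Operation.APPEND, schema_id=0)
print(snapshot)
```

Skipping unset fields keeps the line short for snapshots without a parent, which matters once the string is rendered in a fixed-width console.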
[GitHub] [iceberg] Fokko commented on a diff in pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r953826658
##########
python/tests/cli/test_console.py:
##########
@@ -189,13 +189,17 @@ def test_describe_table(_):
2 ASC NULLS FIRST
bucket[4](3) DESC NULLS LAST
]
-Schema Schema
+Current schema Schema, id=1
├── 1: x: required long
├── 2: y: required long (comment)
└── 3: z: required long
+Current snapshot id=3055729675574597004, parent_id=3051729675574597004,
+ schema_id=1
Review Comment:
`rich` automatically overflows to a new line with the correct indentation, which is why `schema_id` is nicely aligned with `id`. In a normal terminal, it will infer the width. There are multiple options to control the overflow: https://rich.readthedocs.io/en/stable/console.html#overflow In the terminal it looks nicer if you have sufficient width.
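The effect can be approximated with the standard library alone. The sketch below is not `rich`'s implementation; it uses `textwrap` with a hanging indent to show why the same row wraps at one width and not at another. The `render_row` helper, the two-space column gap, and the 80-column width for the test run are assumptions made for the example.

```python
import textwrap

LABEL = "Current snapshot"
VALUE = "id=3055729675574597004, parent_id=3051729675574597004, schema_id=1"


def render_row(label: str, value: str, width: int) -> str:
    # Lay out "label  value" and wrap the value with a hanging indent, so
    # continuation lines start under the first character of the value column.
    first = f"{label}  "
    return textwrap.fill(
        value,
        width=width,
        initial_indent=first,
        subsequent_indent=" " * len(first),
    )


narrow = render_row(LABEL, VALUE, width=80)   # assumed width when no terminal is attached
wide = render_row(LABEL, VALUE, width=120)    # a roomier terminal
print(narrow)
print(wide)
```

At 80 columns the row needs 84 characters, so `schema_id=1` moves to a second line aligned with `id`; at 120 columns everything fits on one line.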
[GitHub] [iceberg] Fokko commented on pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1225853269
Something like this:
```
(pyiceberg-0Eb0aXNo-py3.8) ➜ python git:(fd-add-more-information) ✗ pyiceberg --uri thrift://localhost:9083 describe nyc.taxis
Table format version 1
Metadata location file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/metadata/00003-423ac127-c400-413b-8750-2bf3f17ce013.metadata.json
Table UUID 017ddcd0-afca-41b4-9613-a032d9d4ee69
Last Updated 1661347970107
Partition spec [
1000: tpep_days: void(2)
1001: tpep_dropoff_datetime_day: unknown(3)
1002: tpep_dropff_days: unknown(2)
]
Sort order []
Current schema Schema, id=0
├── 1: VendorID: optional long
├── 2: tpep_pickup_datetime: optional timestamptz
├── 3: tpep_dropoff_datetime: optional timestamptz
├── 4: passenger_count: optional double
├── 5: trip_distance: optional double
├── 6: RatecodeID: optional double
├── 7: store_and_fwd_flag: optional string
├── 8: PULocationID: optional long
├── 9: DOLocationID: optional long
├── 10: payment_type: optional long
├── 11: fare_amount: optional double
├── 12: extra: optional double
├── 13: mta_tax: optional double
├── 14: tip_amount: optional double
├── 15: tolls_amount: optional double
├── 16: improvement_surcharge: optional double
├── 17: total_amount: optional double
├── 18: congestion_surcharge: optional double
└── 19: airport_fee: optional double
Current snapshot Operation.APPEND: id=7992925991545429343, schema_id=0
Snapshots Snapshots
└── Snapshot 7992925991545429343, schema 0: file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/metadata/snap-7992925991545429343-1-4f8901f6-2ba5-41ae-8be2-2f9d813d69a3.avro
└── Manifest: file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/metadata/4f8901f6-2ba5-41ae-8be2-2f9d813d69a3-m0.avro
└── Datafile: file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/data/00003-5-e9d9a61e-7383-431a-9b8c-ccc92534c0f0-00001.parquet
Properties owner root
write.format.default parquet
```
We probably need to do some multithreading here, but I'm not sure if we want to do that in this PR.
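As a rough sketch of that multithreading idea (not code from this PR), the manifests of a snapshot could be read concurrently with a thread pool. The `fetch_data_files` helper, its `read_manifest` callback, and the fake reader below are all hypothetical; a real reader would open each Avro manifest through the file-io layer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_data_files(
    manifests: List[str],
    read_manifest: Callable[[str], List[str]],
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    # Read every manifest concurrently; executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(read_manifest, manifests))
    return dict(zip(manifests, results))


def fake_read_manifest(path: str) -> List[str]:
    # Stand-in reader for the example; it performs no I/O.
    return [f"{path}/data-0.parquet", f"{path}/data-1.parquet"]


tree = fetch_data_files(["m0.avro", "m1.avro"], fake_read_manifest)
for manifest, data_files in tree.items():
    print(manifest, data_files)
```

Threads suit this case because reading manifests is I/O-bound; the pool size of 8 is an arbitrary default for the sketch.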
[GitHub] [iceberg] Fokko commented on a diff in pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r953830042
##########
python/pyiceberg/cli/output.py:
##########
@@ -97,13 +97,13 @@ def describe_table(self, table: Table):
for key, value in metadata.properties.items():
table_properties.add_row(key, value)
- schema_tree = Tree("Schema")
+ schema_tree = Tree(f"Schema, id={table.metadata.current_schema_id}")
for field in table.schema().fields:
schema_tree.add(str(field))
snapshot_tree = Tree("Snapshots")
for snapshot in metadata.snapshots:
- snapshot_tree.add(f"Snapshot {snapshot.schema_id}: {snapshot.manifest_list}")
+ snapshot_tree.add(f"Snapshot {snapshot.snapshot_id}, schema {snapshot.schema_id}: {snapshot.manifest_list}")
Review Comment:
Definitely not, slipped in there. Thanks for noticing.
[GitHub] [iceberg] rdblue commented on pull request #5609: Python: Add additional information to the describe command
Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1226140984
Thanks, @Fokko!