Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/22 12:48:06 UTC

[GitHub] [iceberg] Fokko opened a new pull request, #5609: Python: Add additional information to the describe command

Fokko opened a new pull request, #5609:
URL: https://github.com/apache/iceberg/pull/5609

   This was very useful when debugging https://github.com/apache/iceberg/issues/5591


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1223048873

   Yeah, I had thought a simple print representation of the metadata would be very useful. There's a PR and some discussion at: https://github.com/apache/iceberg/pull/4142. It left off with DOT format being suggested as easier to read for large tables, and I didn't have time to do it.




[GitHub] [iceberg] rdblue commented on pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1223220144

   Thanks, @szehon-ho! I didn't realize that there was a PR out for it. What do you think about using the Python CLI instead of making it a Java utility? The CLI already does a great job showing tree data, so I think we could build it quickly and make it easier to call by bundling it in the CLI.




[GitHub] [iceberg] Fokko commented on pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1226160921

   @rdblue Thanks for merging so quickly. I've removed the parts that fetch the manifests, and I'll do that in a separate PR, including mocking the file-io, which is quite a hassle.




[GitHub] [iceberg] rdblue commented on a diff in pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r951840954


##########
python/tests/cli/test_console.py:
##########
@@ -189,13 +189,17 @@ def test_describe_table(_):
                         2 ASC NULLS FIRST
                         bucket[4](3) DESC NULLS LAST
                       ]
-Schema                Schema
+Current schema        Schema, id=1
                       ├── 1: x: required long
                       ├── 2: y: required long (comment)
                       └── 3: z: required long
+Current snapshot      id=3055729675574597004, parent_id=3051729675574597004,
+                      schema_id=1

Review Comment:
   What caused this line wrap? I don't see it in the code above.





[GitHub] [iceberg] rdblue merged pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5609:
URL: https://github.com/apache/iceberg/pull/5609




[GitHub] [iceberg] rdblue commented on a diff in pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r951839034


##########
python/pyiceberg/cli/output.py:
##########
@@ -97,13 +97,13 @@ def describe_table(self, table: Table):
         for key, value in metadata.properties.items():
             table_properties.add_row(key, value)
 
-        schema_tree = Tree("Schema")
+        schema_tree = Tree(f"Schema, id={table.metadata.current_schema_id}")
         for field in table.schema().fields:
             schema_tree.add(str(field))
 
         snapshot_tree = Tree("Snapshots")
         for snapshot in metadata.snapshots:
-            snapshot_tree.add(f"Snapshot {snapshot.schema_id}: {snapshot.manifest_list}")
+            snapshot_tree.add(f"Snapshot {snapshot.snapshot_id}, schema {snapshot.schema_id}:  {snapshot.manifest_list}")

Review Comment:
   Is the extra space needed?





[GitHub] [iceberg] rdblue commented on pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1222831212

   It may also be helpful to have a full tree view of a snapshot. I think @szehon-ho had suggested that a while ago.




[GitHub] [iceberg] Fokko commented on a diff in pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r953823673


##########
python/pyiceberg/table/snapshots.py:
##########
@@ -96,6 +96,12 @@ class Snapshot(IcebergBaseModel):
     summary: Optional[Summary] = Field()
     schema_id: Optional[int] = Field(alias="schema-id", default=None)
 
+    def __str__(self) -> str:

Review Comment:
   Good one, added it! 👍🏻 





[GitHub] [iceberg] rdblue commented on pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1226058105

   @Fokko, I agree we don't want to add that feature in this PR. I think it would actually be a separate `tree` command rather than adding more to `describe`. The description should be metadata only. It will be really expensive to run `tree` on a real table so we should only do it if asked specifically.




[GitHub] [iceberg] rdblue commented on a diff in pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r951840433


##########
python/pyiceberg/table/snapshots.py:
##########
@@ -96,6 +96,12 @@ class Snapshot(IcebergBaseModel):
     summary: Optional[Summary] = Field()
     schema_id: Optional[int] = Field(alias="schema-id", default=None)
 
+    def __str__(self) -> str:

Review Comment:
   It would be great to have the operation in here as well.
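
A minimal sketch of a `__str__` that includes the operation, written to reproduce the two `Current snapshot` strings quoted elsewhere in this thread. `SnapshotSketch` and this trimmed `Operation` enum are hypothetical stand-ins, not the real pyiceberg classes; the actual `Snapshot` is a pydantic model and may store the operation differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Operation(Enum):
    # Hypothetical, trimmed-down enum used only for this illustration.
    APPEND = "append"


@dataclass
class SnapshotSketch:
    # Hypothetical stand-in for pyiceberg's Snapshot model.
    snapshot_id: int
    parent_snapshot_id: Optional[int] = None
    operation: Optional[Operation] = None
    schema_id: Optional[int] = None

    def __str__(self) -> str:
        # Each optional part is rendered only when a value is present.
        operation = f"{str(self.operation)}: " if self.operation is not None else ""
        parent_id = f", parent_id={self.parent_snapshot_id}" if self.parent_snapshot_id is not None else ""
        schema_id = f", schema_id={self.schema_id}" if self.schema_id is not None else ""
        return f"{operation}id={self.snapshot_id}{parent_id}{schema_id}"


with_operation = str(SnapshotSketch(7992925991545429343, operation=Operation.APPEND, schema_id=0))
without_operation = str(
    SnapshotSketch(3055729675574597004, parent_snapshot_id=3051729675574597004, schema_id=1)
)
print(with_operation)
print(without_operation)
```

Leaving out absent fields keeps the line short for root snapshots, which have no parent.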





[GitHub] [iceberg] Fokko commented on a diff in pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r953826658


##########
python/tests/cli/test_console.py:
##########
@@ -189,13 +189,17 @@ def test_describe_table(_):
                         2 ASC NULLS FIRST
                         bucket[4](3) DESC NULLS LAST
                       ]
-Schema                Schema
+Current schema        Schema, id=1
                       ├── 1: x: required long
                       ├── 2: y: required long (comment)
                       └── 3: z: required long
+Current snapshot      id=3055729675574597004, parent_id=3051729675574597004,
+                      schema_id=1

Review Comment:
   `rich` automatically overflows to a new line with the correct indentation, which is why `schema_id` is nicely aligned with `id`. In a normal terminal, it will infer the width. There are multiple options to control the overflow: https://rich.readthedocs.io/en/stable/console.html#overflow. In the terminal it looks nicer if you have sufficient width.
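
The wrap in the expected test output can be approximated with the standard library. This is only an illustration using `textwrap`, not how `rich` lays out its tables; it assumes a 22-character label column (taken from the test output above) and an 80-column console, which is presumably the width in effect when the test runs without a real terminal. `render_row` is a hypothetical helper.

```python
import textwrap

LABEL_WIDTH = 22  # width of the left-hand label column in the expected test output


def render_row(label: str, value: str, console_width: int) -> str:
    """Wrap `value` to the console width, indenting continuation lines under the value column."""
    return textwrap.fill(
        value,
        width=console_width,
        initial_indent=label.ljust(LABEL_WIDTH),
        subsequent_indent=" " * LABEL_WIDTH,
    )


label = "Current snapshot"
value = "id=3055729675574597004, parent_id=3051729675574597004, schema_id=1"

narrow = render_row(label, value, console_width=80)   # assumed fallback width
wide = render_row(label, value, console_width=200)    # a wide terminal

print(narrow)
print(wide)
```

At 80 columns the last field no longer fits and moves to an aligned second line; at 200 columns the whole value stays on one line.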





[GitHub] [iceberg] Fokko commented on pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1225853269

   Something like this:
   ```
   (pyiceberg-0Eb0aXNo-py3.8) ➜  python git:(fd-add-more-information) ✗ pyiceberg --uri thrift://localhost:9083 describe nyc.taxis
   Table format version  1                                                                                                                                                                                                 
   Metadata location     file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/metadata/00003-423ac127-c400-413b-8750-2bf3f17ce013.metadata.json                                                        
   Table UUID            017ddcd0-afca-41b4-9613-a032d9d4ee69                                                                                                                                                              
   Last Updated          1661347970107                                                                                                                                                                                     
   Partition spec        [                                                                                                                                                                                                 
                           1000: tpep_days: void(2)                                                                                                                                                                        
                           1001: tpep_dropoff_datetime_day: unknown(3)                                                                                                                                                     
                           1002: tpep_dropff_days: unknown(2)                                                                                                                                                              
                         ]                                                                                                                                                                                                 
   Sort order            []                                                                                                                                                                                                
   Current schema        Schema, id=0                                                                                                                                                                                      
                         ├── 1: VendorID: optional long                                                                                                                                                                    
                         ├── 2: tpep_pickup_datetime: optional timestamptz                                                                                                                                                 
                         ├── 3: tpep_dropoff_datetime: optional timestamptz                                                                                                                                                
                         ├── 4: passenger_count: optional double                                                                                                                                                           
                         ├── 5: trip_distance: optional double                                                                                                                                                             
                         ├── 6: RatecodeID: optional double                                                                                                                                                                
                         ├── 7: store_and_fwd_flag: optional string                                                                                                                                                        
                         ├── 8: PULocationID: optional long                                                                                                                                                                
                         ├── 9: DOLocationID: optional long                                                                                                                                                                
                         ├── 10: payment_type: optional long                                                                                                                                                               
                         ├── 11: fare_amount: optional double                                                                                                                                                              
                         ├── 12: extra: optional double                                                                                                                                                                    
                         ├── 13: mta_tax: optional double                                                                                                                                                                  
                         ├── 14: tip_amount: optional double                                                                                                                                                               
                         ├── 15: tolls_amount: optional double                                                                                                                                                             
                         ├── 16: improvement_surcharge: optional double                                                                                                                                                    
                         ├── 17: total_amount: optional double                                                                                                                                                             
                         ├── 18: congestion_surcharge: optional double                                                                                                                                                     
                         └── 19: airport_fee: optional double                                                                                                                                                              
   Current snapshot      Operation.APPEND: id=7992925991545429343, schema_id=0                                                                                                                                             
   Snapshots             Snapshots                                                                                                                                                                                         
                         └── Snapshot 7992925991545429343, schema 0: file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/metadata/snap-7992925991545429343-1-4f8901f6-2ba5-41ae-8be2-2f9d813d69a3.avro
                             └── Manifest: file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/metadata/4f8901f6-2ba5-41ae-8be2-2f9d813d69a3-m0.avro                                                  
                                 └── Datafile: file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/data/00003-5-e9d9a61e-7383-431a-9b8c-ccc92534c0f0-00001.parquet                                    
   Properties            owner                 root                                                                                                                                                                        
                         write.format.default  parquet 
   ```
   We probably need to do some multithreading here, but I'm not sure we want to do that in this PR.
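
One way the multithreading could look, sketched with a thread pool from the standard library. Nothing here is pyiceberg API: `fetch_data_files` and `fake_read_manifest` are hypothetical names, and the fake reader only stands in for reading an Avro manifest through the file-io layer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_data_files(
    manifest_paths: List[str],
    read_manifest: Callable[[str], List[str]],
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Read all manifests concurrently and map each manifest path to its data files."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so zip() pairs them correctly.
        results = list(pool.map(read_manifest, manifest_paths))
    return dict(zip(manifest_paths, results))


def fake_read_manifest(path: str) -> List[str]:
    # Hypothetical stand-in: a real reader would open the manifest and list its entries.
    return [path.replace("-m0.avro", ".parquet")]


data_files = fetch_data_files(["a-m0.avro", "b-m0.avro"], fake_read_manifest)
print(data_files)
```

Threads suit this case because reading manifests is I/O-bound; passing the reader in as a callable also makes it easy to swap in a mock during tests.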




[GitHub] [iceberg] Fokko commented on a diff in pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#discussion_r953830042


##########
python/pyiceberg/cli/output.py:
##########
@@ -97,13 +97,13 @@ def describe_table(self, table: Table):
         for key, value in metadata.properties.items():
             table_properties.add_row(key, value)
 
-        schema_tree = Tree("Schema")
+        schema_tree = Tree(f"Schema, id={table.metadata.current_schema_id}")
         for field in table.schema().fields:
             schema_tree.add(str(field))
 
         snapshot_tree = Tree("Snapshots")
         for snapshot in metadata.snapshots:
-            snapshot_tree.add(f"Snapshot {snapshot.schema_id}: {snapshot.manifest_list}")
+            snapshot_tree.add(f"Snapshot {snapshot.snapshot_id}, schema {snapshot.schema_id}:  {snapshot.manifest_list}")

Review Comment:
   Definitely not, it slipped in there. Thanks for noticing.





[GitHub] [iceberg] rdblue commented on pull request #5609: Python: Add additional information to the describe command

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5609:
URL: https://github.com/apache/iceberg/pull/5609#issuecomment-1226140984

   Thanks, @Fokko!

