Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/07 21:32:21 UTC

[GitHub] [iceberg] rdblue commented on pull request #5417: Python: Add a CLI to go through the catalog

rdblue commented on PR #5417:
URL: https://github.com/apache/iceberg/pull/5417#issuecomment-1207489316

   I like this a lot, but I think we should make some adjustments so that the output is usable with other CLI tools. It feels weird to work with a CLI that pretty-prints and can't be used in combination with `awk` and `grep` easily. I made some comments on the implementation for this. I think just using `Table.grid` in most places is a good compromise between the trade-offs. And this should route error text to stderr as well.
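
As a rough illustration of the pipe-friendly output described above, here is a minimal, stdlib-only sketch. It is not the PR's implementation: `write_rows` and `write_error` are hypothetical helper names, and plain tab-separated lines stand in for a borderless `Table.grid` layout.

```python
import sys
from typing import Iterable, Optional, Sequence, TextIO


def write_rows(rows: Iterable[Sequence[str]], out: Optional[TextIO] = None) -> None:
    """Write one record per line, tab-separated, with no borders or colors."""
    out = sys.stdout if out is None else out
    for row in rows:
        out.write("\t".join(row) + "\n")


def write_error(message: str, err: Optional[TextIO] = None) -> None:
    """Send error text to stderr so it never mixes into piped data."""
    err = sys.stderr if err is None else err
    err.write(f"error: {message}\n")


if __name__ == "__main__":
    # Hypothetical rows; a real CLI would take these from a loaded table.
    write_rows([("1", "id", "long"), ("2", "name", "string")])
    write_error("table not found: db.missing")
```

Because each record is a single undecorated line on stdout, tools like `grep` and `awk -F'\t'` can consume it directly, while error text stays out of the pipe.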
   
   I also thought about the command structure for a while. I definitely prefer not to mirror the API directly, unless we have an API where that is the expectation (like, the `aws s3api` functionality). Most people will use this CLI to do something, like explore a catalog or find information on a table. I think the CLI should be written to make those use cases effective and easy.
   
   For example, currently there's a `load-table` command that corresponds to `load_table` in the catalog API. But loading a table is just a first step in a series of actions to do something useful. It doesn't make sense to me to load a table as a CLI command because the caller is coming with a purpose beyond that first step, like looking at the columns of a table and their documentation. It may be that the user wants to see the schema, the partitioning, table properties, or maybe a summary of everything. All of those cases load the table, but showing the same information each time is distracting.
   
   I think that the CLI should have distinct commands that are more focused on a purpose, like `schema` to show a table schema, or `properties` to show table or namespace properties. Like these, for example:
   
   ```
   pyiceberg schema db.table
   pyiceberg spec db.table
   pyiceberg order db.table
   pyiceberg uuid db.table
   pyiceberg location db.table
   pyiceberg properties db.table
   ```
   
   We probably do want a `summary` or `describe` command that shows more information.
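
One way the purpose-focused commands could be wired up is sketched below. This is an assumption-laden illustration, not the PR's code: it uses `argparse` only to stay dependency-free, and `FIELDS`, `build_parser`, `run`, and the in-memory `tables` mapping are hypothetical stand-ins for a real catalog lookup.

```python
import argparse
from typing import Dict, List

# One subcommand per piece of table metadata, matching the examples above.
FIELDS = ("schema", "spec", "order", "uuid", "location", "properties")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pyiceberg")
    commands = parser.add_subparsers(dest="command", required=True)
    for name in FIELDS:
        command = commands.add_parser(name, help=f"show the table {name}")
        command.add_argument("identifier", help="for example db.table")
    return parser


def run(argv: List[str], tables: Dict[str, Dict[str, str]]) -> str:
    """Load the table once, then return only what the command asked for."""
    args = build_parser().parse_args(argv)
    # Loading is an internal first step shared by every command, not a command itself.
    table = tables[args.identifier]  # hypothetical stand-in for a catalog load
    return table[args.command]
```

With this shape, `run(["location", "db.table"], tables)` returns just the location string, so each command shows only the information the user came for.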
   
   It's also strange for the user to tell the API what type of object is being operated on or expected. That's another artifact of mirroring the catalog API. When I use `ls`, I don't need to tell the command that I want to list a directory or a symlinked directory. I also don't need to tell `ls` what I'm looking for; it just lists everything. I think `pyiceberg` should work the same way. Rather than having `list-tables` and `list-namespaces`, I think we should have a single `list` command that shows the output of both API calls. We can use `[bold blue]` to format namespaces and use filters, like `--tables`, to restrict the type of objects shown. Commands that I think would work with both namespaces and tables are `describe`, `properties`, `set`, and `remove`:
   
   ```
   pyiceberg describe db
   pyiceberg describe db.table
   pyiceberg properties db
   pyiceberg properties db.table
   pyiceberg set db properties a=b
   pyiceberg set db.t1 properties a=b
   pyiceberg remove db properties a b
   pyiceberg remove db.t1 properties a b
   ```
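
The single `list` command suggested above might merge the two listings roughly as follows. This is a sketch under stated assumptions: `list_children` and `render` are hypothetical names, the `only` argument stands in for a filter such as `--tables`, and the markup is emitted as a plain string rather than rendered.

```python
from typing import Iterable, List, Optional, Tuple

Entry = Tuple[str, str]  # (kind, name), where kind is "namespace" or "table"


def list_children(
    namespaces: Iterable[str],
    tables: Iterable[str],
    only: Optional[str] = None,
) -> List[Entry]:
    """Merge both listings into (kind, name) pairs, optionally keeping one kind."""
    entries = [("namespace", n) for n in namespaces] + [("table", t) for t in tables]
    if only is not None:
        entries = [entry for entry in entries if entry[0] == only]
    return sorted(entries, key=lambda entry: entry[1])


def render(entries: Iterable[Entry], markup: bool = False) -> List[str]:
    """Return one line per entry; namespaces get [bold blue] markup on request."""
    return [
        f"[bold blue]{name}[/bold blue]" if markup and kind == "namespace" else name
        for kind, name in entries
    ]
```

Keeping the markup optional means the same listing can be styled for a terminal or left as plain names when the output is piped to another tool.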
   
   The exceptions to that are the `create` and `drop` commands, which should probably be explicit about what you're dropping (like `rm` vs `rmdir`): `pyiceberg drop table db.table`.
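
A possible shape for the explicit `create` and `drop` commands is sketched here with nested `argparse` subcommands. The `build_parser` name and the choice of `table` and `namespace` as the accepted kinds are assumptions for illustration, not something the PR defines.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pyiceberg")
    commands = parser.add_subparsers(dest="command", required=True)
    for verb in ("create", "drop"):
        verb_parser = commands.add_parser(verb)
        # Destructive or creating verbs require the object kind up front.
        kinds = verb_parser.add_subparsers(dest="kind", required=True)
        for kind in ("table", "namespace"):
            kinds.add_parser(kind).add_argument("identifier")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["drop", "table", "db.table"])
    print(args.command, args.kind, args.identifier)
```

With this layout, `pyiceberg drop db.table` is rejected by the parser because the kind is missing, which mirrors the `rm` vs `rmdir` distinction.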


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

