Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/30 12:04:00 UTC

[GitHub] [iceberg] Fokko opened a new pull request, #5672: Python: Update docs and fine-tune the API

Fokko opened a new pull request, #5672:
URL: https://github.com/apache/iceberg/pull/5672

   The API wasn't consistent everywhere. Now the IDs are initialized at 1 by default, so the user doesn't have to set them explicitly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962038084


##########
docs/python-quickstart.md:
##########
@@ -26,45 +26,431 @@ menu:
  -->
 
 
-# Python API Quickstart
+# Python CLI Quickstart
 
-## Installation
+Pyiceberg ships with a CLI that's available after installing the package.
 
-Iceberg python is currently in development, for development and testing purposes the best way to install the library is to perform the following steps:
+```sh
+➜  pyiceberg --help
+Usage: pyiceberg [OPTIONS] COMMAND [ARGS]...
+
+Options:
+  --catalog TEXT
+  --verbose BOOLEAN
+  --output [text|json]
+  --uri TEXT
+  --credential TEXT
+  --help                Show this message and exit.
+
+Commands:
+  describe    Describes a namespace xor table
+  drop        Operations to drop a namespace or table
+  list        Lists tables or namespaces
+  location    Returns the location of the table
+  properties  Properties on tables/namespaces
+  rename      Renames a table
+  schema      Gets the schema of the table
+  spec        Returns the partition spec of the table
+  uuid        Returns the UUID of the table
+```
+
+Browsing the catalog

Review Comment:
   Should this be a section heading so that it appears in the TOC?





[GitHub] [iceberg] Fokko commented on pull request #5672: Python: Fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#issuecomment-1251060366

   @rdblue I've resolved the merge conflicts, would you have time for another pass? Thanks!




[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r961879301


##########
docs/python-api-intro.md:
##########
@@ -27,158 +27,152 @@ menu:
 
 # Iceberg Python API
 
-Much of the python api conforms to the java api. You can get more info about the java api [here](../api).
+Much of the python api conforms to the Java API. You can get more info about the java api [here](../api).
 
-## Catalog
-
-The Catalog interface, like java provides search and management operations for tables.
-
-To create a catalog:
+## Instal
 
-``` python
-from iceberg.hive import HiveTables
+You can install the latest release version from pypi:
 
-# instantiate Hive Tables
-conf = {"hive.metastore.uris": 'thrift://{hms_host}:{hms_port}',
-        "hive.metastore.warehouse.dir": {tmpdir} }
-tables = HiveTables(conf)
+```sh
+pip3 install "pyiceberg[s3fs,hive]"
 ```
 
-and to create a table from a catalog:
-
-``` python
-from iceberg.api.schema import Schema\
-from iceberg.api.types import TimestampType, DoubleType, StringType, NestedField
-from iceberg.api.partition_spec import PartitionSpecBuilder
-
-schema = Schema(NestedField.optional(1, "DateTime", TimestampType.with_timezone()),
-                NestedField.optional(2, "Bid", DoubleType.get()),
-                NestedField.optional(3, "Ask", DoubleType.get()),
-                NestedField.optional(4, "symbol", StringType.get()))
-partition_spec = PartitionSpecBuilder(schema).add(1, 1000, "DateTime_day", "day").build()
+Or install the latest development version locally:
 
-tables.create(schema, "test.test_123", partition_spec)
 ```
-
-
-## Tables
-
-The Table interface provides access to table metadata
-
-+ schema returns the current table `Schema`
-+ spec returns the current table `PartitonSpec`
-+ properties returns a map of key-value `TableProperties`
-+ currentSnapshot returns the current table `Snapshot`
-+ snapshots returns all valid snapshots for the table
-+ snapshot(id) returns a specific snapshot by ID
-+ location returns the table’s base location
-
-Tables also provide refresh to update the table to the latest version.
-
-### Scanning
-Iceberg table scans start by creating a `TableScan` object with `newScan`.
-
-``` python
-scan = table.new_scan();
+pip3 install poetry --upgrade
+pip3 install -e ".[s3fs,hive]"
 ```
 
-To configure a scan, call filter and select on the `TableScan` to get a new `TableScan` with those changes.
-
-``` python
-filtered_scan = scan.filter(Expressions.equal("id", 5))
-```
+With optional dependencies:
 
-String expressions can also be passed to the filter method.
+| Key       | Description:                                                          |
+|-----------|-----------------------------------------------------------------------|
+| hive      | Support for the Hive metastore                                        |
+| pyarrow   | PyArrow as a FileIO implementation to interact with the object store  |
+| s3fs      | S3FS as a FileIO implementation to interact with the object store     |
+| zstandard | Support for zstandard Avro compresssion                               |

Review Comment:
   Should we make this install by default? Seems like a good one for a hard dependency





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962362728


##########
docs/python-feature-support.md:
##########
@@ -27,53 +27,29 @@ menu:
 
 # Feature Support
 
-The goal is that the python library will provide a functional, performant subset of the java library. The initial focus has been on reading table metadata as well as providing the capability to both plan and execute a scan.
+The goal is that the python library will provide a functional, performant subset of the Java library. The initial focus has been on reading table metadata and provide a convenient CLI to go through the catalog.
 
 ## Feature Comparison
 
 ### Metadata
 
 | Operation               | Java  | Python |
 |:------------------------|:-----:|:------:|
-| Get Schema              |    X  |    X   |
-| Get Snapshots           |    X  |    X   |
-| Plan Scan               |    X  |    X   |
-| Plan Scan for Snapshot  |    X  |    X   |
+| Get Schema              |    X  |   X    |
+| Get Snapshots           |    X  |   X    |
+| Plan Scan               |    X  |        |
+| Plan Scan for Snapshot  |    X  |        |
 | Update Current Snapshot |    X  |        |
 | Set Table Properties    |    X  |        |

Review Comment:
   Missed that one, both in the CLI and Python 👍🏻





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r963088103


##########
python/pyiceberg/table/metadata.py:
##########
@@ -128,13 +138,13 @@ def construct_refs(cls, data: Dict[str, Any]):
     schemas: List[Schema] = Field(default_factory=list)
     """A list of schemas, stored as objects with schema-id."""
 
-    current_schema_id: int = Field(alias="current-schema-id", default=DEFAULT_SCHEMA_ID)
+    current_schema_id: int = Field(alias="current-schema-id", default=INITIAL_SCHEMA_ID)

Review Comment:
   I think that defaulting the ID when creating a new schema, spec, or order is fine. But I don't think it is a good idea to default it here. At this point, we no longer have users constructing metadata by hand and we want to make sure that we're setting the ID correctly. If we re-create a schema for a new table metadata object, then we should also set the current schema ID to that schema's ID rather than relying on the same default in two places. That way if we ever change the default assignment we don't break tables.
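
   A minimal sketch of the idea in this comment, not the PyIceberg implementation. The class and function names below are hypothetical stand-ins built on plain dataclasses, and the value of `INITIAL_SCHEMA_ID` is a placeholder. The schema may default its own ID, but the table metadata takes no default and copies the ID from the schema it was built with.

```python
from dataclasses import dataclass
from typing import Tuple

# Placeholder value for illustration only; not taken from the PR.
INITIAL_SCHEMA_ID = 0


@dataclass(frozen=True)
class Schema:
    """Hypothetical stand-in for a schema object."""
    fields: Tuple[str, ...]
    # Defaulting the ID when a new schema is created is fine.
    schema_id: int = INITIAL_SCHEMA_ID


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical stand-in for table metadata."""
    schemas: Tuple[Schema, ...]
    # Deliberately no default: whoever builds the metadata must set it.
    current_schema_id: int


def new_table_metadata(schema: Schema) -> TableMetadata:
    # Take the current schema ID from the schema itself, so the default
    # assignment lives in exactly one place.
    return TableMetadata(schemas=(schema,), current_schema_id=schema.schema_id)


metadata = new_table_metadata(Schema(fields=("id", "name"), schema_id=7))
print(metadata.current_schema_id)
```

   With this shape, changing the default in `Schema` cannot leave `current_schema_id` pointing at a schema that does not exist.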





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r963106427


##########
python/pyiceberg/table/partitioning.py:
##########
@@ -82,19 +82,16 @@ class PartitionSpec(IcebergBaseModel):
         fields(List[PartitionField): list of partition fields to produce partition values
     """
 
-    spec_id: int = Field(alias="spec-id")
-    fields: Tuple[PartitionField, ...] = Field(default_factory=tuple)
+    spec_id: int = Field(alias="spec-id", default=INITIAL_PARTITION_SPEC_ID)

Review Comment:
   This will definitely involve a lot of testing 👍🏻





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r961819628


##########
docs/python-api-intro.md:
##########
@@ -27,158 +27,152 @@ menu:
 
 # Iceberg Python API
 
-Much of the python api conforms to the java api. You can get more info about the java api [here](../api).
+Much of the python api conforms to the Java API. You can get more info about the java api [here](../api).
 
-## Catalog
-
-The Catalog interface, like java provides search and management operations for tables.
-
-To create a catalog:
+## Instal
 
-``` python
-from iceberg.hive import HiveTables
+You can install the latest release version from pypi:
 
-# instantiate Hive Tables
-conf = {"hive.metastore.uris": 'thrift://{hms_host}:{hms_port}',
-        "hive.metastore.warehouse.dir": {tmpdir} }
-tables = HiveTables(conf)
+```sh
+pip3 install "pyiceberg[hive,pyarrow]"

Review Comment:
   Do we want this to be `s3fs` instead of `pyarrow`?





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962039059


##########
python/pyiceberg/table/metadata.py:
##########
@@ -69,6 +73,12 @@ def check_partition_specs(values: Dict[str, Any]) -> Dict[str, Any]:
         if spec.spec_id == default_spec_id:
             return values
 
+    # When the table is unpartitioned, we just add the default spec
+    # This makes it optional when defining a table
+    if default_spec_id == UNPARTITIONED_PARTITION_SPEC_ID:

Review Comment:
   Unlike sort orders, we didn't reserve an ID for the unpartitioned spec. So unfortunately we can't use logic like this and add the spec. We also don't want to modify the table here.
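
   One way to read this comment, sketched with hypothetical dataclasses rather than the actual PyIceberg models: because no spec ID is reserved for "unpartitioned", the check looks at whether the spec has any fields, and the lookup of the default spec is read-only and raises instead of adding a spec to the table.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class PartitionField:
    """Hypothetical stand-in for a partition field."""
    source_id: int
    name: str


@dataclass(frozen=True)
class PartitionSpec:
    """Hypothetical stand-in for a partition spec."""
    spec_id: int
    fields: Tuple[PartitionField, ...] = ()

    def is_unpartitioned(self) -> bool:
        # No ID is reserved for the unpartitioned spec, so inspect the fields.
        return len(self.fields) == 0


def find_default_spec(specs: Iterable[PartitionSpec], default_spec_id: int) -> PartitionSpec:
    # Read-only lookup: never modify the table by adding a missing spec.
    for spec in specs:
        if spec.spec_id == default_spec_id:
            return spec
    raise ValueError(f"default-spec-id {default_spec_id} not found in partition specs")


# A spec with ID 0 can be partitioned, and an unpartitioned spec can have any ID.
specs = [
    PartitionSpec(spec_id=0, fields=(PartitionField(source_id=1, name="day"),)),
    PartitionSpec(spec_id=3),
]
print(find_default_spec(specs, 0).is_unpartitioned())
print(find_default_spec(specs, 3).is_unpartitioned())
```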





[GitHub] [iceberg] rdblue commented on pull request #5672: Python: Fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#issuecomment-1251751243

   Looks good. There were a couple minor things, but those aren't blockers.




[GitHub] [iceberg] rdblue merged pull request #5672: Python: Fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5672:
URL: https://github.com/apache/iceberg/pull/5672




[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r963108278


##########
docs/python-quickstart.md:
##########
@@ -26,45 +26,431 @@ menu:
  -->
 
 
-# Python API Quickstart
+# Python CLI Quickstart
 
-## Installation
+Pyiceberg ships with a CLI that's available after installing the package.
 
-Iceberg python is currently in development, for development and testing purposes the best way to install the library is to perform the following steps:
+```sh
+➜  pyiceberg --help
+Usage: pyiceberg [OPTIONS] COMMAND [ARGS]...
+
+Options:
+  --catalog TEXT
+  --verbose BOOLEAN
+  --output [text|json]
+  --uri TEXT
+  --credential TEXT
+  --help                Show this message and exit.
+
+Commands:
+  describe    Describes a namespace xor table
+  drop        Operations to drop a namespace or table
+  list        Lists tables or namespaces
+  location    Returns the location of the table
+  properties  Properties on tables/namespaces
+  rename      Renames a table
+  schema      Gets the schema of the table
+  spec        Returns the partition spec of the table
+  uuid        Returns the UUID of the table
+```
+
+Browsing the catalog
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list                       
+default
+nyc  
+```
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list nyc
+nyc.taxis
 ```
-git clone https://github.com/apache/iceberg.git
-cd iceberg/python
-pip install -e .
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list nyc

Review Comment:
   I agree with that, but it gets in the way. I'd probably start the intro with a simple example with the --uri and then move to config and more commands. That way, the user can skip config if they choose and continue to use --uri, but we don't have to put it in every command (and update it later).





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r963106606


##########
docs/python-api-intro.md:
##########
@@ -27,158 +27,156 @@ menu:
 
 # Iceberg Python API
 
-Much of the python api conforms to the java api. You can get more info about the java api [here](../api).
+Much of the python api conforms to the Java API. You can get more info about the java api [here](../api).
 
-## Catalog
-
-The Catalog interface, like java provides search and management operations for tables.
-
-To create a catalog:
+## Instal

Review Comment:
   Good one, thanks!





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962038544


##########
docs/python-quickstart.md:
##########
@@ -26,45 +26,431 @@ menu:
  -->
 
 
-# Python API Quickstart
+# Python CLI Quickstart
 
-## Installation
+Pyiceberg ships with a CLI that's available after installing the package.
 
-Iceberg python is currently in development, for development and testing purposes the best way to install the library is to perform the following steps:
+```sh
+➜  pyiceberg --help
+Usage: pyiceberg [OPTIONS] COMMAND [ARGS]...
+
+Options:
+  --catalog TEXT
+  --verbose BOOLEAN
+  --output [text|json]
+  --uri TEXT
+  --credential TEXT
+  --help                Show this message and exit.
+
+Commands:
+  describe    Describes a namespace xor table
+  drop        Operations to drop a namespace or table
+  list        Lists tables or namespaces
+  location    Returns the location of the table
+  properties  Properties on tables/namespaces
+  rename      Renames a table
+  schema      Gets the schema of the table
+  spec        Returns the partition spec of the table
+  uuid        Returns the UUID of the table
+```
+
+Browsing the catalog
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list                       
+default
+nyc  
+```
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list nyc
+nyc.taxis
 ```
-git clone https://github.com/apache/iceberg.git
-cd iceberg/python
-pip install -e .
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list nyc
+nyc.taxis
 ```
 
-## Testing
-Testing is done using tox. The config can be found in `tox.ini` within the python directory of the iceberg project.
 
+```sh
+pyiceberg --uri thrift://localhost:9083 describe nyc.taxis
+Table format version  1                                                                                                                                                                                                 
+Metadata location     file:/.../nyc.db/taxis/metadata/00000-aa3a3eac-ea08-4255-b890-383a64a94e42.metadata.json                                                        
+Table UUID            6cdfda33-bfa3-48a7-a09e-7abb462e3460                                                                                                                                                              
+Last Updated          1661783158061                                                                                                                                                                                     
+Partition spec        []                                                                                                                                                                                                
+Sort order            []                                                                                                                                                                                                
+Current schema        Schema, id=0                                                                                                                                                                                      
+                      ├── 1: VendorID: optional long
+                      ├── 2: tpep_pickup_datetime: optional timestamptz
+                      ├── 3: tpep_dropoff_datetime: optional timestamptz
+                      ├── 4: passenger_count: optional double
+                      ├── 5: trip_distance: optional double
+                      ├── 6: RatecodeID: optional double
+                      ├── 7: store_and_fwd_flag: optional string
+                      ├── 8: PULocationID: optional long
+                      ├── 9: DOLocationID: optional long
+                      ├── 10: payment_type: optional long
+                      ├── 11: fare_amount: optional double
+                      ├── 12: extra: optional double
+                      ├── 13: mta_tax: optional double
+                      ├── 14: tip_amount: optional double
+                      ├── 15: tolls_amount: optional double
+                      ├── 16: improvement_surcharge: optional double
+                      ├── 17: total_amount: optional double
+                      ├── 18: congestion_surcharge: optional double
+                      └── 19: airport_fee: optional double                                                                                                                                                              
+Current snapshot      Operation.APPEND: id=5937117119577207079, schema_id=0                                                                                                                                             
+Snapshots             Snapshots                                                                                                                                                                                         
+                      └── Snapshot 5937117119577207079, schema 0: file:/.../nyc.db/taxis/metadata/snap-5937117119577207079-1-94656c4f-4c66-4600-a4ca-f30377300527.avro
+Properties            owner                 root                                                                                                                                                                        
+                      write.format.default  parquet
 ```
-# simply run tox from within the python dir
-tox
+
+Or output in JSON for automation:
+
+```sh
+pyiceberg --uri thrift://localhost:9083 --output json describe nyc.taxis | jq
+{
+  "identifier": [

Review Comment:
   This is a bit long. Could we snip some parts of the schemas?





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962039927


##########
python/pyiceberg/table/partitioning.py:
##########
@@ -178,4 +176,10 @@ def assign_fresh_partition_spec_ids(spec: PartitionSpec, old_schema: Schema, fre
                 transform=field.transform,
             )
         )
-    return PartitionSpec(INITIAL_SPEC_ID, fields=tuple(partition_fields))
+    if len(spec.fields) > 0:
+        return PartitionSpec(*partition_fields, spec_id=INITIAL_PARTITION_SPEC_ID)
+    return UNPARTITIONED_PARTITION_SPEC
+
+
+UNPARTITIONED_PARTITION_SPEC_ID = 0

Review Comment:
   I don't think we should have this constant, since we can't rely on it.





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962039623


##########
python/pyiceberg/table/partitioning.py:
##########
@@ -29,7 +29,7 @@
 from pyiceberg.transforms import Transform
 from pyiceberg.utils.iceberg_base_model import IcebergBaseModel
 
-INITIAL_SPEC_ID = 0
+INITIAL_PARTITION_SPEC_ID = 1

Review Comment:
   I think it's a good idea to start assigning spec IDs at 1 and reserve 0 for unpartitioned, but there's no requirement in the spec to do that. Plus, it would be a breaking change so we can't really fix it.
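
A minimal sketch of the point above, assuming hypothetical stand-in classes (`FieldSketch`, `SpecSketch`, and `is_unpartitioned` are illustrative names, not pyiceberg APIs): because the table spec does not reserve ID 0 for unpartitioned tables, a check for "unpartitioned" is safer when it looks at the fields rather than at the ID.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldSketch:
    """Hypothetical stand-in for a partition field."""

    source_id: int
    name: str


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical stand-in for a partition spec."""

    spec_id: int
    fields: Tuple[FieldSketch, ...] = ()


def is_unpartitioned(spec: SpecSketch) -> bool:
    # Decide by the absence of fields, not by the ID value:
    # an existing table may use spec ID 0 for a partitioned spec.
    return len(spec.fields) == 0


# A spec with ID 0 that is nevertheless partitioned.
partitioned_zero = SpecSketch(spec_id=0, fields=(FieldSketch(source_id=1, name="datetime_day"),))
# A spec with a non-zero ID and no fields.
unpartitioned_two = SpecSketch(spec_id=2)

print(is_unpartitioned(partitioned_zero), is_unpartitioned(unpartitioned_two))
```

With this shape, a constant such as `UNPARTITIONED_PARTITION_SPEC_ID` is not needed to recognize an unpartitioned table.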





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962040291


##########
python/pyproject.toml:
##########
@@ -80,7 +80,7 @@ build-backend = "poetry.core.masonry.api"
 [tool.poetry.extras]
 pyarrow = ["pyarrow"]
 snappy = ["python-snappy"]
-python-snappy = ["zstandard"]
+zstandard = ["zstandard"]

Review Comment:
   Good catch!





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962040196


##########
python/pyiceberg/table/sorting.py:
##########
@@ -112,20 +115,18 @@ class SortOrder(IcebergBaseModel):
     The order of the sort fields within the list defines the order in which the sort is applied to the data.
 
     Args:
-      order_id (int): The id of the sort-order. To keep track of historical sorting
+      order_id (int): An unique id of the sort-order of a table.
       fields (List[SortField]): The fields how the table is sorted
     """
 
-    def __init__(self, order_id: Optional[int] = None, *fields: SortField, **data: Any):
-        if order_id is not None:
-            data["order-id"] = order_id
+    order_id: int = Field(alias="order-id", default=INITIAL_SORT_ORDER_ID)

Review Comment:
   Here as well, I think forcing the caller to handle sort order ID is a good idea.
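
One way to read "forcing the caller to handle sort order ID" is a required keyword-only argument with no default. The sketch below is an assumption for illustration only; `SortOrderSketch` is a hypothetical class, not the pyiceberg `SortOrder`.

```python
from typing import Tuple


class SortOrderSketch:
    """Hypothetical stand-in for a sort order whose ID must be supplied."""

    def __init__(self, *fields: str, order_id: int):
        # order_id is keyword-only and has no default,
        # so omitting it raises a TypeError at construction time.
        self.fields: Tuple[str, ...] = fields
        self.order_id = order_id


order = SortOrderSketch("symbol", order_id=1)
print(order.order_id, order.fields)

try:
    SortOrderSketch("symbol")  # no order_id given
except TypeError as exc:
    print(f"rejected: {exc}")
```

The trade-off is that every call site has to pick an ID, which is more verbose than defaulting to `INITIAL_SORT_ORDER_ID` but avoids silently reusing the same ID.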





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962362353


##########
docs/python-api-intro.md:
##########
@@ -27,158 +27,152 @@ menu:
 
 # Iceberg Python API
 
-Much of the python api conforms to the java api. You can get more info about the java api [here](../api).
+Much of the python api conforms to the Java API. You can get more info about the java api [here](../api).
 
-## Catalog
-
-The Catalog interface, like java provides search and management operations for tables.
-
-To create a catalog:
+## Instal
 
-``` python
-from iceberg.hive import HiveTables
+You can install the latest release version from pypi:
 
-# instantiate Hive Tables
-conf = {"hive.metastore.uris": 'thrift://{hms_host}:{hms_port}',
-        "hive.metastore.warehouse.dir": {tmpdir} }
-tables = HiveTables(conf)
+```sh
+pip3 install "pyiceberg[s3fs,hive]"
 ```
 
-and to create a table from a catalog:
-
-``` python
-from iceberg.api.schema import Schema\
-from iceberg.api.types import TimestampType, DoubleType, StringType, NestedField
-from iceberg.api.partition_spec import PartitionSpecBuilder
-
-schema = Schema(NestedField.optional(1, "DateTime", TimestampType.with_timezone()),
-                NestedField.optional(2, "Bid", DoubleType.get()),
-                NestedField.optional(3, "Ask", DoubleType.get()),
-                NestedField.optional(4, "symbol", StringType.get()))
-partition_spec = PartitionSpecBuilder(schema).add(1, 1000, "DateTime_day", "day").build()
+Or install the latest development version locally:
 
-tables.create(schema, "test.test_123", partition_spec)
 ```
-
-
-## Tables
-
-The Table interface provides access to table metadata
-
-+ schema returns the current table `Schema`
-+ spec returns the current table `PartitonSpec`
-+ properties returns a map of key-value `TableProperties`
-+ currentSnapshot returns the current table `Snapshot`
-+ snapshots returns all valid snapshots for the table
-+ snapshot(id) returns a specific snapshot by ID
-+ location returns the table’s base location
-
-Tables also provide refresh to update the table to the latest version.
-
-### Scanning
-Iceberg table scans start by creating a `TableScan` object with `newScan`.
-
-``` python
-scan = table.new_scan();
+pip3 install poetry --upgrade
+pip3 install -e ".[s3fs,hive]"
 ```
 
-To configure a scan, call filter and select on the `TableScan` to get a new `TableScan` with those changes.
-
-``` python
-filtered_scan = scan.filter(Expressions.equal("id", 5))
-```
+With optional dependencies:
 
-String expressions can also be passed to the filter method.
+| Key       | Description:                                                          |
+|-----------|-----------------------------------------------------------------------|
+| hive      | Support for the Hive metastore                                        |
+| pyarrow   | PyArrow as a FileIO implementation to interact with the object store  |
+| s3fs      | S3FS as a FileIO implementation to interact with the object store     |
+| zstandard | Support for zstandard Avro compresssion                               |
+| snappy    | Support for snappy Avro compresssion                                  |
 
-``` python
-filtered_scan = scan.filter("id=5")
-```
+## Catalog
 
-`Schema` projections can be applied against a `TableScan` by passing a list of column names.
+To instantiate a catalog:
 
 ``` python
-filtered_scan = scan.select(["col_1", "col_2", "col_3"])
-```
+>>> from pyiceberg.catalog.hive import HiveCatalog
+>>> catalog = HiveCatalog(name='prod', uri='thrift://localhost:9083/')
 
-Because some data types cannot be read using the python library, a convenience method for excluding columns from projection is provided.
-
-``` python
-filtered_scan = scan.select_except(["unsupported_col_1", "unsupported_col_2"])
-```
+>>> catalog.list_namespaces()
+[('default',), ('nyc',)]
 
+>>> catalog.list_tables('nyc')
+[('nyc', 'taxis')]
 
-Calls to configuration methods create a new `TableScan` so that each `TableScan` is immutable.
+>>> catalog.load_table(('nyc', 'taxis'))
+Table(identifier=('nyc', 'taxis'), ...)
+```
 
-When a scan is configured, `planFiles`, `planTasks`, and `Schema` are used to return files, tasks, and the read projection.
+And to create a table from a catalog:
 
 ``` python
-scan = table.new_scan() \
-    .filter("id=5") \
-    .select(["id", "data"])
-
-projection = scan.schema
-for task in scan.plan_tasks():
-    print(task)
+from pyiceberg.schema import Schema
+from pyiceberg.types import TimestampType, DoubleType, StringType, NestedField
+
+schema = Schema(
+    NestedField(field_id=1, name="datetime", field_type=TimestampType(), required=False),
+    NestedField(field_id=2, name="bid", field_type=DoubleType(), required=False),
+    NestedField(field_id=3, name="ask", field_type=DoubleType(), required=False),
+    NestedField(field_id=4, name="symbol", field_type=StringType(), required=False),
+)
+
+from pyiceberg.table.partitioning import PartitionSpec, PartitionField
+from pyiceberg.transforms import DayTransform
+
+partition_spec = PartitionSpec(
+    PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="datetime_day")
+)
+
+from pyiceberg.table.sorting import SortOrder, SortField
+from pyiceberg.transforms import IdentityTransform
+
+sort_order = SortOrder(
+    SortField(source_id=4, transform=IdentityTransform())
+)
+
+from pyiceberg.catalog.hive import HiveCatalog
+catalog = HiveCatalog(name='prod', uri='thrift://localhost:9083/')
+
+catalog.create_table(
+    identifier='default.bids',
+    location='/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/bids/',
+    schema=schema,
+    partition_spec=partition_spec,
+    sort_order=sort_order
+)
+
+Table(

Review Comment:
   I don't have a strong opinion on it. I split this one out since it is quite big.





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962363797


##########
python/pyiceberg/table/metadata.py:
##########
@@ -69,6 +73,12 @@ def check_partition_specs(values: Dict[str, Any]) -> Dict[str, Any]:
         if spec.spec_id == default_spec_id:
             return values
 
+    # When the table is unpartitioned, we just add the default spec
+    # This makes it optional when defining a table
+    if default_spec_id == UNPARTITIONED_PARTITION_SPEC_ID:

Review Comment:
   That's a pity. Let me remove this.





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r974820490


##########
python/tests/catalog/test_rest.py:
##########
@@ -524,9 +523,9 @@ def test_create_table_200(rest_mock: Mocker, table_schema_simple: Schema):
         schema=table_schema_simple,
         location=None,
         partition_spec=PartitionSpec(
-            spec_id=1, fields=(PartitionField(source_id=1, field_id=1000, transform=TruncateTransform(width=3), name="id"),)
+            PartitionField(source_id=1, field_id=1000, transform=TruncateTransform(width=3), name="id"), spec_id=1

Review Comment:
   Looks like the mock causes the result to not match the request. We should start testing against the REST catalog servlet as soon as we can.





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r974817649


##########
python/pyiceberg/table/sorting.py:
##########
@@ -112,20 +115,18 @@ class SortOrder(IcebergBaseModel):
     The order of the sort fields within the list defines the order in which the sort is applied to the data.
 
     Args:
-      order_id (int): The id of the sort-order. To keep track of historical sorting
+      order_id (int): An unique id of the sort-order of a table.

Review Comment:
   I don't think we need "of a table" -- that assumes the context that uses the sort order.





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962363439


##########
docs/python-quickstart.md:
##########
@@ -26,45 +26,431 @@ menu:
  -->
 
 
-# Python API Quickstart
+# Python CLI Quickstart
 
-## Installation
+Pyiceberg ships with a CLI that's available after installing the package.
 
-Iceberg python is currently in development, for development and testing purposes the best way to install the library is to perform the following steps:
+```sh
+➜  pyiceberg --help
+Usage: pyiceberg [OPTIONS] COMMAND [ARGS]...
+
+Options:
+  --catalog TEXT
+  --verbose BOOLEAN
+  --output [text|json]
+  --uri TEXT
+  --credential TEXT
+  --help                Show this message and exit.
+
+Commands:
+  describe    Describes a namespace xor table
+  drop        Operations to drop a namespace or table
+  list        Lists tables or namespaces
+  location    Returns the location of the table
+  properties  Properties on tables/namespaces
+  rename      Renames a table
+  schema      Gets the schema of the table
+  spec        Returns the partition spec of the table
+  uuid        Returns the UUID of the table
+```
+
+Browsing the catalog
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list                       
+default
+nyc  
+```
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list nyc
+nyc.taxis
 ```
-git clone https://github.com/apache/iceberg.git
-cd iceberg/python
-pip install -e .
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list nyc
+nyc.taxis
 ```
 
-## Testing
-Testing is done using tox. The config can be found in `tox.ini` within the python directory of the iceberg project.
 
+```sh
+pyiceberg --uri thrift://localhost:9083 describe nyc.taxis
+Table format version  1
+Metadata location     file:/.../nyc.db/taxis/metadata/00000-aa3a3eac-ea08-4255-b890-383a64a94e42.metadata.json
+Table UUID            6cdfda33-bfa3-48a7-a09e-7abb462e3460
+Last Updated          1661783158061
+Partition spec        []
+Sort order            []
+Current schema        Schema, id=0
+                      ├── 1: VendorID: optional long
+                      ├── 2: tpep_pickup_datetime: optional timestamptz
+                      ├── 3: tpep_dropoff_datetime: optional timestamptz
+                      ├── 4: passenger_count: optional double
+                      ├── 5: trip_distance: optional double
+                      ├── 6: RatecodeID: optional double
+                      ├── 7: store_and_fwd_flag: optional string
+                      ├── 8: PULocationID: optional long
+                      ├── 9: DOLocationID: optional long
+                      ├── 10: payment_type: optional long
+                      ├── 11: fare_amount: optional double
+                      ├── 12: extra: optional double
+                      ├── 13: mta_tax: optional double
+                      ├── 14: tip_amount: optional double
+                      ├── 15: tolls_amount: optional double
+                      ├── 16: improvement_surcharge: optional double
+                      ├── 17: total_amount: optional double
+                      ├── 18: congestion_surcharge: optional double
+                      └── 19: airport_fee: optional double
+Current snapshot      Operation.APPEND: id=5937117119577207079, schema_id=0
+Snapshots             Snapshots
+                      └── Snapshot 5937117119577207079, schema 0: file:/.../nyc.db/taxis/metadata/snap-5937117119577207079-1-94656c4f-4c66-4600-a4ca-f30377300527.avro
+Properties            owner                 root
+                      write.format.default  parquet
 ```
-# simply run tox from within the python dir
-tox
+
+Or output in JSON for automation:
+
+```sh
+pyiceberg --uri thrift://localhost:9083 --output json describe nyc.taxis | jq
+{
+  "identifier": [

Review Comment:
   Agreed, pruned the schemas





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962362825


##########
docs/python-quickstart.md:
##########
@@ -26,45 +26,431 @@ menu:
  -->
 
 
-# Python API Quickstart
+# Python CLI Quickstart
 
-## Installation
+Pyiceberg ships with a CLI that's available after installing the package.
 
-Iceberg python is currently in development, for development and testing purposes the best way to install the library is to perform the following steps:
+```sh
+➜  pyiceberg --help
+Usage: pyiceberg [OPTIONS] COMMAND [ARGS]...
+
+Options:
+  --catalog TEXT
+  --verbose BOOLEAN
+  --output [text|json]
+  --uri TEXT
+  --credential TEXT
+  --help                Show this message and exit.
+
+Commands:
+  describe    Describes a namespace xor table
+  drop        Operations to drop a namespace or table
+  list        Lists tables or namespaces
+  location    Returns the location of the table
+  properties  Properties on tables/namespaces
+  rename      Renames a table
+  schema      Gets the schema of the table
+  spec        Returns the partition spec of the table
+  uuid        Returns the UUID of the table
+```
+
+Browsing the catalog

Review Comment:
   Good call 👍🏻





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962363711


##########
python/pyiceberg/table/partitioning.py:
##########
@@ -82,19 +82,16 @@ class PartitionSpec(IcebergBaseModel):
         fields(List[PartitionField): list of partition fields to produce partition values
     """
 
-    spec_id: int = Field(alias="spec-id")
-    fields: Tuple[PartitionField, ...] = Field(default_factory=tuple)
+    spec_id: int = Field(alias="spec-id", default=INITIAL_PARTITION_SPEC_ID)

Review Comment:
   I feel that we shouldn't really expose this to the user. For example, when we create a new table, we re-assign the IDs anyway (using the assign fresh IDs logic).
   If we follow the Java API and have something similar to `updateSpec` (https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Table.java#L165-L171), then we can just take the next ID. What do you think of this?
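
A minimal sketch of "just take the next ID", assuming a hypothetical helper `next_spec_id` and an assumed starting value; this is one simple way to pick an unused ID and is not a description of what the Java `updateSpec` does.

```python
from typing import Iterable

INITIAL_SPEC_ID_SKETCH = 0  # assumed starting value for this sketch only


def next_spec_id(existing_spec_ids: Iterable[int]) -> int:
    """Return an unused spec ID: one more than the highest existing ID."""
    ids = list(existing_spec_ids)
    if not ids:
        # No specs yet, so start from the assumed initial value.
        return INITIAL_SPEC_ID_SKETCH
    return max(ids) + 1


print(next_spec_id([]))
print(next_spec_id([0, 1, 3]))
```

With a helper like this living on the table side, the user would never pass a `spec_id` when building a spec; the ID would be assigned when the spec is added to the table metadata.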





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r963106150


##########
python/pyiceberg/table/metadata.py:
##########
@@ -128,13 +138,13 @@ def construct_refs(cls, data: Dict[str, Any]):
     schemas: List[Schema] = Field(default_factory=list)
     """A list of schemas, stored as objects with schema-id."""
 
-    current_schema_id: int = Field(alias="current-schema-id", default=DEFAULT_SCHEMA_ID)
+    current_schema_id: int = Field(alias="current-schema-id", default=INITIAL_SCHEMA_ID)

Review Comment:
   Fair point. I've removed this; it is now set explicitly when we get v1 metadata.





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r963058814


##########
docs/python-api-intro.md:
##########
@@ -27,158 +27,156 @@ menu:
 
 # Iceberg Python API
 
-Much of the python api conforms to the java api. You can get more info about the java api [here](../api).
+Much of the python api conforms to the Java API. You can get more info about the java api [here](../api).
 
-## Catalog
-
-The Catalog interface, like java provides search and management operations for tables.
-
-To create a catalog:
+## Instal

Review Comment:
   Typo: `Install`. What about "Installing" or "Installation"? "Install" is a verb so it makes a strange section heading.





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r963058372


##########
python/pyiceberg/table/partitioning.py:
##########
@@ -82,19 +82,16 @@ class PartitionSpec(IcebergBaseModel):
         fields(List[PartitionField): list of partition fields to produce partition values
     """
 
-    spec_id: int = Field(alias="spec-id")
-    fields: Tuple[PartitionField, ...] = Field(default_factory=tuple)
+    spec_id: int = Field(alias="spec-id", default=INITIAL_PARTITION_SPEC_ID)

Review Comment:
   That sounds reasonable to me.





[GitHub] [iceberg] Fokko commented on pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#issuecomment-1232104623

   Waiting for https://github.com/apache/iceberg/pull/5627




[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r961879991


##########
docs/python-api-intro.md:
##########
@@ -27,158 +27,152 @@ menu:
 
 # Iceberg Python API
 
-Much of the python api conforms to the java api. You can get more info about the java api [here](../api).
+Much of the python api conforms to the Java API. You can get more info about the java api [here](../api).
 
-## Catalog
-
-The Catalog interface, like java provides search and management operations for tables.
-
-To create a catalog:
+## Instal
 
-``` python
-from iceberg.hive import HiveTables
+You can install the latest release version from pypi:
 
-# instantiate Hive Tables
-conf = {"hive.metastore.uris": 'thrift://{hms_host}:{hms_port}',
-        "hive.metastore.warehouse.dir": {tmpdir} }
-tables = HiveTables(conf)
+```sh
+pip3 install "pyiceberg[s3fs,hive]"
 ```
 
-and to create a table from a catalog:
-
-``` python
-from iceberg.api.schema import Schema\
-from iceberg.api.types import TimestampType, DoubleType, StringType, NestedField
-from iceberg.api.partition_spec import PartitionSpecBuilder
-
-schema = Schema(NestedField.optional(1, "DateTime", TimestampType.with_timezone()),
-                NestedField.optional(2, "Bid", DoubleType.get()),
-                NestedField.optional(3, "Ask", DoubleType.get()),
-                NestedField.optional(4, "symbol", StringType.get()))
-partition_spec = PartitionSpecBuilder(schema).add(1, 1000, "DateTime_day", "day").build()
+Or install the latest development version locally:
 
-tables.create(schema, "test.test_123", partition_spec)
 ```
-
-
-## Tables
-
-The Table interface provides access to table metadata
-
-+ schema returns the current table `Schema`
-+ spec returns the current table `PartitonSpec`
-+ properties returns a map of key-value `TableProperties`
-+ currentSnapshot returns the current table `Snapshot`
-+ snapshots returns all valid snapshots for the table
-+ snapshot(id) returns a specific snapshot by ID
-+ location returns the table’s base location
-
-Tables also provide refresh to update the table to the latest version.
-
-### Scanning
-Iceberg table scans start by creating a `TableScan` object with `newScan`.
-
-``` python
-scan = table.new_scan();
+pip3 install poetry --upgrade
+pip3 install -e ".[s3fs,hive]"
 ```
 
-To configure a scan, call filter and select on the `TableScan` to get a new `TableScan` with those changes.
-
-``` python
-filtered_scan = scan.filter(Expressions.equal("id", 5))
-```
+With optional dependencies:
 
-String expressions can also be passed to the filter method.
+| Key       | Description:                                                          |
+|-----------|-----------------------------------------------------------------------|
+| hive      | Support for the Hive metastore                                        |
+| pyarrow   | PyArrow as a FileIO implementation to interact with the object store  |
+| s3fs      | S3FS as a FileIO implementation to interact with the object store     |
+| zstandard | Support for zstandard Avro compresssion                               |
+| snappy    | Support for snappy Avro compresssion                                  |
 
-``` python
-filtered_scan = scan.filter("id=5")
-```
+## Catalog
 
-`Schema` projections can be applied against a `TableScan` by passing a list of column names.
+To instantiate a catalog:
 
 ``` python
-filtered_scan = scan.select(["col_1", "col_2", "col_3"])
-```
+>>> from pyiceberg.catalog.hive import HiveCatalog
+>>> catalog = HiveCatalog(name='prod', uri='thrift://localhost:9083/')
 
-Because some data types cannot be read using the python library, a convenience method for excluding columns from projection is provided.
-
-``` python
-filtered_scan = scan.select_except(["unsupported_col_1", "unsupported_col_2"])
-```
+>>> catalog.list_namespaces()
+[('default',), ('nyc',)]
 
+>>> catalog.list_tables('nyc')
+[('nyc', 'taxis')]
 
-Calls to configuration methods create a new `TableScan` so that each `TableScan` is immutable.
+>>> catalog.load_table(('nyc', 'taxis'))
+Table(identifier=('nyc', 'taxis'), ...)
+```
 
-When a scan is configured, `planFiles`, `planTasks`, and `Schema` are used to return files, tasks, and the read projection.
+And to create a table from a catalog:
 
 ``` python
-scan = table.new_scan() \
-    .filter("id=5") \
-    .select(["id", "data"])
-
-projection = scan.schema
-for task in scan.plan_tasks():
-    print(task)
+from pyiceberg.schema import Schema
+from pyiceberg.types import TimestampType, DoubleType, StringType, NestedField
+
+schema = Schema(
+    NestedField(field_id=1, name="datetime", field_type=TimestampType(), required=False),
+    NestedField(field_id=2, name="bid", field_type=DoubleType(), required=False),
+    NestedField(field_id=3, name="ask", field_type=DoubleType(), required=False),
+    NestedField(field_id=4, name="symbol", field_type=StringType(), required=False),
+)
+
+from pyiceberg.table.partitioning import PartitionSpec, PartitionField
+from pyiceberg.transforms import DayTransform
+
+partition_spec = PartitionSpec(
+    PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="datetime_day")
+)
+
+from pyiceberg.table.sorting import SortOrder, SortField
+from pyiceberg.transforms import IdentityTransform
+
+sort_order = SortOrder(
+    SortField(source_id=4, transform=IdentityTransform())
+)
+
+from pyiceberg.catalog.hive import HiveCatalog
+catalog = HiveCatalog(name='prod', uri='thrift://localhost:9083/')
+
+catalog.create_table(
+    identifier='default.bids',
+    location='/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/bids/',
+    schema=schema,
+    partition_spec=partition_spec,
+    sort_order=sort_order
+)
+
+Table(

Review Comment:
   I usually put result output in a separate pre box, but up to you.





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962039789


##########
python/pyiceberg/table/partitioning.py:
##########
@@ -82,19 +82,16 @@ class PartitionSpec(IcebergBaseModel):
         fields(List[PartitionField): list of partition fields to produce partition values
     """
 
-    spec_id: int = Field(alias="spec-id")
-    fields: Tuple[PartitionField, ...] = Field(default_factory=tuple)
+    spec_id: int = Field(alias="spec-id", default=INITIAL_PARTITION_SPEC_ID)

Review Comment:
   I think I'd prefer to handle ID assignment manually rather than defaulting. Defaulting seems to bring in complexity because if we forget to pass along an ID somewhere, it would cause problems.
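
A rough sketch of the concern, using hypothetical stand-in classes (`DefaultedSpec`, `ExplicitSpec`) rather than the real `PartitionSpec`: with a default, forgetting to pass the ID along goes unnoticed, while a required ID turns the same mistake into an immediate error.

```python
from dataclasses import dataclass
from typing import Tuple

INITIAL_SPEC_ID = 0  # hypothetical default, for illustration only


@dataclass(frozen=True)
class DefaultedSpec:
    """Hypothetical spec whose ID silently falls back to a default."""

    fields: Tuple[str, ...] = ()
    spec_id: int = INITIAL_SPEC_ID


@dataclass(frozen=True)
class ExplicitSpec:
    """Hypothetical spec that forces the caller to pass an ID."""

    spec_id: int
    fields: Tuple[str, ...] = ()


def copy_with_fields_defaulted(spec: DefaultedSpec, fields: Tuple[str, ...]) -> DefaultedSpec:
    # Bug on purpose: spec.spec_id is not passed along, so the copy
    # quietly resets to INITIAL_SPEC_ID.
    return DefaultedSpec(fields=fields)


def copy_with_fields_explicit(spec: ExplicitSpec, fields: Tuple[str, ...]) -> ExplicitSpec:
    # Leaving out spec_id here would raise a TypeError at construction time.
    return ExplicitSpec(spec_id=spec.spec_id, fields=fields)
```

Copying a `DefaultedSpec` that was created with `spec_id=3` yields a copy whose ID is back at the default, which is the kind of silent drift a required argument would surface.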





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962363353


##########
docs/python-quickstart.md:
##########
@@ -26,45 +26,431 @@ menu:
  -->
 
 
-# Python API Quickstart
+# Python CLI Quickstart
 
-## Installation
+Pyiceberg ships with a CLI that's available after installing the package.
 
-Iceberg python is currently in development, for development and testing purposes the best way to install the library is to perform the following steps:
+```sh
+➜  pyiceberg --help
+Usage: pyiceberg [OPTIONS] COMMAND [ARGS]...
+
+Options:
+  --catalog TEXT
+  --verbose BOOLEAN
+  --output [text|json]
+  --uri TEXT
+  --credential TEXT
+  --help                Show this message and exit.
+
+Commands:
+  describe    Describes a namespace xor table
+  drop        Operations to drop a namespace or table
+  list        Lists tables or namespaces
+  location    Returns the location of the table
+  properties  Properties on tables/namespaces
+  rename      Renames a table
+  schema      Gets the schema of the table
+  spec        Returns the partition spec of the table
+  uuid        Returns the UUID of the table
+```
+
+Browsing the catalog
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list                       
+default
+nyc  
+```
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list nyc
+nyc.taxis
 ```
-git clone https://github.com/apache/iceberg.git
-cd iceberg/python
-pip install -e .
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list nyc

Review Comment:
   My initial idea was to get the user up and running as quickly as possible, but I changed the order. Let me know what you think.





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962370075


##########
python/pyiceberg/table/partitioning.py:
##########
@@ -29,7 +29,7 @@
 from pyiceberg.transforms import Transform
 from pyiceberg.utils.iceberg_base_model import IcebergBaseModel
 
-INITIAL_SPEC_ID = 0
+INITIAL_PARTITION_SPEC_ID = 1

Review Comment:
   Alright, moved everything back



##########
python/pyiceberg/table/partitioning.py:
##########
@@ -178,4 +176,10 @@ def assign_fresh_partition_spec_ids(spec: PartitionSpec, old_schema: Schema, fre
                 transform=field.transform,
             )
         )
-    return PartitionSpec(INITIAL_SPEC_ID, fields=tuple(partition_fields))
+    if len(spec.fields) > 0:
+        return PartitionSpec(*partition_fields, spec_id=INITIAL_PARTITION_SPEC_ID)
+    return UNPARTITIONED_PARTITION_SPEC
+
+
+UNPARTITIONED_PARTITION_SPEC_ID = 0

Review Comment:
   And it is gone 





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r961880410


##########
docs/python-feature-support.md:
##########
@@ -27,53 +27,29 @@ menu:
 
 # Feature Support
 
-The goal is that the python library will provide a functional, performant subset of the java library. The initial focus has been on reading table metadata as well as providing the capability to both plan and execute a scan.
+The goal is that the python library will provide a functional, performant subset of the Java library. The initial focus has been on reading table metadata and provide a convenient CLI to go through the catalog.
 
 ## Feature Comparison
 
 ### Metadata
 
 | Operation               | Java  | Python |
 |:------------------------|:-----:|:------:|
-| Get Schema              |    X  |    X   |
-| Get Snapshots           |    X  |    X   |
-| Plan Scan               |    X  |    X   |
-| Plan Scan for Snapshot  |    X  |    X   |
+| Get Schema              |    X  |   X    |
+| Get Snapshots           |    X  |   X    |
+| Plan Scan               |    X  |        |
+| Plan Scan for Snapshot  |    X  |        |
 | Update Current Snapshot |    X  |        |
 | Set Table Properties    |    X  |        |

Review Comment:
   Is this now supported, at least in the CLI?





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962038342


##########
docs/python-quickstart.md:
##########
@@ -26,45 +26,431 @@ menu:
  -->
 
 
-# Python API Quickstart
+# Python CLI Quickstart
 
-## Installation
+Pyiceberg ships with a CLI that's available after installing the package.
 
-Iceberg python is currently in development, for development and testing purposes the best way to install the library is to perform the following steps:
+```sh
+➜  pyiceberg --help
+Usage: pyiceberg [OPTIONS] COMMAND [ARGS]...
+
+Options:
+  --catalog TEXT
+  --verbose BOOLEAN
+  --output [text|json]
+  --uri TEXT
+  --credential TEXT
+  --help                Show this message and exit.
+
+Commands:
+  describe    Describes a namespace xor table
+  drop        Operations to drop a namespace or table
+  list        Lists tables or namespaces
+  location    Returns the location of the table
+  properties  Properties on tables/namespaces
+  rename      Renames a table
+  schema      Gets the schema of the table
+  spec        Returns the partition spec of the table
+  uuid        Returns the UUID of the table
+```
+
+Browsing the catalog
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list                       
+default
+nyc  
+```
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list nyc
+nyc.taxis
 ```
-git clone https://github.com/apache/iceberg.git
-cd iceberg/python
-pip install -e .
+
+```sh
+➜  pyiceberg --uri thrift://localhost:9083 list nyc

Review Comment:
   What do you think about having a section early on for the config file so that we can avoid having `--uri` in every command?





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962039298


##########
python/pyiceberg/table/metadata.py:
##########
@@ -128,13 +138,13 @@ def construct_refs(cls, data: Dict[str, Any]):
     schemas: List[Schema] = Field(default_factory=list)
     """A list of schemas, stored as objects with schema-id."""
 
-    current_schema_id: int = Field(alias="current-schema-id", default=DEFAULT_SCHEMA_ID)
+    current_schema_id: int = Field(alias="current-schema-id", default=INITIAL_SCHEMA_ID)

Review Comment:
   These probably shouldn't have defaults because they need to be explicitly set to some ID that exists in the list of schemas, specs, or sort orders.
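
One way to picture this, as a minimal sketch with hypothetical stand-ins (`SchemaStub`, `MetadataStub`) and not the actual `TableMetadata` model: the current ID has no default and is checked against the IDs that are actually present in the list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SchemaStub:
    """Hypothetical stand-in for a schema; only the ID matters here."""

    schema_id: int


@dataclass
class MetadataStub:
    """Hypothetical stand-in for table metadata, not the real model."""

    current_schema_id: int  # no default: the caller must pick an existing ID
    schemas: List[SchemaStub] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject a current ID that does not point at any known schema.
        known_ids = {schema.schema_id for schema in self.schemas}
        if self.current_schema_id not in known_ids:
            raise ValueError(
                f"current-schema-id {self.current_schema_id} "
                f"not found in schemas: {sorted(known_ids)}"
            )
```

The same check would presumably apply to the default spec ID and the default sort order ID against their respective lists.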





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962362114


##########
docs/python-api-intro.md:
##########
@@ -27,158 +27,152 @@ menu:
 
 # Iceberg Python API
 
-Much of the python api conforms to the java api. You can get more info about the java api [here](../api).
+Much of the python api conforms to the Java API. You can get more info about the java api [here](../api).
 
-## Catalog
-
-The Catalog interface, like java provides search and management operations for tables.
-
-To create a catalog:
+## Instal
 
-``` python
-from iceberg.hive import HiveTables
+You can install the latest release version from pypi:
 
-# instantiate Hive Tables
-conf = {"hive.metastore.uris": 'thrift://{hms_host}:{hms_port}',
-        "hive.metastore.warehouse.dir": {tmpdir} }
-tables = HiveTables(conf)
+```sh
+pip3 install "pyiceberg[s3fs,hive]"
 ```
 
-and to create a table from a catalog:
-
-``` python
-from iceberg.api.schema import Schema\
-from iceberg.api.types import TimestampType, DoubleType, StringType, NestedField
-from iceberg.api.partition_spec import PartitionSpecBuilder
-
-schema = Schema(NestedField.optional(1, "DateTime", TimestampType.with_timezone()),
-                NestedField.optional(2, "Bid", DoubleType.get()),
-                NestedField.optional(3, "Ask", DoubleType.get()),
-                NestedField.optional(4, "symbol", StringType.get()))
-partition_spec = PartitionSpecBuilder(schema).add(1, 1000, "DateTime_day", "day").build()
+Or install the latest development version locally:
 
-tables.create(schema, "test.test_123", partition_spec)
 ```
-
-
-## Tables
-
-The Table interface provides access to table metadata
-
-+ schema returns the current table `Schema`
-+ spec returns the current table `PartitonSpec`
-+ properties returns a map of key-value `TableProperties`
-+ currentSnapshot returns the current table `Snapshot`
-+ snapshots returns all valid snapshots for the table
-+ snapshot(id) returns a specific snapshot by ID
-+ location returns the table’s base location
-
-Tables also provide refresh to update the table to the latest version.
-
-### Scanning
-Iceberg table scans start by creating a `TableScan` object with `newScan`.
-
-``` python
-scan = table.new_scan();
+pip3 install poetry --upgrade
+pip3 install -e ".[s3fs,hive]"
 ```
 
-To configure a scan, call filter and select on the `TableScan` to get a new `TableScan` with those changes.
-
-``` python
-filtered_scan = scan.filter(Expressions.equal("id", 5))
-```
+With optional dependencies:
 
-String expressions can also be passed to the filter method.
+| Key       | Description:                                                          |
+|-----------|-----------------------------------------------------------------------|
+| hive      | Support for the Hive metastore                                        |
+| pyarrow   | PyArrow as a FileIO implementation to interact with the object store  |
+| s3fs      | S3FS as a FileIO implementation to interact with the object store     |
+| zstandard | Support for zstandard Avro compresssion                               |

Review Comment:
   I checked, and it is quite lightweight. We don't write any Manifest yet, but in Java we do: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/avro/Avro.java I'm fine with making this a hard dependency 👍🏻





[GitHub] [iceberg] rdblue commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r963058372


##########
python/pyiceberg/table/partitioning.py:
##########
@@ -82,19 +82,16 @@ class PartitionSpec(IcebergBaseModel):
         fields(List[PartitionField): list of partition fields to produce partition values
     """
 
-    spec_id: int = Field(alias="spec-id")
-    fields: Tuple[PartitionField, ...] = Field(default_factory=tuple)
+    spec_id: int = Field(alias="spec-id", default=INITIAL_PARTITION_SPEC_ID)

Review Comment:
   That sounds reasonable to me. I think we just need to make sure that reassignment is correct!
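
A minimal sketch of what a correct reassignment might look like, assuming plain dicts and tuples in place of the real `Schema` and `PartitionField` classes. The helper name and its arguments are hypothetical, and starting partition field IDs at 1000 follows the `field_id=1000` used in the docs example of this PR.

```python
from typing import Dict, List, Tuple

PARTITION_FIELD_ID_START = 1000  # matches field_id=1000 in the docs example


def reassign_partition_fields(
    partition_fields: List[Tuple[int, str]],
    old_id_to_name: Dict[int, str],
    fresh_name_to_id: Dict[str, int],
) -> List[Tuple[int, int, str]]:
    """Map each (source_id, partition_name) onto the fresh schema.

    Returns (fresh_source_id, partition_field_id, partition_name) tuples.
    """
    reassigned = []
    for position, (source_id, partition_name) in enumerate(partition_fields):
        # Resolve the column by name; a KeyError here means the spec
        # references a column that one of the schemas does not know about.
        column_name = old_id_to_name[source_id]
        fresh_source_id = fresh_name_to_id[column_name]
        reassigned.append(
            (fresh_source_id, PARTITION_FIELD_ID_START + position, partition_name)
        )
    return reassigned
```

Looking columns up by name rather than by position is the design choice here: it keeps the partition field pointing at the same column even when the fresh schema numbers its fields differently.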





[GitHub] [iceberg] Fokko commented on a diff in pull request #5672: Python: Update docs and fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r962369982


##########
python/pyiceberg/table/metadata.py:
##########
@@ -128,13 +138,13 @@ def construct_refs(cls, data: Dict[str, Any]):
     schemas: List[Schema] = Field(default_factory=list)
     """A list of schemas, stored as objects with schema-id."""
 
-    current_schema_id: int = Field(alias="current-schema-id", default=DEFAULT_SCHEMA_ID)
+    current_schema_id: int = Field(alias="current-schema-id", default=INITIAL_SCHEMA_ID)

Review Comment:
   https://github.com/apache/iceberg/pull/5672#discussion_r962363711





[GitHub] [iceberg] Fokko commented on pull request #5672: Python: Fine-tune the API

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#issuecomment-1240627545

   Split out the changes to the docs to https://github.com/apache/iceberg/pull/5727

