Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/05 19:30:57 UTC

[GitHub] [iceberg] samredai commented on a diff in pull request #4706: Python: Support iceberg base catalog in python library (#3245)

samredai commented on code in PR #4706:
URL: https://github.com/apache/iceberg/pull/4706#discussion_r866235532


##########
python/src/iceberg/catalog/base.py:
##########
@@ -0,0 +1,198 @@
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing,
+#  software distributed under the License is distributed on an
+#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+#  KIND, either express or implied.  See the License for the
+#  specific language governing permissions and limitations
+#  under the License.
+
+from abc import ABC, abstractmethod
+from typing import Optional
+
+from iceberg.schema import Schema
+from iceberg.table.base import PartitionSpec, Table
+
+
+class Catalog(ABC):
+    """
+    Base Catalog for table operations like - create, drop, load, list and others.
+
+    Attributes:
+        name(str): Name of the catalog
+        properties(dict): Catalog properties
+    """
+
+    def __init__(self, name: str, properties: dict):
+        self._name = name
+        self._properties = properties
+
+    @property
+    def name(self) -> str:
+        return self._name
+
+    @property
+    def properties(self) -> dict:
+        return self._properties
+
+    @abstractmethod
+    def list_tables(self) -> list:
+        """
+        List tables in the catalog.
+
+        :return: list of table names in the catalog.
+        """
+
+    @abstractmethod
+    def create_table(
+        self,
+        name: str,
+        schema: Schema,
+        partition_spec: PartitionSpec,
+        *,
+        location: Optional[str] = None,
+        properties: Optional[dict] = None
+    ) -> Table:
+        """
+        Create a table
+
+        :param name: Table's name. Fully classified table name, if it is a namespaced catalog.
+        :param schema: Table's schema
+        :param partition_spec: A partition spec for the table
+        :param location: a location for the table; Optional Keyword Argument
+        :param properties: a string dictionary of table properties; Optional Keyword Argument
+        :return: the created table instance
+        :raises AlreadyExistsError: If a table with the name already exists
+        """
+
+    @abstractmethod
+    def table(self, name: str) -> Table:
+        """
+        Loads the table's metadata and returns the table instance. You can also use this method to
+        check for table existence using 'try catalog.table() except TableNotFoundError'
+        Note: This method does not load table's data in any form.
+
+        :param name: Table's name. Fully classified table name, if it is a namespaced catalog.
+        :return: the table instance with its metadata
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+    @abstractmethod
+    def drop_table(self, name: str, purge: bool = True) -> None:
+        """
+        Drop a table; Optionally purge all data and metadata files.
+
+        :param name: table name
+        :param purge: Defaults to true, which deletes all data and metadata files in the table; Optional Argument
+        :return: Nothing
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+    @abstractmethod
+    def rename_table(self, from_name: str, to_name: str) -> None:
+        """
+        Drop a table; Optionally purge all data and metadata files.
+
+        :param from_name: Existing table's name. Fully classified table name, if it is a namespaced catalog.
+        :param to_name: New Table name to be assigned. Fully classified table name, if it is a namespaced catalog.
+        :return: Nothing
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+    @abstractmethod
+    def replace_table(
+        self,
+        name: str,
+        schema: Schema,
+        partition_spec: PartitionSpec,
+        *,
+        location: Optional[str] = None,
+        properties: Optional[dict] = None
+    ) -> Table:
+        """
+        Starts a transaction and replaces the table with the provided spec.
+
+        :param name: Table's name. Fully classified table name, if it is a namespaced catalog.
+        :param schema: Table's schema
+        :param partition_spec: A partition spec for the table
+        :param location: a location for the table; Optional Keyword Argument
+        :param properties: a string dictionary of table properties; Optional Keyword Argument
+        :return: the replaced table instance
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+
+class NamespacedCatalog(Catalog):

Review Comment:
   Do we need this as a subclass? I know the JDBC catalog was recently updated to include namespaces. If the ideal implementation of a catalog should support namespaces, then I'd vote for adding these abstract methods to `Catalog(ABC)`; an implementation that does not support namespaces would then have to explicitly override them and raise some kind of error.
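
   A minimal sketch of the shape this suggestion could take, assuming a trimmed-down `Catalog` with only two methods. `FlatCatalog` and the choice of `NotImplementedError` are illustrative assumptions, not part of the PR:

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Catalog(ABC):
    """Trimmed-down stand-in for the base class, with namespace methods on the ABC itself."""

    def __init__(self, name: str, properties: dict):
        self._name = name
        self._properties = properties

    @abstractmethod
    def list_tables(self, namespace: Optional[str] = None) -> List[str]:
        """List table names, optionally restricted to a namespace."""

    @abstractmethod
    def create_namespace(self, namespace: str, properties: Optional[dict] = None) -> None:
        """Create a namespace in the catalog."""


class FlatCatalog(Catalog):
    """Hypothetical catalog without namespace support."""

    def __init__(self, name: str, properties: dict):
        super().__init__(name, properties)
        self._tables: List[str] = []

    def list_tables(self, namespace: Optional[str] = None) -> List[str]:
        # A flat catalog can only answer the "all tables" form of the call.
        if namespace is not None:
            raise NotImplementedError(f"Catalog {self._name!r} does not support namespaces")
        return list(self._tables)

    def create_namespace(self, namespace: str, properties: Optional[dict] = None) -> None:
        # The abstract method forces an explicit override, so the lack of
        # support is stated in code rather than implied by the class hierarchy.
        raise NotImplementedError(f"Catalog {self._name!r} does not support namespaces")
```

   The trade-off is that callers find out about missing namespace support at call time rather than from the type of the catalog.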



##########
python/src/iceberg/catalog/base.py:
##########
@@ -0,0 +1,198 @@
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing,
+#  software distributed under the License is distributed on an
+#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+#  KIND, either express or implied.  See the License for the
+#  specific language governing permissions and limitations
+#  under the License.
+
+from abc import ABC, abstractmethod
+from typing import Optional
+
+from iceberg.schema import Schema
+from iceberg.table.base import PartitionSpec, Table
+
+
+class Catalog(ABC):
+    """
+    Base Catalog for table operations like - create, drop, load, list and others.
+
+    Attributes:
+        name(str): Name of the catalog
+        properties(dict): Catalog properties
+    """
+
+    def __init__(self, name: str, properties: dict):
+        self._name = name
+        self._properties = properties
+
+    @property
+    def name(self) -> str:
+        return self._name
+
+    @property
+    def properties(self) -> dict:
+        return self._properties
+
+    @abstractmethod
+    def list_tables(self) -> list:
+        """
+        List tables in the catalog.
+
+        :return: list of table names in the catalog.
+        """
+
+    @abstractmethod
+    def create_table(
+        self,
+        name: str,
+        schema: Schema,
+        partition_spec: PartitionSpec,
+        *,
+        location: Optional[str] = None,
+        properties: Optional[dict] = None
+    ) -> Table:
+        """
+        Create a table
+
+        :param name: Table's name. Fully classified table name, if it is a namespaced catalog.

Review Comment:
   For consistency with the other docstrings, can you switch this to [Google style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings) instead of Sphinx style? (That is, using `Args:`, `Returns:`, and `Raises:` sections.)
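
   For illustration, the `create_table` docstring might look roughly like this in Google style. The `Schema`, `PartitionSpec`, and `Table` classes below are empty placeholders so the snippet stands alone, and the wording of each entry is only a suggestion:

```python
from abc import ABC, abstractmethod
from typing import Optional


class Schema:
    """Placeholder for iceberg.schema.Schema."""


class PartitionSpec:
    """Placeholder for iceberg.table.base.PartitionSpec."""


class Table:
    """Placeholder for iceberg.table.base.Table."""


class Catalog(ABC):
    @abstractmethod
    def create_table(
        self,
        name: str,
        schema: Schema,
        partition_spec: PartitionSpec,
        *,
        location: Optional[str] = None,
        properties: Optional[dict] = None,
    ) -> Table:
        """Create a table.

        Args:
            name: Table's name. Fully qualified table name, if it is a namespaced catalog.
            schema: Table's schema.
            partition_spec: A partition spec for the table.
            location: A location for the table. Keyword-only, optional.
            properties: A string dictionary of table properties. Keyword-only, optional.

        Returns:
            The created table instance.

        Raises:
            AlreadyExistsError: If a table with the name already exists.
        """
```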



##########
python/src/iceberg/catalog/base.py:
##########
@@ -0,0 +1,198 @@
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing,
+#  software distributed under the License is distributed on an
+#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+#  KIND, either express or implied.  See the License for the
+#  specific language governing permissions and limitations
+#  under the License.
+
+from abc import ABC, abstractmethod
+from typing import Optional
+
+from iceberg.schema import Schema
+from iceberg.table.base import PartitionSpec, Table
+
+
+class Catalog(ABC):
+    """
+    Base Catalog for table operations like - create, drop, load, list and others.
+
+    Attributes:
+        name(str): Name of the catalog
+        properties(dict): Catalog properties
+    """
+
+    def __init__(self, name: str, properties: dict):
+        self._name = name
+        self._properties = properties
+
+    @property
+    def name(self) -> str:
+        return self._name
+
+    @property
+    def properties(self) -> dict:
+        return self._properties
+
+    @abstractmethod
+    def list_tables(self) -> list:
+        """
+        List tables in the catalog.
+
+        :return: list of table names in the catalog.
+        """
+
+    @abstractmethod
+    def create_table(
+        self,
+        name: str,
+        schema: Schema,
+        partition_spec: PartitionSpec,
+        *,
+        location: Optional[str] = None,
+        properties: Optional[dict] = None
+    ) -> Table:
+        """
+        Create a table
+
+        :param name: Table's name. Fully classified table name, if it is a namespaced catalog.
+        :param schema: Table's schema
+        :param partition_spec: A partition spec for the table
+        :param location: a location for the table; Optional Keyword Argument
+        :param properties: a string dictionary of table properties; Optional Keyword Argument
+        :return: the created table instance
+        :raises AlreadyExistsError: If a table with the name already exists
+        """
+
+    @abstractmethod
+    def table(self, name: str) -> Table:
+        """
+        Loads the table's metadata and returns the table instance. You can also use this method to
+        check for table existence using 'try catalog.table() except TableNotFoundError'
+        Note: This method does not load table's data in any form.
+
+        :param name: Table's name. Fully classified table name, if it is a namespaced catalog.
+        :return: the table instance with its metadata
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+    @abstractmethod
+    def drop_table(self, name: str, purge: bool = True) -> None:
+        """
+        Drop a table; Optionally purge all data and metadata files.
+
+        :param name: table name
+        :param purge: Defaults to true, which deletes all data and metadata files in the table; Optional Argument
+        :return: Nothing
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+    @abstractmethod
+    def rename_table(self, from_name: str, to_name: str) -> None:
+        """
+        Drop a table; Optionally purge all data and metadata files.
+
+        :param from_name: Existing table's name. Fully classified table name, if it is a namespaced catalog.
+        :param to_name: New Table name to be assigned. Fully classified table name, if it is a namespaced catalog.
+        :return: Nothing
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+    @abstractmethod
+    def replace_table(
+        self,
+        name: str,
+        schema: Schema,
+        partition_spec: PartitionSpec,
+        *,
+        location: Optional[str] = None,
+        properties: Optional[dict] = None
+    ) -> Table:
+        """
+        Starts a transaction and replaces the table with the provided spec.
+
+        :param name: Table's name. Fully classified table name, if it is a namespaced catalog.
+        :param schema: Table's schema
+        :param partition_spec: A partition spec for the table
+        :param location: a location for the table; Optional Keyword Argument
+        :param properties: a string dictionary of table properties; Optional Keyword Argument
+        :return: the replaced table instance
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+
+class NamespacedCatalog(Catalog):
+    """
+    Base catalog for catalogs that support namespaces.
+    """
+
+    @abstractmethod
+    def create_namespace(self, namespace: str, properties: Optional[dict] = None) -> None:
+        """
+        Create a namespace in the catalog.
+
+        :param namespace: The namespace to be created.
+        :param properties: A string dict of properties for the given namespace
+        :return: Nothing
+        :raises AlreadyExistsError: If a namespace with the name already exists in the namespace
+        """
+
+    @abstractmethod
+    def drop_namespace(self, namespace: str) -> None:
+        """
+        Drop a namespace.
+
+        :param namespace: The namespace to be dropped.
+        :return: Nothing
+        :raises NamespaceNotFoundError: If a namespace with the name does not exist in the namespace
+        :raises NamespaceNotEmptyError: If the namespace is not empty
+        """
+
+    @abstractmethod
+    def list_tables(self, namespace: Optional[str] = None) -> list:
+        """
+        List tables under the given namespace in the catalog. If namespace not provided, will list all tables in the
+        catalog.
+
+        :param namespace: the namespace to search
+        :return: list of table names under this namespace.
+        :raises NamespaceNotFoundError: If no such namespace exist
+        """
+
+    @abstractmethod
+    def list_namespaces(self, namespace: Optional[str] = None) -> list:
+        """
+        List namespaces from the given namespace. If not given, list top-level namespaces from the catalog.
+
+        :param namespace: given namespace
+        :return: a List of namespace names
+        """
+
+    @abstractmethod
+    def get_namespace_metadata(self, namespace: str) -> dict:

Review Comment:
   I'm not familiar with namespace metadata. Is this general metadata that's persisted in the catalog?
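
   One possible reading, sketched under the assumption that namespace metadata is simply the `properties` dict passed to `create_namespace` and stored by the catalog. The PR does not confirm this; the in-memory class and its error types are hypothetical:

```python
from typing import Dict, Optional


class NamespaceNotFoundError(Exception):
    """Hypothetical error type, named after the one in the docstrings."""


class AlreadyExistsError(Exception):
    """Hypothetical error type, named after the one in the docstrings."""


class InMemoryNamespacedCatalog:
    """Illustrative only: keeps namespace properties in a dict."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, Dict[str, str]] = {}

    def create_namespace(self, namespace: str, properties: Optional[dict] = None) -> None:
        if namespace in self._namespaces:
            raise AlreadyExistsError(namespace)
        # Assumption: the properties given at creation time are the metadata.
        self._namespaces[namespace] = dict(properties or {})

    def get_namespace_metadata(self, namespace: str) -> Dict[str, str]:
        try:
            # Return a copy so callers cannot mutate the stored metadata.
            return dict(self._namespaces[namespace])
        except KeyError:
            raise NamespaceNotFoundError(namespace) from None
```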



##########
python/src/iceberg/catalog/base.py:
##########
@@ -0,0 +1,198 @@
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing,
+#  software distributed under the License is distributed on an
+#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+#  KIND, either express or implied.  See the License for the
+#  specific language governing permissions and limitations
+#  under the License.
+
+from abc import ABC, abstractmethod
+from typing import Optional
+
+from iceberg.schema import Schema
+from iceberg.table.base import PartitionSpec, Table
+
+
+class Catalog(ABC):
+    """
+    Base Catalog for table operations like - create, drop, load, list and others.
+
+    Attributes:
+        name(str): Name of the catalog
+        properties(dict): Catalog properties
+    """
+
+    def __init__(self, name: str, properties: dict):
+        self._name = name
+        self._properties = properties
+
+    @property
+    def name(self) -> str:
+        return self._name
+
+    @property
+    def properties(self) -> dict:
+        return self._properties
+
+    @abstractmethod
+    def list_tables(self) -> list:
+        """
+        List tables in the catalog.
+
+        :return: list of table names in the catalog.
+        """
+
+    @abstractmethod
+    def create_table(
+        self,
+        name: str,
+        schema: Schema,
+        partition_spec: PartitionSpec,

Review Comment:
   This should be optional to account for unpartitioned table creation, right? Actually, thinking about this more: information like name, schema, and partition spec should be contained in an instance of `Table`. Would it be better if these table-related methods just took a `Table` instance?
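
   Both options can be sketched side by side. Everything below is a simplified assumption for illustration: the dataclasses stand in for the real `PartitionSpec` and `Table`, the schema is reduced to a tuple of column names, and `create_table_from` is a hypothetical method name:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


class AlreadyExistsError(Exception):
    """Hypothetical error type, named after the one in the docstrings."""


@dataclass(frozen=True)
class PartitionSpec:
    """Placeholder: partition field names; an empty tuple means unpartitioned."""

    fields: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Table:
    """Placeholder table that carries its own name, schema, and partition spec."""

    name: str
    schema: Tuple[str, ...]
    partition_spec: PartitionSpec = field(default_factory=PartitionSpec)


class InMemoryCatalog:
    def __init__(self) -> None:
        self._tables: Dict[str, Table] = {}

    def create_table(
        self,
        name: str,
        schema: Tuple[str, ...],
        partition_spec: Optional[PartitionSpec] = None,
    ) -> Table:
        # Option 1: partition_spec is optional and defaults to unpartitioned.
        if partition_spec is None:
            partition_spec = PartitionSpec()
        return self.create_table_from(Table(name, schema, partition_spec))

    def create_table_from(self, table: Table) -> Table:
        # Option 2: the method takes a Table that already holds all three values.
        if table.name in self._tables:
            raise AlreadyExistsError(table.name)
        self._tables[table.name] = table
        return table
```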



##########
python/src/iceberg/catalog/base.py:
##########
@@ -0,0 +1,198 @@
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing,
+#  software distributed under the License is distributed on an
+#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+#  KIND, either express or implied.  See the License for the
+#  specific language governing permissions and limitations
+#  under the License.
+
+from abc import ABC, abstractmethod
+from typing import Optional
+
+from iceberg.schema import Schema
+from iceberg.table.base import PartitionSpec, Table
+
+
+class Catalog(ABC):
+    """
+    Base Catalog for table operations like - create, drop, load, list and others.
+
+    Attributes:
+        name(str): Name of the catalog
+        properties(dict): Catalog properties
+    """
+
+    def __init__(self, name: str, properties: dict):
+        self._name = name
+        self._properties = properties
+
+    @property
+    def name(self) -> str:
+        return self._name
+
+    @property
+    def properties(self) -> dict:
+        return self._properties
+
+    @abstractmethod
+    def list_tables(self) -> list:
+        """
+        List tables in the catalog.
+
+        :return: list of table names in the catalog.
+        """
+
+    @abstractmethod
+    def create_table(
+        self,
+        name: str,
+        schema: Schema,
+        partition_spec: PartitionSpec,
+        *,
+        location: Optional[str] = None,
+        properties: Optional[dict] = None
+    ) -> Table:
+        """
+        Create a table
+
+        :param name: Table's name. Fully classified table name, if it is a namespaced catalog.
+        :param schema: Table's schema
+        :param partition_spec: A partition spec for the table
+        :param location: a location for the table; Optional Keyword Argument
+        :param properties: a string dictionary of table properties; Optional Keyword Argument
+        :return: the created table instance
+        :raises AlreadyExistsError: If a table with the name already exists
+        """
+
+    @abstractmethod
+    def table(self, name: str) -> Table:
+        """
+        Loads the table's metadata and returns the table instance. You can also use this method to
+        check for table existence using 'try catalog.table() except TableNotFoundError'
+        Note: This method does not load table's data in any form.
+
+        :param name: Table's name. Fully classified table name, if it is a namespaced catalog.
+        :return: the table instance with its metadata
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+    @abstractmethod
+    def drop_table(self, name: str, purge: bool = True) -> None:
+        """
+        Drop a table; Optionally purge all data and metadata files.
+
+        :param name: table name
+        :param purge: Defaults to true, which deletes all data and metadata files in the table; Optional Argument
+        :return: Nothing
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+    @abstractmethod
+    def rename_table(self, from_name: str, to_name: str) -> None:
+        """
+        Drop a table; Optionally purge all data and metadata files.
+
+        :param from_name: Existing table's name. Fully classified table name, if it is a namespaced catalog.
+        :param to_name: New Table name to be assigned. Fully classified table name, if it is a namespaced catalog.
+        :return: Nothing
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+    @abstractmethod
+    def replace_table(
+        self,
+        name: str,
+        schema: Schema,
+        partition_spec: PartitionSpec,
+        *,
+        location: Optional[str] = None,
+        properties: Optional[dict] = None
+    ) -> Table:
+        """
+        Starts a transaction and replaces the table with the provided spec.
+
+        :param name: Table's name. Fully classified table name, if it is a namespaced catalog.
+        :param schema: Table's schema
+        :param partition_spec: A partition spec for the table
+        :param location: a location for the table; Optional Keyword Argument
+        :param properties: a string dictionary of table properties; Optional Keyword Argument
+        :return: the replaced table instance
+        :raises TableNotFoundError: If a table with the name does not exist
+        """
+
+
+class NamespacedCatalog(Catalog):
+    """
+    Base catalog for catalogs that support namespaces.
+    """
+
+    @abstractmethod
+    def create_namespace(self, namespace: str, properties: Optional[dict] = None) -> None:
+        """
+        Create a namespace in the catalog.
+
+        :param namespace: The namespace to be created.
+        :param properties: A string dict of properties for the given namespace
+        :return: Nothing
+        :raises AlreadyExistsError: If a namespace with the name already exists in the namespace
+        """
+
+    @abstractmethod
+    def drop_namespace(self, namespace: str) -> None:
+        """
+        Drop a namespace.
+
+        :param namespace: The namespace to be dropped.
+        :return: Nothing
+        :raises NamespaceNotFoundError: If a namespace with the name does not exist in the namespace
+        :raises NamespaceNotEmptyError: If the namespace is not empty
+        """
+
+    @abstractmethod
+    def list_tables(self, namespace: Optional[str] = None) -> list:
+        """
+        List tables under the given namespace in the catalog. If namespace not provided, will list all tables in the
+        catalog.
+
+        :param namespace: the namespace to search
+        :return: list of table names under this namespace.
+        :raises NamespaceNotFoundError: If no such namespace exist
+        """
+
+    @abstractmethod
+    def list_namespaces(self, namespace: Optional[str] = None) -> list:

Review Comment:
   I'm not sure if this comes from the Java client, but I can't think of an example of when you would provide a namespace here. I might be missing something, but I'm imagining that `catalog.list_namespaces()` would always be called with no args and return all namespaces in the catalog.
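
   One case where the argument could matter is nested namespaces, which is how the docstring above ("list top-level namespaces" when no namespace is given) can be read. A minimal sketch, assuming dot-separated namespace names held in memory; the class and the sample names are hypothetical:

```python
from typing import Iterable, List, Optional


class InMemoryNamespaces:
    """Illustrative store of dot-separated, possibly nested namespace names."""

    def __init__(self, namespaces: Iterable[str]) -> None:
        self._namespaces = set(namespaces)

    def list_namespaces(self, namespace: Optional[str] = None) -> List[str]:
        # With no argument, list top-level namespaces; otherwise list only the
        # direct children of the given namespace.
        prefix = "" if namespace is None else namespace + "."
        depth = prefix.count(".") + 1
        return sorted(
            ns
            for ns in self._namespaces
            if ns.startswith(prefix) and ns.count(".") + 1 == depth
        )


store = InMemoryNamespaces(["prod", "dev", "prod.sales", "prod.marketing", "prod.sales.emea"])
top_level = store.list_namespaces()
children_of_prod = store.list_namespaces("prod")
```

   If catalogs only ever have a single level of namespaces, the argument is indeed redundant and the no-argument form is enough.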



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

