You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/07 12:48:31 UTC

[GitHub] [iceberg] rajarshisarkar opened a new pull request, #6376: Docs: Add register table Spark procedure documentation

rajarshisarkar opened a new pull request, #6376:
URL: https://github.com/apache/iceberg/pull/6376

   Add documentation for https://github.com/apache/iceberg/pull/4810
   
   ---
   cc: @RussellSpitzer


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1042956811


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,39 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+{{< hint warning >}}

Review Comment:
   I am not sure how this looks like, even in rich text view it didn't show as warning. 
   
   But I know another way to give warning.
   Adding as "!!! note" will display as a block in the sitedocs. 
   Example:
   md file: https://github.com/projectnessie/nessie/blob/main/site/docs/tools/iceberg/spark.md#spark-via-iceberg
   respective site doc: https://projectnessie.org/tools/iceberg/spark/#spark-via-iceberg
   
   So, According to me the below format is good. Let's see what @RussellSpitzer thinks.
   
   !!! note
   Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption. Only use this procedure when the table is no longer registered in an existing catalog, or you are moving a table between catalogs.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#issuecomment-1344055332

   @RussellSpitzer I was able to generate the warning block with `{{< hint warning >}} ... {{< /hint >}}` and not with `!!! Warning`. I have attached the screenshot in the PR. Please suggest.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1042354666


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,37 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+Warning: If we register tables which exist in another catalog to the current catalog, then the tables would exist in both the catalogs. And using same table from multiple catalogs is not recommended as it fails to keep the table metadata up to date. 

Review Comment:
   sgtm



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1043147720


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,39 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+{{< hint warning >}}

Review Comment:
   However, I do not find any other occurrence of `!!! note` in the project.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1044117660


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,38 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+!!! Warning
+Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption. Only use this procedure when the table is no longer registered in an existing catalog, or you are moving a table between catalogs.

Review Comment:
   This line needs to be indented to have the block work
   
   Please check all do changes after rendering the page to make sure it works
   
   To do so follow the directions here
   https://github.com/apache/iceberg-docs/tree/main
   
   Once I copied the files into the iceberg-docs repo I ran
   hugo in the docs directory to generate the html in the public dir
   
   ![image](https://user-images.githubusercontent.com/413025/206635505-1e59295c-97d3-4fb8-810b-79e50089f6a7.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1043232718


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,39 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+{{< hint warning >}}

Review Comment:
   Sorry for the little back and forth. If `Hint warning` is already used in our docs. Then it should be fine to use and the document style also will be uniform.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#issuecomment-1348826087

   Thanks @rajarshisarkar for updating the docs and for working through the syntax issues I that I only remembered from old docs!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1042948979


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,37 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+Warning: If we register tables which exist in another catalog to the current catalog, then the tables would exist in both the catalogs. And using same table from multiple catalogs is not recommended as it fails to keep the table metadata up to date. 

Review Comment:
   Thanks, I have updated the PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1043601587


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,39 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+{{< hint warning >}}

Review Comment:
   @RussellSpitzer: we discussed Hint warning syntax here. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1044234964


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,38 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+!!! Warning
+Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption. Only use this procedure when the table is no longer registered in an existing catalog, or you are moving a table between catalogs.

Review Comment:
   I tried indenting and running the server, but `!!! Warning` didn't work. Also, I don't find any occurrences of `!!! Warning` in the code base. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1042170977


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,35 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.

Review Comment:
   Yeah, I have added a warning.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1042333590


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,37 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+Warning: If we register tables which exist in another catalog to the current catalog, then the tables would exist in both the catalogs. And using same table from multiple catalogs is not recommended as it fails to keep the table metadata up to date. 

Review Comment:
   I think this has to be much stronger. Having a table registered in two catalogs is essentially a Split Brain issue. It wont' just have problems with metadata being kept up to date, it will also potentially lose data.
   
   I would say something along
   
   "Warning: Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption. Only use this procedure when the table is no longer registered in an existing catalog, or you are moving a table between catalogs."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1043593616


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,39 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+{{< hint warning >}}

Review Comment:
   I'm not familiar with this syntax, I believe we do ```!!! Warning```
   
   Like
   
   ```
   !!! Warning
       Spark 3.0.0 has a correctness bug that affects dynamic `INSERT OVERWRITE` with hidden partitioning, [SPARK-32168][spark-32168].
       For tables with [hidden partitions](./partitioning.md), make sure you use Spark 3.0.1.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#issuecomment-1342084915

   @RussellSpitzer @ajantha-bhat I have made the changes. Please review once you get time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1047327247


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,38 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+!!! Warning
+Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption. Only use this procedure when the table is no longer registered in an existing catalog, or you are moving a table between catalogs.

Review Comment:
   Ah, I must be behind the times. Thanks for checking!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1042957469


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,39 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+{{< hint warning >}}
+Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption. Only use this procedure when the table is no longer registered in an existing catalog, or you are moving a table between catalogs.
+{{< /hint >}}
+
+#### Output
+
+| Output Name | Type | Description |
+| ------------|------|-------------|
+| `current_snapshot_id` | long | The current snapshot ID of the newly registered Iceberg table |
+| `total_records_count` | long | Total records count of the newly registered Iceberg table |
+| `total_data_files_count` | long | Total data files count of the newly registered Iceberg table |
+
+#### Examples
+
+Register a new table `spark_catalog.db.tbl` pointing to metadata.json file `path/to/metadata/file.json`.

Review Comment:
   ```suggestion
   Register a new table as `db.tbl` to `spark_catalog` pointing to metadata.json file `path/to/metadata/file.json`.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#issuecomment-1343855770

   @RussellSpitzer PR is ready for another round of review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1044236371


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,38 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+!!! Warning
+Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption. Only use this procedure when the table is no longer registered in an existing catalog, or you are moving a table between catalogs.

Review Comment:
   Here is how it looks with `{{< hint warning >}} ... {{< /hint >}}`:
   
   <img width="828" alt="Screenshot 2022-12-09 at 2 41 13 PM" src="https://user-images.githubusercontent.com/4561678/206666389-f7eb2be3-c163-4a1b-a251-b99f026e48b9.png">



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1044235368


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,38 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+!!! Warning
+Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption. Only use this procedure when the table is no longer registered in an existing catalog, or you are moving a table between catalogs.

Review Comment:
   Hint warning is used in the Iceberg hive docs: https://github.com/apache/iceberg/blob/master/docs/hive.md. It displays a yellow block: https://iceberg.apache.org/docs/latest/hive/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1044243384


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,38 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+!!! Warning
+Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption. Only use this procedure when the table is no longer registered in an existing catalog, or you are moving a table between catalogs.

Review Comment:
   I think even `!!! note` is being replaced with `{{< hint info >}} ... {{< /hint >}}` in the codebase. I see few occurrences here: https://github.com/apache/iceberg/blob/master/docs/spark-writes.md#writing-to-partitioned-tables



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer merged pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
RussellSpitzer merged PR #6376:
URL: https://github.com/apache/iceberg/pull/6376


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1042177406


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,37 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+Warning: If we register tables which exist in another catalog to the current catalog, then the tables would exist in both the catalogs.

Review Comment:
   ```suggestion
   Warning: If we register tables which exist in another catalog to the current catalog, then the tables would exist in both the catalogs. And using same table from multiple catalogs is not recommended as it fails to keep the table metadata up to date. 
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1042165921


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,35 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.

Review Comment:
   Should we add a warning about registering tables which exist in another catalog to current catalog? Because in that case table exists in both catalogs. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1042333590


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,37 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+Warning: If we register tables which exist in another catalog to the current catalog, then the tables would exist in both the catalogs. And using same table from multiple catalogs is not recommended as it fails to keep the table metadata up to date. 

Review Comment:
   I think this has to be much stronger. Having a table registered in two catalogs is essentially a Split Brain issue. It wont' just have problems with metadata being kept up to date, it will also potentially lose data.
   
   I would say something along
   
   "Warning: Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1044093602


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,39 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+{{< hint warning >}}

Review Comment:
   Sure, I have made the change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1043144195


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,39 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+{{< hint warning >}}

Review Comment:
   Hint warning is used in the Iceberg hive docs: https://github.com/apache/iceberg/blob/master/docs/hive.md. It displays a yellow block: https://iceberg.apache.org/docs/latest/hive/
   
   I have kept `!!! note` for time being.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1043246170


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,39 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+{{< hint warning >}}

Review Comment:
   No worries, I have updated it back to Hint warning.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6376:
URL: https://github.com/apache/iceberg/pull/6376#discussion_r1044119076


##########
docs/spark-procedures.md:
##########
@@ -493,6 +493,38 @@ CALL spark_catalog.system.add_files(
 )
 ```
 
+### `register_table`
+
+Creates a catalog entry for a metadata.json file which already exists but does not have a corresponding catalog identifier.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Table which is to be registered |
+| `metadata_file`| ✔️  | string | Metadata file which is to be registered as a new catalog identifier |
+
+!!! Warning
+Having the same metadata.json registered in more than one catalog can lead to missing updates, loss of data, and table corruption. Only use this procedure when the table is no longer registered in an existing catalog, or you are moving a table between catalogs.

Review Comment:
   Or you can run the server and the page will be at
   http://localhost:1313/spark-procedures/
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org