You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/16 00:35:20 UTC

[GitHub] [iceberg] JonasJ-ap opened a new pull request, #6600: Doc: Add descriptions and examples for Snapshot Delta Lake Table to Iceberg Table

JonasJ-ap opened a new pull request, #6600:
URL: https://github.com/apache/iceberg/pull/6600

   This PR adds documentation for the `iceberg-delta-lake` module created in #6449 . 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1159285844


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Delta Lake Table Migration
+Iceberg supports converting Delta Lake tables to Iceberg tables through the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. Users need to manually include this module in their environment to enable the conversion.
+Also, users need to provide [Delta Standalone](https://github.com/delta-io/connectors/releases/tag/v0.6.0) and [Delta Storage](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/)
+because they are what `iceberg-delta-lake` uses to read Delta Lake tables.
+
+## Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+For delta lake table contains `TimestampType` columns, please make sure to set table property `read.parquet.vectorization.enabled` to `false` since the vectorized reader doesn't support `INT96` yet.
+Such support is under development in the [PR#6962](https://github.com/apache/iceberg/pull/6962)
+
+## API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+## Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+### `snapshotDeltaLakeTable`
+The action reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Arguments
+The delta table's location is required to be provided when initializing the action.
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+|`sourceTableLocation` | Yes | String | The location of the source Delta Lake table | 
+
+#### Configurations
+The configurations can be gave via method chaining
+
+| Method Name | Arguments      | Required? | Type                                       | Description                                                                                                  |
+|---------------------------|----------------|-----------|--------------------------------------------|--------------------------------------------------------------------------------------------------------------|
+| `as`                      | `identifier`   | Yes       | org.apache.iceberg.catalog.TableIdentifier | The identifier of the Iceberg table to be created.                                                           |
+| `icebergCatalog`          | `catalog`      | Yes       | org.apache.iceberg.catalog.Catalog         | The Iceberg catalog for the Iceberg table to be created                                                      |
+| `deltaLakeConfiguration`  | `conf`         | Yes       | org.apache.hadoop.conf.Configuration       | The Hadoop Configuration to access Delta Lake Table's log and datafiles                                      |
+| `tableLocation`           | `location`     | No        | String                                     | The location of the Iceberg table to be created. Defaults to the same location as the given Delta Lake table |
+| `tableProperty`           | `name`,`value` | No        | String, String                             | A property entry to add to the Iceberg table to be created                                                   |
+| `tableProperties`         | `properties`   | No        | Map<String, String>                        | Properties to add to the the Iceberg table to be created                                                     |
+
+#### Output
+| Output Name | Type | Description |
+| ------------|------|-------------|
+| `imported_files_count` | long | Number of files added to the new table |
+
+#### Added Table Properties
+The following table properties are added to the Iceberg table to be created by default:
+
+| Property Name                 | Value                                     | Description                                                        |
+|-------------------------------|-------------------------------------------|--------------------------------------------------------------------|
+| `snapshot_source`             | `delta`                                   | Indicates that the table is snapshot from a delta lake table       |
+| `original_location`           | location of the delta lake table          | The absolute path to the location of the original delta lake table |
+| `schema.name-mapping.default` | JSON name mapping derived from the schema | The name mapping string used to read Delta Lake table's data files |
+
+#### Examples
+Snapshot a Delta Lake table located at `s3://my-bucket/delta-table` to an Iceberg table named `my_table` in the `my_db` database in the `my_catalog` catalog.
+```java
+DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+    .snapshotDeltaLakeTable("s3://my-bucket/delta-table")
+    .as("my_db.my_table")

Review Comment:
   Thank you for the correction. I revised the code example. I extracted the argument as separate variables defined above so that we can have more room to write descriptions for catalog and hadoop conf



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1171732165


##########
docs/hive-migration.md:
##########
@@ -0,0 +1,61 @@
+---
+title: "Hive Migration"
+url: hive-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Hive Table Migration
+Apache Hive supports ORC, Parquet, and Avro file formats that could be migrated to Iceberg.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive tables to Iceberg tables. Since Hive tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+## Enabling Migration from Hive to Iceberg
+The Hive table migration actions are supported by the Spark Integration module via Spark Procedures. The procedures are bundled in the following dependencies:
+- [`iceberg-spark-runtime_3.3_2.12`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.3_2.12/1.2.0/iceberg-spark-runtime-3.3_2.12-1.2.0.jar)

Review Comment:
   we don't need to put links here, can link to the release page for those artifacts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1171734072


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,124 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 300
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Delta Lake Table Migration
+Delta Lake is a table format that supports Parquet file format and provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. It is done via the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.

Review Comment:
   I don't think people care much about what library is used, we can remove this sentence of "It is done via ..."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1171739353


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,124 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 300
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Delta Lake Table Migration
+Delta Lake is a table format that supports Parquet file format and provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. It is done via the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+Since Delta Lake tables maintain snapshots, all available snapshots will be committed to the new Iceberg table as transactions in order.
+For Delta Lake tables, any additional data files added after the initial migration will be included in their corresponding snapshots and subsequently added to the new Iceberg table using the Add Snapshot action.
+The Add Snapshot action, a variant of the Add File action, is still under development.
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. To enable migration from delta lake features, the minimum required dependencies are:
+- [iceberg-delta-lake](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-delta-lake/1.2.0/iceberg-delta-lake-1.2.0.jar)
+- [delta-standalone-0.6.0](https://repo1.maven.org/maven2/io/delta/delta-standalone_2.13/0.6.0/delta-standalone_2.13-0.6.0.jar)
+- [delta-storage-2.2.0](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/delta-storage-2.2.0.jar)
+
+### Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+### API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+### Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+## Snapshot Delta Lake Table to Iceberg
+The action `snapshotDeltaLakeTable` reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Usage
+| Required Input                                  | Configured By                                                                                                                                                                                             |
+|-------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source Table Location                           | Argument [`sourceTableLocation`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/DeltaLakeToIcebergMigrationActionsProvider.html#snapshotDeltaLakeTable(java.lang.String))             | 

Review Comment:
   I don't think `configured by` column is necessary, but we can add a `description` column to describe the meaning



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1145671053


##########
docs/table-migration.md:
##########
@@ -0,0 +1,129 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 400
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Hive and Spark Table Migration
+Iceberg supports converting Hive and Spark tables to Iceberg tables through Spark Procedures. For specific instructions on how to use the procedures, please refer to the [Table Migration via Spark Procedures](../spark-procedures/#table-migration) page.

Review Comment:
   Yes I agree we should point to the Spark procedure instead of repeat the information. I am expecting this dedicated page to explain more about how the migration works, including (1) how snapshot, migrate are different, (2) how to use snapshot/migrate + add files to continuously migrate a table, (3) how the migration works differently for Hive, Delta and in the future Hudi. 
   
   We can add some graphs and explain each case, and then point to the Spark procedure for actual usage of it. For Delta, we can add a section in java API documentation and point to that instead of a Spark procedure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1171737416


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,124 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 300
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Delta Lake Table Migration
+Delta Lake is a table format that supports Parquet file format and provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. It is done via the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+Since Delta Lake tables maintain snapshots, all available snapshots will be committed to the new Iceberg table as transactions in order.
+For Delta Lake tables, any additional data files added after the initial migration will be included in their corresponding snapshots and subsequently added to the new Iceberg table using the Add Snapshot action.
+The Add Snapshot action, a variant of the Add File action, is still under development.
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. To enable migration from delta lake features, the minimum required dependencies are:
+- [iceberg-delta-lake](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-delta-lake/1.2.0/iceberg-delta-lake-1.2.0.jar)
+- [delta-standalone-0.6.0](https://repo1.maven.org/maven2/io/delta/delta-standalone_2.13/0.6.0/delta-standalone_2.13-0.6.0.jar)
+- [delta-storage-2.2.0](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/delta-storage-2.2.0.jar)
+
+### Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+### API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+### Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+## Snapshot Delta Lake Table to Iceberg
+The action `snapshotDeltaLakeTable` reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Usage
+| Required Input                                  | Configured By                                                                                                                                                                                             |
+|-------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source Table Location                           | Argument [`sourceTableLocation`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/DeltaLakeToIcebergMigrationActionsProvider.html#snapshotDeltaLakeTable(java.lang.String))             | 
+| New Iceberg Table Identifier                    | Configuration API [`as`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html#as(org.apache.iceberg.catalog.TableIdentifier))                                   |
+| Iceberg Catalog To Create New Iceberg Table     | Configuration API [`icebergCatalog`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html#icebergCatalog(org.apache.iceberg.catalog.Catalog))                   |
+| Hadoop Configuration To Access Delta Lake Table | Configuration API [`deltaLakeConfiguration`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html#deltaLakeConfiguration(org.apache.hadoop.conf.Configuration)) |
+
+For detailed usage and other optional configurations, please refer to the [SnapshotDeltaLakeTable API](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html)
+
+#### Output
+| Output Name | Type | Description |
+| ------------|------|-------------|
+| `imported_files_count` | long | Number of files added to the new table |
+
+#### Added Table Properties
+The following table properties are added to the Iceberg table to be created by default:
+
+| Property Name                 | Value                                     | Description                                                        |
+|-------------------------------|-------------------------------------------|--------------------------------------------------------------------|
+| `snapshot_source`             | `delta`                                   | Indicates that the table is snapshot from a delta lake table       |
+| `original_location`           | location of the delta lake table          | The absolute path to the location of the original delta lake table |
+| `schema.name-mapping.default` | JSON name mapping derived from the schema | The name mapping string used to read Delta Lake table's data files |
+
+#### Examples
+```java
+sourceDeltaLakeTableLocation = "s3://my-bucket/delta-table";
+destTableIdentifier = TableIdentifier.of("my_db", "my_table");
+destTableLocation = "s3://my-bucket/iceberg-table";
+icebergCatalog = ...; // Iceberg Catalog fetched from engines like Spark or created via CatalogUtil.loadCatalog
+hadoopConf = ...; // Hadoop Configuration fetched from engines like Spark and have proper file system configuration to access the Delta Lake table.
+    
+DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+    .snapshotDeltaLakeTable(sourceDeltaLakeTableLocation)
+    .as(destTableIdentifier)
+    .icebergCatalog(icebergCatalog)
+    .tableLocation(destTableLocation)
+    .deltaLakeConfiguration(hadoopConf)
+    .tableProperty("my_property", "my_value")
+    .execute();
+```
+
+## Migrate Delta Lake Table To Iceberg
+**Not Yet Support**. This action should read the Delta Lake table's most recent snapshot and convert it to a new Iceberg table with the same name, schema and partitioning in one iceberg transaction. The source Delta Lake table should be dropped as the completion of this action.

Review Comment:
   I think we don't need to have these sections given the fact that we already mention they are not supported in the intro



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1171739353


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,124 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 300
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Delta Lake Table Migration
+Delta Lake is a table format that supports Parquet file format and provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. It is done via the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+Since Delta Lake tables maintain snapshots, all available snapshots will be committed to the new Iceberg table as transactions in order.
+For Delta Lake tables, any additional data files added after the initial migration will be included in their corresponding snapshots and subsequently added to the new Iceberg table using the Add Snapshot action.
+The Add Snapshot action, a variant of the Add File action, is still under development.
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. To enable migration from delta lake features, the minimum required dependencies are:
+- [iceberg-delta-lake](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-delta-lake/1.2.0/iceberg-delta-lake-1.2.0.jar)
+- [delta-standalone-0.6.0](https://repo1.maven.org/maven2/io/delta/delta-standalone_2.13/0.6.0/delta-standalone_2.13-0.6.0.jar)
+- [delta-storage-2.2.0](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/delta-storage-2.2.0.jar)
+
+### Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+### API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+### Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+## Snapshot Delta Lake Table to Iceberg
+The action `snapshotDeltaLakeTable` reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Usage
+| Required Input                                  | Configured By                                                                                                                                                                                             |
+|-------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source Table Location                           | Argument [`sourceTableLocation`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/DeltaLakeToIcebergMigrationActionsProvider.html#snapshotDeltaLakeTable(java.lang.String))             | 

Review Comment:
   We can add a `description` column to describe the meaning



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1171734347


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,124 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 300
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Delta Lake Table Migration
+Delta Lake is a table format that supports Parquet file format and provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. It is done via the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+Since Delta Lake tables maintain snapshots, all available snapshots will be committed to the new Iceberg table as transactions in order.

Review Comment:
   nit: maintain transactions. Delta does not have snapshot as its concept



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1174439031


##########
docs/table-migration.md:
##########
@@ -0,0 +1,77 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two methods for executing table migration: full data migration and in-place metadata migration.
+
+Full data migration involves copying all data files from the source table to the new Iceberg table. This method makes the new table fully isolated from the source table, but is slower and doubles the space.
+In practice, users can use operations like [Create-Table-As-Select](../spark-ddl/#create-table--as-select), [INSERT](../spark-writes/#insert-into), and Change-Data-Capture pipelines to perform such migration.
+
+In-place metadata migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication. However, the new table and the source table are not fully isolated. In other words, if any processes vacuum data files from the source table, the new table will also be affected.
+
+In this doc, we will describe more about in-place metadata migration.
+
+![In-Place Metadata Migration](../../../img/iceberg-in-place-metadata-migration.png)
+
+Apache Iceberg supports the in-place metadata migration approach, which includes three important actions: **Snapshot Table**, **Migrate Table**, and **Add Files**.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with a different name and with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table and a different name. Readers and Writers on the source table can continue to work.
+
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+- Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged. Readers can be switched to the new Iceberg table.
+
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)

Review Comment:
   should add a sentence after that that eventually all the writers can be migrated to the new table, completing the migration process. And it's up to you if you want to add a final graph for that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #6600: Doc: Add descriptions and examples for Snapshot Delta Lake Table to Iceberg Table

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#issuecomment-1480001596

   I think this can fix #6770 , can we update this page to a dedicated page for all migration use cases?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150942428


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+**Step 2:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+**Step 3:** Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+### Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+

Review Comment:
   nit: can we have a parent section for "Migrating from different table formats"?



##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+**Step 2:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+**Step 3:** Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+### Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+## Migration Implementation: From Hive/Spark to Iceberg

Review Comment:
   just Hive is fine instead of Hive/Spark. Hive is a table format.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150933580


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)

Review Comment:
   Yes you can open a separated PR for that in iceberg-doc



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153992317


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.

Review Comment:
   with a different name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#issuecomment-1518953015

   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1174439692


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,117 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 300
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Delta Lake Table Migration
+Delta Lake is a table format that supports Parquet file format and provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables.
+Since Delta Lake tables maintain transactions, all available transactions will be committed to the new Iceberg table as transactions in order.
+For Delta Lake tables, any additional data files added after the initial migration will be included in their corresponding transactions and subsequently added to the new Iceberg table using the Add Transaction action.
+The Add Transaction action, a variant of the Add File action, is still under development.
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. To enable migration from delta lake features, the minimum required dependencies are:
+- [iceberg-delta-lake](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-delta-lake/1.2.0/)
+- [delta-standalone-0.6.0](https://repo1.maven.org/maven2/io/delta/delta-standalone_2.13/0.6.0/)
+- [delta-storage-2.2.0](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/)
+
+### Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+### API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+### Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+## Snapshot Delta Lake Table to Iceberg
+The action `snapshotDeltaLakeTable` reads the Delta Lake table's transactions and converts them to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Usage
+| Required Input               | Configured By                                                                                                                                                                                             | Description                                                                     |
+|------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------|
+| Source Table Location        | Argument [`sourceTableLocation`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/DeltaLakeToIcebergMigrationActionsProvider.html#snapshotDeltaLakeTable(java.lang.String))             | The location of the source Delta Lake table                                     | 
+| New Iceberg Table Identifier | Configuration API [`as`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html#as(org.apache.iceberg.catalog.TableIdentifier))                                   | The identifier specifies the namespace and table name for the new iceberg table |
+| Iceberg Catalog              | Configuration API [`icebergCatalog`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html#icebergCatalog(org.apache.iceberg.catalog.Catalog))                   | The catalog used to create the new iceberg table                                |
+| Hadoop Configuration         | Configuration API [`deltaLakeConfiguration`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html#deltaLakeConfiguration(org.apache.hadoop.conf.Configuration)) | The Hadoop Configuration used to read the source Delta Lake table.              |
+
+For detailed usage and other optional configurations, please refer to the [SnapshotDeltaLakeTable API](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html)
+
+#### Output
+| Output Name | Type | Description |
+| ------------|------|-------------|
+| `imported_files_count` | long | Number of files added to the new table |
+
+#### Added Table Properties
+The following table properties are added to the Iceberg table to be created by default:
+
+| Property Name                 | Value                                     | Description                                                        |
+|-------------------------------|-------------------------------------------|--------------------------------------------------------------------|
+| `snapshot_source`             | `delta`                                   | Indicates that the table is snapshot from a delta lake table       |
+| `original_location`           | location of the delta lake table          | The absolute path to the location of the original delta lake table |
+| `schema.name-mapping.default` | JSON name mapping derived from the schema | The name mapping string used to read Delta Lake table's data files |
+
+#### Examples
+```java
+String sourceDeltaLakeTableLocation = "s3://my-bucket/delta-table";
+String destTableLocation = "s3://my-bucket/iceberg-table";
+org.apache.iceberg.catalog.TableIdentifier destTableIdentifier = TableIdentifier.of("my_db", "my_table");

Review Comment:
   can we use `import org.apache.iceberg.catalog.TableIdentifier` upfront instead of specifying the full class name here? Although this is in documentation, we should expect the code to mostly work if people paste it to run it. If we do like this, `TableIdentifier` is still not resolved in `TableIdentifier.of("my_db", "my_table");` Similar comments for the next few lines as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1171733154


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,124 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 300
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Delta Lake Table Migration
+Delta Lake is a table format that supports Parquet file format and provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. It is done via the `iceberg-delta-lake` module

Review Comment:
   nit: remove "only"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153990334


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.

Review Comment:
   I think we still need to talk a bit about "full data migration", in this case users use operations like CTAS, INSERT, Change data capture pipelines. And you can link to related docs.



##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.

Review Comment:
   nit: "in-place metadata migration"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153992212


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.

Review Comment:
   nit: remove "primaryly". Also `Snapshot Table`, `Migrate Table`, and `Add Files` should not be backquoted, but bold.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1159287476


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.

Review Comment:
   Thank you for your suggestion. I revised the sentence and added link to CTAS and INSERT. However, it seems currently there is no OSS doc directly describing the Change-Data-Capture pipeline. So I leave it unlinked. I think it shall be ok since CDC pipeline is more like a conception compared to the first two terms which are more like specific implementations. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1159291297


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+- Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+## Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all modifications working on the source table to be stopped before the action is performed.
+
+Stop all writers interacting with the source table. Readers that also supports Iceberg may continue reading.
+
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+- Create a new Iceberg table with the same identifier and metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+- Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+## Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.

Review Comment:
   Sorry for the confusion. My intention here is to mostly describe the combination of `snapshot table` and `add files`. Basically, I want to express that after snapshotting the initial snapshot and add all subsequent data files to make the new iceberg table have the same contents as the source table, users can also migrate readers and writers to the new iceberg table.
   
   Think twice about the above process. I found that 1. the process is mostly described in the `add files` action. 2. the  migration of reader/writer after the snapshot process may be out-of-scope. Hence, I decided to delete this section.
   What do you think of this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1145539246


##########
docs/table-migration.md:
##########
@@ -0,0 +1,129 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 400
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Hive and Spark Table Migration
+Iceberg supports converting Hive and Spark tables to Iceberg tables through Spark Procedures. For specific instructions on how to use the procedures, please refer to the [Table Migration via Spark Procedures](../spark-procedures/#table-migration) page.

Review Comment:
   Currently, the [Spark Procedures](https://iceberg.apache.org/docs/latest/spark-procedures/#table-migration) contains a comprehensive explanation on migrating from Hive or Spark to Iceberg. So I made a pointer to that page here. I think another option would be to move everything to this newly added page and leave a pointer at [Spark Procedures](https://iceberg.apache.org/docs/latest/spark-procedures/#table-migration) which point to the new migraiton page.
   
   More thoughts are welcomed on the design. Please feel free to share your opinions on this. Thank you in advance.



##########
docs/table-migration.md:
##########
@@ -0,0 +1,129 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 400
+menu: main

Review Comment:
   Since this is a new page, I would like to put a preview of the new page organization here:
   ![Screen Shot 2023-03-22 at 20 08 00](https://user-images.githubusercontent.com/55990951/227065989-003e075a-9ca6-46ec-913e-b62034cef355.png)
   
   Currently I made the `Table Migration` in the main menu.
   The preview is built via instructions in https://github.com/apache/iceberg-docs



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1154009406


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Delta Lake Table Migration
+Iceberg supports converting Delta Lake tables to Iceberg tables through the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. Users need to manually include this module in their environment to enable the conversion.
+Also, users need to provide [Delta Standalone](https://github.com/delta-io/connectors/releases/tag/v0.6.0) and [Delta Storage](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/)
+because they are what `iceberg-delta-lake` uses to read Delta Lake tables.
+
+## Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+For delta lake table contains `TimestampType` columns, please make sure to set table property `read.parquet.vectorization.enabled` to `false` since the vectorized reader doesn't support `INT96` yet.
+Such support is under development in the [PR#6962](https://github.com/apache/iceberg/pull/6962)
+
+## API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+## Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+### `snapshotDeltaLakeTable`
+The action reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Arguments
+The delta table's location is required to be provided when initializing the action.
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+|`sourceTableLocation` | Yes | String | The location of the source Delta Lake table | 
+
+#### Configurations
+The configurations can be gave via method chaining
+
+| Method Name | Arguments      | Required? | Type                                       | Description                                                                                                  |
+|---------------------------|----------------|-----------|--------------------------------------------|--------------------------------------------------------------------------------------------------------------|
+| `as`                      | `identifier`   | Yes       | org.apache.iceberg.catalog.TableIdentifier | The identifier of the Iceberg table to be created.                                                           |
+| `icebergCatalog`          | `catalog`      | Yes       | org.apache.iceberg.catalog.Catalog         | The Iceberg catalog for the Iceberg table to be created                                                      |
+| `deltaLakeConfiguration`  | `conf`         | Yes       | org.apache.hadoop.conf.Configuration       | The Hadoop Configuration to access Delta Lake Table's log and datafiles                                      |
+| `tableLocation`           | `location`     | No        | String                                     | The location of the Iceberg table to be created. Defaults to the same location as the given Delta Lake table |
+| `tableProperty`           | `name`,`value` | No        | String, String                             | A property entry to add to the Iceberg table to be created                                                   |
+| `tableProperties`         | `properties`   | No        | Map<String, String>                        | Properties to add to the the Iceberg table to be created                                                     |
+
+#### Output
+| Output Name | Type | Description |
+| ------------|------|-------------|
+| `imported_files_count` | long | Number of files added to the new table |
+
+#### Added Table Properties
+The following table properties are added to the Iceberg table to be created by default:
+
+| Property Name                 | Value                                     | Description                                                        |
+|-------------------------------|-------------------------------------------|--------------------------------------------------------------------|
+| `snapshot_source`             | `delta`                                   | Indicates that the table is snapshot from a delta lake table       |
+| `original_location`           | location of the delta lake table          | The absolute path to the location of the original delta lake table |
+| `schema.name-mapping.default` | JSON name mapping derived from the schema | The name mapping string used to read Delta Lake table's data files |
+
+#### Examples
+Snapshot a Delta Lake table located at `s3://my-bucket/delta-table` to an Iceberg table named `my_table` in the `my_db` database in the `my_catalog` catalog.
+```java
+DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+    .snapshotDeltaLakeTable("s3://my-bucket/delta-table")
+    .as("my_db.my_table")
+    .icebergCatalog("my_catalog")
+    .deltaLakeConfiguration(new Configuration())
+    .execute();
+```
+Snapshot a Delta Lake table located at `s3://my-bucket/delta-table` to an Iceberg table named `my_table` in the `my_db` database in the `my_catalog` catalog at a manually

Review Comment:
   it feels like the second example only adds table properties. I think we can just use 1 example with table properties.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153998086


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+- Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+## Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all modifications working on the source table to be stopped before the action is performed.
+
+Stop all writers interacting with the source table. Readers that also supports Iceberg may continue reading.
+
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+- Create a new Iceberg table with the same identifier and metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+- Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+## Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+# Migrating From Different Table Formats
+## From Hive to Iceberg
+Apache Hive supports ORC, Parquet, and Avro file formats that could be migrated to Iceberg.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive tables to Iceberg tables. Since Hive tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+For more details on how to perform the migration on Hive tables, please refer to the [Hive Table Migration via Spark Procedures](../spark-procedures/#table-migration) page.

Review Comment:
   we should explicitly callout the snapshot, migrate, add files parts in 3 section corresponding to the overview, something like:
   
   ```
   ### Snapshot Hive Table
   
   To snapshot a Hive table, you can run Spark SQL:
   
   
   // procedure example
   
   
   See [xxx](link) for more details.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153989996


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach

Review Comment:
   nit: Approaches



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150937812


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table

Review Comment:
   nit: we can use bullet points for steps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150945609


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+**Step 2:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+**Step 3:** Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+### Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+## Migration Implementation: From Hive/Spark to Iceberg
+Apache Hive and Apache Spark are two popular data warehouse systems used for big data processing and analysis.
+However, both systems do not natively support time travel or rollback to previous snapshots.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive or Spark tables to Iceberg tables. Since Hive or Spark tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+For more details on how to perform the migration on Hive/Spark tables, please refer to the [Table Migration via Spark Procedures](../spark-procedures/#table-migration) page.
+
+## Migration Implementation: From Delta Lake to Iceberg
+Delta Lake is a popular data storage system that provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. Since Delta Lake tables maintain snapshots, all available snapshots will be committed to the new Iceberg table as transactions in order.
+For Delta Lake tables, any additional data files added after the initial migration will be included in their corresponding snapshots and subsequently added to the new Iceberg table using the Add Snapshot action. The Add Snapshot action, a variant of the Add File action, is still under development.
+
+For more details on how to perform the migration on Delta Lake tables, please refer to the [Delta Lake Migration](../delta-lake-migration/#delta-lake-table-migration) page.

Review Comment:
   Sorry I was the one proposing having the java example separated, but it seems better to just add your content from Delta Lake migration markdown here, because that doc seems isolated and people won't really be able to find that specific doc. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1174243611


##########
docs/hive-migration.md:
##########
@@ -0,0 +1,61 @@
+---
+title: "Hive Migration"
+url: hive-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Hive Table Migration
+Apache Hive supports ORC, Parquet, and Avro file formats that could be migrated to Iceberg.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive tables to Iceberg tables. Since Hive tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+## Enabling Migration from Hive to Iceberg
+The Hive table migration actions are supported by the Spark Integration module via Spark Procedures. The procedures are bundled in the following dependencies:
+- [`iceberg-spark-runtime_3.3_2.12`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.3_2.12/1.2.0/iceberg-spark-runtime-3.3_2.12-1.2.0.jar)

Review Comment:
   Just to confirm, "link to the release page" means link like this: https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/ ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 merged pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 merged PR #6600:
URL: https://github.com/apache/iceberg/pull/6600


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153993516


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+- Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+## Migrate Table

Review Comment:
   and also below



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153991807


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.

Review Comment:
   I think we need to describe the pros and cons of both approaches. Full data migration has the advantage that new table and old table are fully isolated, but slower and doubles the space. In place will be faster and saves space, but they are not fully isolated. If the old table has any processes attached to do something like removing data, then it will affect the new table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1154002901


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Delta Lake Table Migration

Review Comment:
   I think we should organize each format-specific migration page in the same way, with the following sections:
   1. enabling migration from XXX to Iceberg
   2. snapshot XXX table to Iceberg
   3. migrate XXX table to Iceberg
   4. add files from XXX table to Iceberg
   
   For delta, we can say unsupported for 3 and 4.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1171742537


##########
docs/table-migration.md:
##########
@@ -0,0 +1,77 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two methods for executing table migration: full data migration and in-place metadata migration.
+
+Full data migration involves copying all data files from the source table to the new Iceberg table. This method makes the new table fully isolated from the source table, but is slower and doubles the space.
+In practice, users can use operations like [Create-Table-As-Select](../spark-ddl/#create-table--as-select), [INSERT](../spark-writes/#insert-into), and Change-Data-Capture pipelines to perform such migration.
+
+In-place metadata migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication. However, the new table and the source table are not fully isolated. In other words, if any processes vacuum data files from the source table, the new table will also be affected.
+
+In this doc, we will describe more about in-place metadata migration.
+
+![In-Place Metadata Migration](../../../img/iceberg-in-place-metadata-migration.png)
+
+Apache Iceberg supports the in-place metadata migration approach, which includes three important actions: **Snapshot Table**, **Migrate Table**, and **Add Files**.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with a different name and with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table and a different name.

Review Comment:
   I see in your image you have writers and readers. We should mention how they would switch from source table to the snapshot table in these steps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150945609


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+**Step 2:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+**Step 3:** Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+### Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+## Migration Implementation: From Hive/Spark to Iceberg
+Apache Hive and Apache Spark are two popular data warehouse systems used for big data processing and analysis.
+However, both systems do not natively support time travel or rollback to previous snapshots.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive or Spark tables to Iceberg tables. Since Hive or Spark tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+For more details on how to perform the migration on Hive/Spark tables, please refer to the [Table Migration via Spark Procedures](../spark-procedures/#table-migration) page.
+
+## Migration Implementation: From Delta Lake to Iceberg
+Delta Lake is a popular data storage system that provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. Since Delta Lake tables maintain snapshots, all available snapshots will be committed to the new Iceberg table as transactions in order.
+For Delta Lake tables, any additional data files added after the initial migration will be included in their corresponding snapshots and subsequently added to the new Iceberg table using the Add Snapshot action. The Add Snapshot action, a variant of the Add File action, is still under development.
+
+For more details on how to perform the migration on Delta Lake tables, please refer to the [Delta Lake Migration](../delta-lake-migration/#delta-lake-table-migration) page.

Review Comment:
   Sorry I was the one proposing having the java example separated, but it seems better to just add your content from Delta Lake migration markdown here. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150944425


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+**Step 2:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+**Step 3:** Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+### Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+## Migration Implementation: From Hive/Spark to Iceberg
+Apache Hive and Apache Spark are two popular data warehouse systems used for big data processing and analysis.

Review Comment:
   we should remove opinions and evaluations of other systems in OSS docs. We just need an intro like Hive tables backed by Parquet, ORC and Avro file formats can be migrated to Iceberg.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1171734815


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,124 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 300
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Delta Lake Table Migration
+Delta Lake is a table format that supports Parquet file format and provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. It is done via the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+Since Delta Lake tables maintain snapshots, all available snapshots will be committed to the new Iceberg table as transactions in order.
+For Delta Lake tables, any additional data files added after the initial migration will be included in their corresponding snapshots and subsequently added to the new Iceberg table using the Add Snapshot action.
+The Add Snapshot action, a variant of the Add File action, is still under development.

Review Comment:
   We probably wanna call that "Add transactions" as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1174439231


##########
docs/hive-migration.md:
##########
@@ -0,0 +1,61 @@
+---
+title: "Hive Migration"
+url: hive-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Hive Table Migration
+Apache Hive supports ORC, Parquet, and Avro file formats that could be migrated to Iceberg.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive tables to Iceberg tables. Since Hive tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+## Enabling Migration from Hive to Iceberg
+The Hive table migration actions are supported by the Spark Integration module via Spark Procedures. The procedures are bundled in the following dependencies:
+- [`iceberg-spark-runtime_3.3_2.12`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.3_2.12/1.2.0/iceberg-spark-runtime-3.3_2.12-1.2.0.jar)

Review Comment:
   Oh sorry, I mean just link to this page: https://iceberg.apache.org/releases/#downloads, and say it is bundled in the Spark runtime jar.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153983693


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Delta Lake Table Migration
+Iceberg supports converting Delta Lake tables to Iceberg tables through the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. Users need to manually include this module in their environment to enable the conversion.
+Also, users need to provide [Delta Standalone](https://github.com/delta-io/connectors/releases/tag/v0.6.0) and [Delta Storage](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/)

Review Comment:
   in this case, can we provide a list of jars to include with Maven links?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153983561


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Delta Lake Table Migration
+Iceberg supports converting Delta Lake tables to Iceberg tables through the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. Users need to manually include this module in their environment to enable the conversion.

Review Comment:
   nit: remove "manually"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153933686


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)

Review Comment:
   Opened a PR here: https://github.com/apache/iceberg-docs/pull/220



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1154001028


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+- Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+## Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all modifications working on the source table to be stopped before the action is performed.
+
+Stop all writers interacting with the source table. Readers that also supports Iceberg may continue reading.
+
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+- Create a new Iceberg table with the same identifier and metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+- Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+## Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.

Review Comment:
   aren't readers and writers already switched during your process?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150939856


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.

Review Comment:
   I think we can skip talking about "full data migration case" in too much depth, and just refer to links of the write operations of specific engines for CTAS, INSERT, and change data capture. We don't even need a section for it, we just need to say in the intro that in this doc we will describe more about "In-place metadata migration"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1171736792


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,124 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 300
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Delta Lake Table Migration
+Delta Lake is a table format that supports Parquet file format and provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. It is done via the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+Since Delta Lake tables maintain snapshots, all available snapshots will be committed to the new Iceberg table as transactions in order.
+For Delta Lake tables, any additional data files added after the initial migration will be included in their corresponding snapshots and subsequently added to the new Iceberg table using the Add Snapshot action.
+The Add Snapshot action, a variant of the Add File action, is still under development.
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. To enable migration from delta lake features, the minimum required dependencies are:
+- [iceberg-delta-lake](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-delta-lake/1.2.0/iceberg-delta-lake-1.2.0.jar)
+- [delta-standalone-0.6.0](https://repo1.maven.org/maven2/io/delta/delta-standalone_2.13/0.6.0/delta-standalone_2.13-0.6.0.jar)
+- [delta-storage-2.2.0](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/delta-storage-2.2.0.jar)
+
+### Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+### API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+### Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+## Snapshot Delta Lake Table to Iceberg
+The action `snapshotDeltaLakeTable` reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Usage
+| Required Input                                  | Configured By                                                                                                                                                                                             |
+|-------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source Table Location                           | Argument [`sourceTableLocation`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/DeltaLakeToIcebergMigrationActionsProvider.html#snapshotDeltaLakeTable(java.lang.String))             | 
+| New Iceberg Table Identifier                    | Configuration API [`as`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html#as(org.apache.iceberg.catalog.TableIdentifier))                                   |
+| Iceberg Catalog To Create New Iceberg Table     | Configuration API [`icebergCatalog`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html#icebergCatalog(org.apache.iceberg.catalog.Catalog))                   |
+| Hadoop Configuration To Access Delta Lake Table | Configuration API [`deltaLakeConfiguration`](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html#deltaLakeConfiguration(org.apache.hadoop.conf.Configuration)) |
+
+For detailed usage and other optional configurations, please refer to the [SnapshotDeltaLakeTable API](https://iceberg.apache.org/javadoc/latest/org/apache/iceberg/delta/SnapshotDeltaLakeTable.html)
+
+#### Output
+| Output Name | Type | Description |
+| ------------|------|-------------|
+| `imported_files_count` | long | Number of files added to the new table |
+
+#### Added Table Properties
+The following table properties are added to the Iceberg table to be created by default:
+
+| Property Name                 | Value                                     | Description                                                        |
+|-------------------------------|-------------------------------------------|--------------------------------------------------------------------|
+| `snapshot_source`             | `delta`                                   | Indicates that the table is snapshot from a delta lake table       |
+| `original_location`           | location of the delta lake table          | The absolute path to the location of the original delta lake table |
+| `schema.name-mapping.default` | JSON name mapping derived from the schema | The name mapping string used to read Delta Lake table's data files |
+
+#### Examples
+```java
+sourceDeltaLakeTableLocation = "s3://my-bucket/delta-table";

Review Comment:
   nit: missing Java types for these variable definitions



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1174499448


##########
docs/hive-migration.md:
##########
@@ -0,0 +1,60 @@
+---
+title: "Hive Migration"
+url: hive-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Hive Table Migration
+Apache Hive supports ORC, Parquet, and Avro file formats that could be migrated to Iceberg.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive tables to Iceberg tables. Since Hive tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+## Enabling Migration from Hive to Iceberg
+The Hive table migration actions are supported by the Spark Integration module via Spark Procedures. 
+The procedures are bundled in the Spark runtime jar, which is available in the [Iceberg Release Downloads](https://iceberg.apache.org/releases/#downloads).
+
+## Snapshot Hive Table to Iceberg
+To snapshot a Hive table, users can run the following Spark SQL:
+```sql
+CALL catalog_name.system.snapshot('db.source', 'db.dest')
+```
+See [Spark Procedure: snapshot](../spark-procedures/#snapshot) for more details.
+
+## Migrate Hive Table To Iceberg
+To migrate a Hive table to Iceberg, users can run the following Spark SQL:
+```sql
+CALL catalog_name.system.migrate('db.sample')
+```
+See [Spark Procedure: migrate](../spark-procedures/#migrate) for more details.
+
+## Add Files From Delta Lake Table to Iceberg

Review Comment:
   Oops. Thank you for catching this. Sorry for the typo. 
   
   Have opened a new PR here: #7407, to address that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1154007345


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Delta Lake Table Migration
+Iceberg supports converting Delta Lake tables to Iceberg tables through the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. Users need to manually include this module in their environment to enable the conversion.
+Also, users need to provide [Delta Standalone](https://github.com/delta-io/connectors/releases/tag/v0.6.0) and [Delta Storage](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/)
+because they are what `iceberg-delta-lake` uses to read Delta Lake tables.
+
+## Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+For delta lake table contains `TimestampType` columns, please make sure to set table property `read.parquet.vectorization.enabled` to `false` since the vectorized reader doesn't support `INT96` yet.
+Such support is under development in the [PR#6962](https://github.com/apache/iceberg/pull/6962)
+
+## API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+## Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+### `snapshotDeltaLakeTable`
+The action reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Arguments
+The delta table's location is required to be provided when initializing the action.
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+|`sourceTableLocation` | Yes | String | The location of the source Delta Lake table | 
+
+#### Configurations
+The configurations can be gave via method chaining
+
+| Method Name | Arguments      | Required? | Type                                       | Description                                                                                                  |
+|---------------------------|----------------|-----------|--------------------------------------------|--------------------------------------------------------------------------------------------------------------|
+| `as`                      | `identifier`   | Yes       | org.apache.iceberg.catalog.TableIdentifier | The identifier of the Iceberg table to be created.                                                           |
+| `icebergCatalog`          | `catalog`      | Yes       | org.apache.iceberg.catalog.Catalog         | The Iceberg catalog for the Iceberg table to be created                                                      |
+| `deltaLakeConfiguration`  | `conf`         | Yes       | org.apache.hadoop.conf.Configuration       | The Hadoop Configuration to access Delta Lake Table's log and datafiles                                      |
+| `tableLocation`           | `location`     | No        | String                                     | The location of the Iceberg table to be created. Defaults to the same location as the given Delta Lake table |
+| `tableProperty`           | `name`,`value` | No        | String, String                             | A property entry to add to the Iceberg table to be created                                                   |
+| `tableProperties`         | `properties`   | No        | Map<String, String>                        | Properties to add to the the Iceberg table to be created                                                     |
+
+#### Output
+| Output Name | Type | Description |
+| ------------|------|-------------|
+| `imported_files_count` | long | Number of files added to the new table |
+
+#### Added Table Properties
+The following table properties are added to the Iceberg table to be created by default:
+
+| Property Name                 | Value                                     | Description                                                        |
+|-------------------------------|-------------------------------------------|--------------------------------------------------------------------|
+| `snapshot_source`             | `delta`                                   | Indicates that the table is snapshot from a delta lake table       |
+| `original_location`           | location of the delta lake table          | The absolute path to the location of the original delta lake table |
+| `schema.name-mapping.default` | JSON name mapping derived from the schema | The name mapping string used to read Delta Lake table's data files |
+
+#### Examples
+Snapshot a Delta Lake table located at `s3://my-bucket/delta-table` to an Iceberg table named `my_table` in the `my_db` database in the `my_catalog` catalog.
+```java
+DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+    .snapshotDeltaLakeTable("s3://my-bucket/delta-table")
+    .as("my_db.my_table")
+    .icebergCatalog("my_catalog")
+    .deltaLakeConfiguration(new Configuration())
+    .execute();
+```
+Snapshot a Delta Lake table located at `s3://my-bucket/delta-table` to an Iceberg table named `my_table` in the `my_db` database in the `my_catalog` catalog at a manually
+specified location `s3://my-bucket/snapshot-loc`. Also, add a table property `my_property` with value `my_value` to the Iceberg table.
+```java
+DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+    .snapshotDeltaLakeTable("s3://my-bucket/delta-table")
+    .as("my_db.my_table")
+    .icebergCatalog("my_catalog")
+    .deltaLakeConfiguration(new Configuration())

Review Comment:
    we should mention this configuration should come from engine like Spark, and should have proper file system configurations to access the Delta table. We can just have a placeholder like
   
   ```
   .deltaLakeConfiguration(hadoopConf)
   ```
   
   and add some explanations



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153925543


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+**Step 2:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+**Step 3:** Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+### Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+## Migration Implementation: From Hive/Spark to Iceberg
+Apache Hive and Apache Spark are two popular data warehouse systems used for big data processing and analysis.
+However, both systems do not natively support time travel or rollback to previous snapshots.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive or Spark tables to Iceberg tables. Since Hive or Spark tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+For more details on how to perform the migration on Hive/Spark tables, please refer to the [Table Migration via Spark Procedures](../spark-procedures/#table-migration) page.
+
+## Migration Implementation: From Delta Lake to Iceberg
+Delta Lake is a popular data storage system that provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. Since Delta Lake tables maintain snapshots, all available snapshots will be committed to the new Iceberg table as transactions in order.
+For Delta Lake tables, any additional data files added after the initial migration will be included in their corresponding snapshots and subsequently added to the new Iceberg table using the Add Snapshot action. The Add Snapshot action, a variant of the Add File action, is still under development.
+
+For more details on how to perform the migration on Delta Lake tables, please refer to the [Delta Lake Migration](../delta-lake-migration/#delta-lake-table-migration) page.

Review Comment:
   Thank you for your suggestion. I agree that we want to keep the content for Delta Lake Migration close to the general introduction of Migration. In this case, I think we can have a group named "Migration" and two sub-pages, one for the overview of table migration and one for the Delta Lake Migration. Do you think it is a proper way to align these pages?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1145672320


##########
docs/table-migration.md:
##########
@@ -0,0 +1,129 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 400
+menu: main

Review Comment:
   I think we are missing some introduction of the general idea of migration. There should be 2 cases, where (1) users want to migrate using something like a CTAS, that is actually preferred because you completely cut tie with your old table. But it takes more time and might not be possible for production use cases. (2) users use in place migration, which retains existing files but add Iceberg metadata on top of it. And what we have in this page explains more about the later approach. Then we can divide into Hive and Delta cases to explain further.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1149858806


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)

Review Comment:
   These graphs are supposed to be located in https://github.com/apache/iceberg-docs/tree/main/iceberg-theme/static/img. Not sure if I need to open a separate PR in `iceberg-doc` to add these graphs.
   
   But before that, I would like to put a preview of the page with pictures here
   
   [table_migration_preview.pdf](https://github.com/apache/iceberg/files/11083574/table_migration_preview.pdf)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150944849


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+**Step 2:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+**Step 3:** Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+### Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+## Migration Implementation: From Hive/Spark to Iceberg
+Apache Hive and Apache Spark are two popular data warehouse systems used for big data processing and analysis.
+However, both systems do not natively support time travel or rollback to previous snapshots.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive or Spark tables to Iceberg tables. Since Hive or Spark tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+For more details on how to perform the migration on Hive/Spark tables, please refer to the [Table Migration via Spark Procedures](../spark-procedures/#table-migration) page.
+
+## Migration Implementation: From Delta Lake to Iceberg
+Delta Lake is a popular data storage system that provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,

Review Comment:
   Similar comment about removing opinions, no need to say "is a popular storage system"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150935997


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.

Review Comment:
   I think for writers you are right they need to all stop, for reader it does not have to be, because the reader can support reading both types of table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150948444


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table

Review Comment:
   since you already have dedicated sections below, we don't need to list them out here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#issuecomment-1518708928

   Thanks for keeping the work going on! I think we are very close, just 3 nit comments and we should be good to go! Let me know when you have another look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1174492539


##########
docs/hive-migration.md:
##########
@@ -0,0 +1,60 @@
+---
+title: "Hive Migration"
+url: hive-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Hive Table Migration
+Apache Hive supports ORC, Parquet, and Avro file formats that could be migrated to Iceberg.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive tables to Iceberg tables. Since Hive tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+## Enabling Migration from Hive to Iceberg
+The Hive table migration actions are supported by the Spark Integration module via Spark Procedures. 
+The procedures are bundled in the Spark runtime jar, which is available in the [Iceberg Release Downloads](https://iceberg.apache.org/releases/#downloads).
+
+## Snapshot Hive Table to Iceberg
+To snapshot a Hive table, users can run the following Spark SQL:
+```sql
+CALL catalog_name.system.snapshot('db.source', 'db.dest')
+```
+See [Spark Procedure: snapshot](../spark-procedures/#snapshot) for more details.
+
+## Migrate Hive Table To Iceberg
+To migrate a Hive table to Iceberg, users can run the following Spark SQL:
+```sql
+CALL catalog_name.system.migrate('db.sample')
+```
+See [Spark Procedure: migrate](../spark-procedures/#migrate) for more details.
+
+## Add Files From Delta Lake Table to Iceberg

Review Comment:
   Should say hive



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1159293697


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Delta Lake Table Migration
+Iceberg supports converting Delta Lake tables to Iceberg tables through the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. Users need to manually include this module in their environment to enable the conversion.
+Also, users need to provide [Delta Standalone](https://github.com/delta-io/connectors/releases/tag/v0.6.0) and [Delta Storage](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/)
+because they are what `iceberg-delta-lake` uses to read Delta Lake tables.
+
+## Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+For delta lake table contains `TimestampType` columns, please make sure to set table property `read.parquet.vectorization.enabled` to `false` since the vectorized reader doesn't support `INT96` yet.
+Such support is under development in the [PR#6962](https://github.com/apache/iceberg/pull/6962)
+
+## API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+## Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+### `snapshotDeltaLakeTable`
+The action reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Arguments
+The delta table's location is required to be provided when initializing the action.
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+|`sourceTableLocation` | Yes | String | The location of the source Delta Lake table | 
+
+#### Configurations

Review Comment:
   Thank you for your suggestions. I agree it looks a liitle redundant as the table just repeat the content in the javadoc. However, I still want to includes the basic Input/Output here so that users can understand the minimum requirements to run the action without being re-direct to other pages. Hence, I re-factored the whole thing as a small table which contains two column which include the Input content and link to the configuration API. For other optional configurations, I just put a javadoc link. Do you think it is proper to refactor like this?



##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Delta Lake Table Migration
+Iceberg supports converting Delta Lake tables to Iceberg tables through the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. Users need to manually include this module in their environment to enable the conversion.
+Also, users need to provide [Delta Standalone](https://github.com/delta-io/connectors/releases/tag/v0.6.0) and [Delta Storage](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/)
+because they are what `iceberg-delta-lake` uses to read Delta Lake tables.
+
+## Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+For delta lake table contains `TimestampType` columns, please make sure to set table property `read.parquet.vectorization.enabled` to `false` since the vectorized reader doesn't support `INT96` yet.
+Such support is under development in the [PR#6962](https://github.com/apache/iceberg/pull/6962)
+
+## API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+## Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+### `snapshotDeltaLakeTable`
+The action reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Arguments
+The delta table's location is required to be provided when initializing the action.
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+|`sourceTableLocation` | Yes | String | The location of the source Delta Lake table | 
+
+#### Configurations

Review Comment:
   Thank you for your suggestions. I agree it looks a liitle redundant as the table just repeat the content in the javadoc. However, I still want to include the basic Input/Output here so that users can understand the minimum requirements to run the action without being re-direct to other pages. Hence, I re-factored the whole thing as a small table which contains two column which include the Input content and link to the configuration API. For other optional configurations, I just put a javadoc link. Do you think it is proper to refactor like this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1159289015


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+- Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+## Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all modifications working on the source table to be stopped before the action is performed.
+
+Stop all writers interacting with the source table. Readers that also supports Iceberg may continue reading.
+
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+- Create a new Iceberg table with the same identifier and metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+- Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+## Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+# Migrating From Different Table Formats
+## From Hive to Iceberg

Review Comment:
   I added a new page called `Hive-Migration` and make it reside in the `Migration` subgroup



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153931597


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.

Review Comment:
   Thank you for your explanation. I have a follow-up question regarding the migration process. 
   
   When performing the migration, I understand that the source table is first renamed as a backup, and a staged iceberg table with the same identifier is created. However, it appears that the staged changes will not be committed until all data files have been migrated. This creates a period during the migration process where readers will not be able to find any table with the source table identifier in the catalog. In light of this, I would like to confirm if it is still safe to keep the existing readers active during this period. Thank you!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153992931


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+- Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+## Migrate Table

Review Comment:
   nit: each header should have empty line above



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153997002


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.
+
+### Create-Table-As-Select Migration
+CTAS migration involves creating a new Iceberg table and copying data from the existing table to the new one. This method is preferred when you want to completely cut ties with your old table, ensuring the new table is independent and fully managed by Iceberg.
+However, CTAS migration may require more time to complete and might not be suitable for production use cases where downtime is not acceptable.
+![CTAS Migration](../../../img/iceberg-CTAS.png)
+### In-Place Migration
+In-place migration retains the existing data files but adds Iceberg metadata on top of them. This approach is faster and does not require copying data, making it more suitable for production use cases.
+![In-Place Migration](../../../img/iceberg-In-place.png)
+## In-Place Migration Actions
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions:
+
+1. Snapshot Table
+2. Migrate Table
+3. Add Files
+
+### Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+**Step 1:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+**Step 2:** Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+### Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all readers and writers working on the source table to be stopped before the action is performed.
+
+**Step 1:** Stop all readers and writers interacting with the source table
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+**Step 2:** Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+**Step 3:** Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+### Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+## Migration Implementation: From Hive/Spark to Iceberg
+Apache Hive and Apache Spark are two popular data warehouse systems used for big data processing and analysis.
+However, both systems do not natively support time travel or rollback to previous snapshots.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive or Spark tables to Iceberg tables. Since Hive or Spark tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+For more details on how to perform the migration on Hive/Spark tables, please refer to the [Table Migration via Spark Procedures](../spark-procedures/#table-migration) page.
+
+## Migration Implementation: From Delta Lake to Iceberg
+Delta Lake is a popular data storage system that provides time travel and versioning features. When migrating data from Delta Lake to Iceberg,
+it is common to migrate all snapshots to maintain the history of the data.
+
+Currently, Iceberg only supports the Snapshot Table action for migrating from Delta Lake to Iceberg tables. Since Delta Lake tables maintain snapshots, all available snapshots will be committed to the new Iceberg table as transactions in order.
+For Delta Lake tables, any additional data files added after the initial migration will be included in their corresponding snapshots and subsequently added to the new Iceberg table using the Add Snapshot action. The Add Snapshot action, a variant of the Add File action, is still under development.
+
+For more details on how to perform the migration on Delta Lake tables, please refer to the [Delta Lake Migration](../delta-lake-migration/#delta-lake-table-migration) page.

Review Comment:
   yes that sounds good to me!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153998086


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+- Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+## Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all modifications working on the source table to be stopped before the action is performed.
+
+Stop all writers interacting with the source table. Readers that also supports Iceberg may continue reading.
+
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+- Create a new Iceberg table with the same identifier and metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+- Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+## Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+# Migrating From Different Table Formats
+## From Hive to Iceberg
+Apache Hive supports ORC, Parquet, and Avro file formats that could be migrated to Iceberg.
+When migrating data to an Iceberg table, which provides versioning and transactional updates, only the most recent data files need to be migrated.
+
+Iceberg supports all three migration actions: Snapshot Table, Migrate Table, and Add Files for migrating from Hive tables to Iceberg tables. Since Hive tables do not maintain snapshots,
+the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to the new Iceberg table.
+After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.
+
+For more details on how to perform the migration on Hive tables, please refer to the [Hive Table Migration via Spark Procedures](../spark-procedures/#table-migration) page.

Review Comment:
   we should explicitly callout the snapshot, migrate, add files parts, something like:
   
   ```
   ### Snapshot Hive Table
   
   To snapshot a Hive table, you can run Spark SQL:
   
   
   // procedure example
   
   
   See [xxx](link) for more details.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1154009044


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Delta Lake Table Migration
+Iceberg supports converting Delta Lake tables to Iceberg tables through the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. Users need to manually include this module in their environment to enable the conversion.
+Also, users need to provide [Delta Standalone](https://github.com/delta-io/connectors/releases/tag/v0.6.0) and [Delta Storage](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/)
+because they are what `iceberg-delta-lake` uses to read Delta Lake tables.
+
+## Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+For delta lake table contains `TimestampType` columns, please make sure to set table property `read.parquet.vectorization.enabled` to `false` since the vectorized reader doesn't support `INT96` yet.
+Such support is under development in the [PR#6962](https://github.com/apache/iceberg/pull/6962)
+
+## API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+## Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+### `snapshotDeltaLakeTable`
+The action reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Arguments
+The delta table's location is required to be provided when initializing the action.
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+|`sourceTableLocation` | Yes | String | The location of the source Delta Lake table | 
+
+#### Configurations

Review Comment:
   I think we don't need to add these information in table, we can provide javadoc link



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153987314


##########
docs/delta-lake-migration.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Delta Lake Migration"
+url: delta-lake-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 200
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Delta Lake Table Migration
+Iceberg supports converting Delta Lake tables to Iceberg tables through the `iceberg-delta-lake` module
+by using [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) to read logs of Delta lake tables.
+
+
+## Enabling Migration from Delta Lake to Iceberg
+The `iceberg-delta-lake` module is not bundled with Spark and Flink engine runtimes. Users need to manually include this module in their environment to enable the conversion.
+Also, users need to provide [Delta Standalone](https://github.com/delta-io/connectors/releases/tag/v0.6.0) and [Delta Storage](https://repo1.maven.org/maven2/io/delta/delta-storage/2.2.0/)
+because they are what `iceberg-delta-lake` uses to read Delta Lake tables.
+
+## Compatibilities
+The module is built and tested with `Delta Standalone:0.6.0` and supports Delta Lake tables with the following protocol version:
+* `minReaderVersion`: 1
+* `minWriterVersion`: 2
+
+Please refer to [Delta Lake Table Protocol Versioning](https://docs.delta.io/latest/versioning.html) for more details about Delta Lake protocol versions.
+
+For delta lake table contains `TimestampType` columns, please make sure to set table property `read.parquet.vectorization.enabled` to `false` since the vectorized reader doesn't support `INT96` yet.
+Such support is under development in the [PR#6962](https://github.com/apache/iceberg/pull/6962)
+
+## API
+The `iceberg-delta-lake` module provides an interface named `DeltaLakeToIcebergMigrationActionsProvider`, which contains actions that helps converting from Delta Lake to Iceberg.
+The supported actions are:
+* `snapshotDeltaLakeTable`: snapshot an existing Delta Lake table to an Iceberg table
+
+## Default Implementation
+The `iceberg-delta-lake` module also provides a default implementation of the interface which can be accessed by
+```java
+DeltaLakeToIcebergMigrationActionsProvider defaultActions = DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+```
+
+### `snapshotDeltaLakeTable`
+The action reads the Delta Lake table's most recent snapshot and converts it to a new Iceberg table with the same schema and partitioning in one iceberg transaction.
+The original Delta Lake table remains unchanged.
+
+The newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table's data files.
+Existing data files are added to the Iceberg table's metadata and can be read using a name-to-id mapping created from the original table schema.
+
+When inserts or overwrites run on the snapshot, new files are placed in the snapshot table's location. The location is default to be the same as that
+of the source Delta Lake Table. Users can also specify a different location for the snapshot table.
+
+{{< hint info >}}
+Because tables created by `snapshotDeltaLakeTable` are not the sole owners of their data files, they are prohibited from
+actions like `expire_snapshots` which would physically delete data files. Iceberg deletes, which only effect metadata,
+are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot's
+integrity. DELETE statements executed against the original Delta Lake table will remove original data files and the
+`snapshotDeltaLakeTable` table will no longer be able to access them.
+{{< /hint >}}
+
+#### Arguments
+The delta table's location is required to be provided when initializing the action.
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+|`sourceTableLocation` | Yes | String | The location of the source Delta Lake table | 
+
+#### Configurations
+The configurations can be gave via method chaining
+
+| Method Name | Arguments      | Required? | Type                                       | Description                                                                                                  |
+|---------------------------|----------------|-----------|--------------------------------------------|--------------------------------------------------------------------------------------------------------------|
+| `as`                      | `identifier`   | Yes       | org.apache.iceberg.catalog.TableIdentifier | The identifier of the Iceberg table to be created.                                                           |
+| `icebergCatalog`          | `catalog`      | Yes       | org.apache.iceberg.catalog.Catalog         | The Iceberg catalog for the Iceberg table to be created                                                      |
+| `deltaLakeConfiguration`  | `conf`         | Yes       | org.apache.hadoop.conf.Configuration       | The Hadoop Configuration to access Delta Lake Table's log and datafiles                                      |
+| `tableLocation`           | `location`     | No        | String                                     | The location of the Iceberg table to be created. Defaults to the same location as the given Delta Lake table |
+| `tableProperty`           | `name`,`value` | No        | String, String                             | A property entry to add to the Iceberg table to be created                                                   |
+| `tableProperties`         | `properties`   | No        | Map<String, String>                        | Properties to add to the the Iceberg table to be created                                                     |
+
+#### Output
+| Output Name | Type | Description |
+| ------------|------|-------------|
+| `imported_files_count` | long | Number of files added to the new table |
+
+#### Added Table Properties
+The following table properties are added to the Iceberg table to be created by default:
+
+| Property Name                 | Value                                     | Description                                                        |
+|-------------------------------|-------------------------------------------|--------------------------------------------------------------------|
+| `snapshot_source`             | `delta`                                   | Indicates that the table is snapshot from a delta lake table       |
+| `original_location`           | location of the delta lake table          | The absolute path to the location of the original delta lake table |
+| `schema.name-mapping.default` | JSON name mapping derived from the schema | The name mapping string used to read Delta Lake table's data files |
+
+#### Examples
+Snapshot a Delta Lake table located at `s3://my-bucket/delta-table` to an Iceberg table named `my_table` in the `my_db` database in the `my_catalog` catalog.
+```java
+DeltaLakeToIcebergMigrationActionsProvider.defaultActions()
+    .snapshotDeltaLakeTable("s3://my-bucket/delta-table")
+    .as("my_db.my_table")

Review Comment:
   the code here seems incorrect? You need `TableIdentifier` here and `Catalog` object below



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1153997224


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+- Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+## Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all modifications working on the source table to be stopped before the action is performed.
+
+Stop all writers interacting with the source table. Readers that also supports Iceberg may continue reading.
+
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+- Create a new Iceberg table with the same identifier and metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+- Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+## Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+# Migrating From Different Table Formats
+## From Hive to Iceberg

Review Comment:
   based on our discussion, since Delta is in separated page, Hive can also be on a separated page



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1154000365


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Overview"
+url: table-migration
+menu:
+  main:
+    parent: "Migration"
+    weight: 100
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approach
+There are two primary methods for executing table migration: full data migration and in-place migration. In this doc, we will describe more about in-place migration.
+
+In-place migration preserves the existing data files while incorporating Iceberg metadata on top of them.
+This method is not only faster but also eliminates the need for data duplication, making it more appropriate for production environments.
+
+![In-Place Migration](../../../img/iceberg-in-place-migration.png)
+
+Apache Iceberg primarily supports the in-place migration approach, which includes three important actions: `Snapshot Table`, `Migrate Table`, and `Add Files`.
+
+## Snapshot Table
+The Snapshot Table action creates a new iceberg table with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
+
+- Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table
+
+![Snapshot Table Step 1](../../../img/iceberg-snapshotaction-step1.png)
+
+- Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged.
+
+![Snapshot Table Step 2](../../../img/iceberg-snapshotaction-step2.png)
+## Migrate Table
+The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
+Consequently, Migrate Table requires all modifications working on the source table to be stopped before the action is performed.
+
+Stop all writers interacting with the source table. Readers that also supports Iceberg may continue reading.
+
+![Migrate Table Step 1](../../../img/iceberg-migrateaction-step1.png)
+
+- Create a new Iceberg table with the same identifier and metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
+
+![Migrate Table Step 2](../../../img/iceberg-migrateaction-step2.png)
+
+- Commit all data files across all partitions to the new Iceberg table. Drop the source table.
+
+![Migrate Table Step 3](../../../img/iceberg-migrateaction-step3.png)
+## Add Files
+After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process.
+In practice, these files can be new data files in Hive tables or new snapshots (versions) of Delta Lake tables. The Add Files action is essential for incorporating these files into the Iceberg table.
+
+## In-Place Migration Completion
+Once all data files have been migrated and there are no more concurrent writers writing to the source table, the migration process is complete.
+Readers and writers can now switch to the new Iceberg table for their operations.
+
+# Migrating From Different Table Formats

Review Comment:
   Let's just add those new page links under this section.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#issuecomment-1518892877

   Thanks for all the work Jonas, merging


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6600: Doc: Add a page explaining migration from other table formats to iceberg

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6600:
URL: https://github.com/apache/iceberg/pull/6600#discussion_r1150932612


##########
docs/table-migration.md:
##########
@@ -0,0 +1,89 @@
+---
+title: "Table Migration"
+url: table-migration
+weight: 1300
+menu: main
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+# Table Migration
+Apache Iceberg supports converting existing tables in other formats to Iceberg tables. This section introduces the general concept of table migration, its approaches, and existing implementations in Iceberg.
+
+## Migration Approaches
+There are two main approaches to perform table migration: CTAS (Create Table As Select) and in-place migration.

Review Comment:
   CTAS is a specific implementation, I think what we want to categorize are "Full data migration" versus "In-place metadata migration".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org