Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/04/06 16:44:22 UTC

[GitHub] [iceberg] RussellSpitzer opened a new issue #2425: Migrate table's "Rename to backup" may cause datafile movement

RussellSpitzer opened a new issue #2425:
URL: https://github.com/apache/iceberg/issues/2425


   In Spark 3 we've observed that renaming a table can also move its files if the table is a "managed" table. This means that migrating such a table moves all of its files into the backup directory before the new migrated table is created, so the migrated table references files under the backup location. I'm not sure we can avoid this while keeping the logic we currently have in the migrate code. Is this something to think about, or possibly warn users about?
   
   Example
   
   ```
   CREATE TABLE sample_parquet1 (
       id bigint COMMENT 'unique id',
       data string)
   USING parquet;
   insert into sample_parquet1 values (1,'one'),(2,'two'),(3,'three');
   CALL spark_catalog.system.migrate('sample_parquet1');
   select * from mycatalog.iceberg_eval.sample_parquet1.files;
   0	s3://gbidl-us-west-2-dev-ac/repair-poc/curated/iceberg_eval/sample_parquet1_backup_/part-00000-3c3b57db-edae-4880-a322-987b38759f0c-c000.snappy.parquet	PARQUET	1	686	{1:71,2:50}	{1:1,2:1}	{1:0,2:0}	{}	{1:,2:one}	{1:,2:one}	NULL	NULL	NULL	0
   0	s3://gbidl-us-west-2-dev-ac/repair-poc/curated/iceberg_eval/sample_parquet1_backup_/part-00001-3c3b57db-edae-4880-a322-987b38759f0c-c000.snappy.parquet	PARQUET	2	687	{1:79,2:51}	{1:2,2:2}	{1:0,2:0}	{}	{1:,2:three}	{1:,2:two}	NULL	NULL	NULL	0
   Time taken: 7.579 seconds, Fetched 2 row(s)
   ```
   
   Thanks @dpaani


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on issue #2425: Migrate table's "Rename to backup" may cause datafile movement

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2425:
URL: https://github.com/apache/iceberg/issues/2425#issuecomment-815411826


   I think we could do that, or at least warn? Something like: "If you would like to make this an Iceberg table, please set a distinct location first." I'm not sure you can do that, though; I'll have to test in Spark 3 to see.
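
The warn-or-prohibit idea discussed in this thread could be sketched roughly as below. This is only an illustration of the decision logic, not Iceberg's migrate code: `TableInfo`, `ManagedTableMigrationError`, and `check_migration` are hypothetical names, and the table type is assumed to be available as the string "MANAGED" or "EXTERNAL" from the catalog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical stand-in for catalog metadata; not a Spark or Iceberg class."""
    name: str
    table_type: str  # assumed to be "MANAGED" or "EXTERNAL"
    location: str


class ManagedTableMigrationError(Exception):
    """Raised when strict mode refuses to migrate a managed table."""


def check_migration(table: TableInfo, strict: bool = False) -> Optional[str]:
    """Return a warning for a managed table, or raise if strict is True.

    Returns None for any other table type, since the concern in this
    issue is specific to renames of managed tables.
    """
    if table.table_type.upper() != "MANAGED":
        return None
    message = (
        f"Table {table.name} is managed: renaming it to a backup name may "
        f"move its data files out of {table.location}. Consider setting a "
        "distinct location before migrating."
    )
    if strict:
        # "Prohibit" option: refuse to continue.
        raise ManagedTableMigrationError(message)
    # "Warn" option: hand the message back to the caller to log.
    return message


if __name__ == "__main__":
    # Hypothetical table and location, for illustration only.
    managed = TableInfo("sample_parquet1", "MANAGED", "/warehouse/sample_parquet1")
    print(check_migration(managed))
```

With `strict=False` the caller decides how to surface the warning; `strict=True` corresponds to prohibiting the migration outright.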




[GitHub] [iceberg] aokolnychyi commented on issue #2425: Migrate table's "Rename to backup" may cause datafile movement

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2425:
URL: https://github.com/apache/iceberg/issues/2425#issuecomment-814504976


   What are our options here? Shall we prohibit migrating managed tables?

