Posted to dev@iceberg.apache.org by Yufei Gu <fl...@gmail.com> on 2021/08/12 18:35:38 UTC

Iceberg disaster recovery and relative path sync-up

Here is a summary of yesterday's community sync-up.


Yufei gave a brief update on disaster recovery requirements and the current
progress of the relative path approach.


Ryan: We all agreed that relative paths are the way to go for disaster recovery.


*Multiple roots for the relative path*

Ryan proposed an idea to enable multiple roots for a table: we can add a
list of roots to the table metadata and use a selector to choose among them
when we move the table from one place to another. The selector reads a
property to decide which root to use. The property could come from either
the catalog or the table metadata, which is yet to be decided.


Here is an example I’d imagine:

   1. Root1: hdfs://nn:8020/path/to/the/table
   2. Root2: s3://bucket1/path/to/the/table
   3. Root3: s3://bucket2/path/to/the/table
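
To make the selector idea concrete, here is a minimal sketch in Java. The
"current-root" property name is just an illustration, not part of any spec:

import java.util.List;
import java.util.Map;

class RootSelector {
  // "current-root" is a hypothetical property that indexes into the list
  // of roots stored in table metadata; default to the first root.
  static String select(List<String> roots, Map<String, String> properties) {
    int index = Integer.parseInt(properties.getOrDefault("current-root", "0"));
    return roots.get(index);
  }
}

For example, select(roots, properties) with current-root=1 would pick Root2
above.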

*Relative path use case*

We brainstormed use cases for relative paths. Please let us know if there
are any other use cases.

   1. Disaster Recovery
   2. Jack: AWS s3 bucket alias
   3. Ryan: fall-back use case. In case root1 doesn’t work, the table
   falls back to root2, then root3. As Russell mentioned, snapshot
   expiration and other table maintenance actions are challenging in this
   mode.


*Timeline*

In terms of timeline, relative paths could be a feature in Spec V3, since
Spec V1 and V2 assume absolute paths in metadata.


*Misc*

   1. Miao: How are relative paths kept compatible with absolute paths?
   2. How do we migrate an existing table? Build a tool for that.

Please let us know if you have any ideas, questions, or concerns.



Yufei

Re: [CWS] Re: Iceberg disaster recovery and relative path sync-up

Posted by Ryan Blue <bl...@tabular.io>.
Anjali, my thoughts are inline below:

On Mon, Aug 23, 2021 at 1:14 PM Anjali Norwood <an...@netflix.com.invalid>
wrote:

> *"The more I think about this, the more I like the solution to add
> multiple table roots to metadata, rather than removing table roots. Adding
> a way to plug in a root selector makes a lot of sense to me and it ensures
> that the metadata is complete (table location is set in metadata) and that
> multiple locations can be used. Are there any objections or arguments
> against doing it that way?"*
>
> In the context of your comment on multiple locations above, I am thinking
> about the following scenarios:
> 1) Disaster recovery or low-latency use case where clients connect to the
> region geographically closest to them: In this case, multiple table roots
> represent a copy of the table, the copies may or may not be in sync. (Most
> likely active-active replication would be set up in this case and the
> copies are near-identical). A root level selector works/makes sense.
> 2) Data residency requirement: data must not leave a country/region. In
> this case, federation of data from multiple roots constitutes the entirety
> of the table.
> 3) One can also imagine combinations of 1 and 2 above where some locations
> need to be federated and some locations have data replicated from other
> locations.
>
> Curious how the 2nd and 3rd scenarios would be supported with this design.
>

#2 is currently done by adding a custom LocationProvider that creates file
locations for data files. Stripe created one that maps a partition to a
bucket for use cases like this.
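
For reference, a custom provider implements Iceberg's LocationProvider
interface and can be loaded via the write.location-provider.impl table
property. Here is a minimal sketch of a partition-to-bucket mapping; the
"custom.region-bucket.*" properties and the region-as-first-partition-field
layout are assumptions for illustration only:

import java.util.Map;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.LocationProvider;

public class RegionBucketLocationProvider implements LocationProvider {
  private final String tableLocation;
  private final Map<String, String> properties;

  // Iceberg instantiates custom providers with (tableLocation, properties).
  public RegionBucketLocationProvider(String tableLocation, Map<String, String> properties) {
    this.tableLocation = tableLocation;
    this.properties = properties;
  }

  @Override
  public String newDataLocation(PartitionSpec spec, StructLike partitionData, String filename) {
    // Assume the first partition field is a region string; look up a bucket
    // per region from hypothetical properties like
    // "custom.region-bucket.eu" = "s3://eu-bucket/db.table".
    String region = partitionData.get(0, String.class);
    String bucket = properties.getOrDefault("custom.region-bucket." + region, tableLocation);
    return String.format("%s/%s/%s", bucket, spec.partitionToPath(partitionData), filename);
  }

  @Override
  public String newDataLocation(String filename) {
    // Unpartitioned writes fall back to the table's own location.
    return String.format("%s/data/%s", tableLocation, filename);
  }
}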

When combining #2 and #1, you'd have a service that is responsible for
mirroring the table and making the copies. You'd have to keep logic in that
service to mirror only certain buckets or to mirror buckets to specific
places. So if I have buckets region1 and region2, I might only mirror
region1 or maybe I'd mirror region1 to region3-in-same-country-as-region1
and region2 to region4-in-same-country-as-region2. From there, you'd be
able to choose whether to have all 3 or 4 buckets as table roots and how to
resolve paths.

Layering on relative paths, if you're mirroring only region1 then it might
make sense for region1 paths to be relative and region2 paths to be
absolute. Then you could have multiple table locations that work for
relative region1 paths, but region2 is always in the same place. If you
were mirroring both regions, then it might make more sense to not use
relative paths at all and simply rewrite URIs in FileIO for fallback.
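
As a rough illustration of URI rewriting in FileIO, a delegating wrapper
could swap prefixes before every call. This is only a sketch with made-up
prefix values, not an agreed design:

import org.apache.iceberg.io.FileIO;
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.OutputFile;

class PrefixRewritingFileIO implements FileIO {
  private final FileIO delegate;
  private final String fromPrefix; // e.g. "s3://bucket1/"
  private final String toPrefix;   // e.g. "s3://bucket2/"

  PrefixRewritingFileIO(FileIO delegate, String fromPrefix, String toPrefix) {
    this.delegate = delegate;
    this.fromPrefix = fromPrefix;
    this.toPrefix = toPrefix;
  }

  // Swap the prefix on the way in; paths that don't match pass through.
  private String rewrite(String path) {
    return path.startsWith(fromPrefix) ? toPrefix + path.substring(fromPrefix.length()) : path;
  }

  @Override
  public InputFile newInputFile(String path) {
    return delegate.newInputFile(rewrite(path));
  }

  @Override
  public OutputFile newOutputFile(String path) {
    return delegate.newOutputFile(rewrite(path));
  }

  @Override
  public void deleteFile(String path) {
    delegate.deleteFile(rewrite(path));
  }

  public void close() {
    delegate.close();
  }
}

A fallback wrapper could hold several candidate prefixes and retry reads
against each in turn, which is where the maintenance concerns Russell
raised come back in.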

Ryan

Re: Iceberg disaster recovery and relative path sync-up

Posted by Anurag Mantripragada <am...@apple.com.INVALID>.
Also, in S3’s case, my understanding is that instead of relying on write.object-storage.path/write.data.path, users must now make sure that the location prefix is short to get the benefits of appending a hash to the data paths. For example, a long prefix like "s3://somebucket/region/timestamp/folder/inside/another/folder" may not preserve the benefits of the hash. Is this understanding correct?
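
To illustrate what I mean with a toy sketch (not Iceberg's actual
ObjectStoreLocationProvider logic): the randomized component only helps S3
spread load if it appears early in the key, so a long fixed prefix pushes
it too deep to matter.

// Toy example only; the hash function and layout are stand-ins.
static String objectStorePath(String prefix, String relativeFilePath) {
  int hash = relativeFilePath.hashCode() & 0x7fffffff;
  return String.format("%s/%08x/%s", prefix, hash, relativeFilePath);
}
// objectStorePath("s3://bucket", "db/table/f.parquet")
//   -> the hash lands near the start of the key (good for S3 partitioning)
// objectStorePath("s3://somebucket/region/timestamp/folder/inside/another/folder", ...)
//   -> the hash is buried deep in the key (little benefit)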

Thanks, 
Anurag



Re: Iceberg disaster recovery and relative path sync-up

Posted by Anurag Mantripragada <am...@apple.com.INVALID>.
Hi Russell, 

I don’t see any major issues with your approach other than that it may break some customizability of locations. If I understand correctly, today write.object-storage.path or write.metadata.path can be outside of the table base location. With your suggestion, are we saying that data and metadata must always reside inside the prefix? We can do that, but for S3 locations, users will have to make sure the prefix stays short.

Another point that Yufei mentioned is that this assumes the folder layout after the prefix must remain the same for S3 and HDFS. This may not be true in the real world.


Regards, 
Anurag



Re: Iceberg disaster recovery and relative path sync-up

Posted by Russell Spitzer <ru...@gmail.com>.
During a sync with Yufei and Anurag I had some thoughts on this proposal that I wanted to share with the wider group. As Yufei has previously noted, I'm worried about the alternative configuration parameters like (folder-storage, object-storage). Specifically I'm thinking about the issue of moving a table from S3 to HDFS and vice versa. The table may be using table-location and an object-storage path on S3, so we need something that correctly relativizes those paths as well.

I was considering that rather than holding "table-location" as our root path prefix, we use a separate "prefix" parameter which is applied to all relative path options. For example, instead of storing locations

1: s3://bucket/tableLocation
2: hdfs://clusterx/tableLocation

We store prefixes

1: s3://bucket
2: hdfs://clusterx

Then whenever object-storage is set with a relative path, we apply this prefix before writing. The files will be stored with the prefix but marked in the metadata using only the path relative to the prefix, not the absolute path. This lets us move our data between buckets or to an HDFS cluster while also being able to use the object-storage path.
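
A minimal sketch of that resolution logic, assuming a single configured
prefix (all names here are hypothetical):

class PrefixResolver {
  private final String prefix; // e.g. "s3://bucket" or "hdfs://clusterx"

  PrefixResolver(String prefix) {
    // Normalize away a trailing slash so joins are predictable.
    this.prefix = prefix.endsWith("/") ? prefix.substring(0, prefix.length() - 1) : prefix;
  }

  // Turn a metadata-relative path into an absolute URI for the current storage.
  String resolve(String relativePath) {
    return prefix + "/" + (relativePath.startsWith("/") ? relativePath.substring(1) : relativePath);
  }

  // Turn an absolute path back into the relative form stored in metadata.
  String relativize(String absolutePath) {
    if (!absolutePath.startsWith(prefix + "/")) {
      throw new IllegalArgumentException("Path is outside the configured prefix: " + absolutePath);
    }
    return absolutePath.substring(prefix.length() + 1);
  }
}

So new PrefixResolver("s3://bucket").resolve("tableLocation/data/f.parquet")
and new PrefixResolver("hdfs://clusterx").resolve(...) would yield the two
absolute locations above from the same stored relative path.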



Re: Iceberg disaster recovery and relative path sync-up

Posted by Anurag Mantripragada <am...@apple.com.INVALID>.
Hi everyone, 


Thanks for sharing your ideas and suggestions on this thread. I believe we have consensus on supporting multiple roots for a table and storing relative paths in metadata. We can start by adding this support in the initial phase. Yufei and I have updated the design doc [1] with the details of this initial support. Please share your feedback.

In the next phase, we can look at complex use cases and support them as needed.

[1] - https://docs.google.com/document/d/1RDEjJAVEXg1csRzyzTuM634L88vvI0iDHNQQK3kOVR0/edit#heading=h.hxmtkjthp8hm

Thanks, 
Anurag



Re: Iceberg disaster recovery and relative path sync-up

Posted by Ryan Blue <bl...@tabular.io>.
Yufei, answers inline:

On Mon, Aug 23, 2021 at 4:06 PM Yufei Gu <fl...@gmail.com> wrote:

> @Ryan, how do these properties work with multiple table locations?
>
>    1. write.metadata.path
>    2. write.folder-storage.path
>    3. write.object-storage.path
>
> The current logic with a single table location is to honor these properties
> on top of the table location. In the case of multiple roots (table
> locations), we cannot mix them. For example, write.object-storage.path and
> the table location should be in the same region for the DR use case.
>

I think that we continue to honor those settings and relativize anything
that is within the table location. For example, if I have
`location=s3://bucket/db.table/` and
`write.object-storage.path=s3://bucket/db.table/objects` then the paths
would be something like
`objects/987ae79d/ts_day=2021-08-18/00000-1-ea98234-1b34-4098-1298f40981.parquet`.
If the object storage path was outside the table location, then all paths
would be absolute.
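
In other words, something like this rule (a sketch, not the actual
implementation):

static String storePath(String tableLocation, String fullPath) {
  // Paths under the table location are stored relative to it;
  // anything outside stays absolute.
  String root = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
  return fullPath.startsWith(root) ? fullPath.substring(root.length()) : fullPath;
}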


> One possible solution that keeps similar logic is to support all
> properties for each root, like this:
>
>
> 1. Root1: (table location1, metadata path1, folder-storage path1,
> object-storage path1)
>
> 2. Root2: (table location2, metadata path2, folder-storage path2,
> object-storage path2)
>
> 3. Root3: (table location3, null, null, null)
> What do you think?
>

This is something that is up to the catalog/table. Those properties are
used by the built-in `LocationProvider` implementations, but there isn't
anything that requires using them. You could build a system that does this.



-- 
Ryan Blue
Tabular

Re: Iceberg disaster recovery and relative path sync-up

Posted by Ryan Blue <bl...@tabular.io>.
Jack, I agree with just about everything you've said.

On Sun, Aug 29, 2021 at 2:13 PM Jack Ye <ye...@gmail.com> wrote:

> Trying to catch up with the conversation here; just typing out some of my
> thought process:
>
> Based on my understanding, there are in general 2 use cases:
>
> 1. multiple data copies in different physical storages, which includes:
>
> 1.1 disaster recovery: if one storage is completely down, all access needs
> to be redirected to another storage
> 1.2 multi-tier data access: a table is stored in multiple tiers, such as a
> fast single-availability-zone (AZ) layer, a common multi-AZ layer, and a
> slow but cheap archival layer
> 1.3 table migration: a user can manually migrate a table to point to a
> completely different location with a completely different set of data files
>
> 2. different access points referring to the same storage, which includes:
>
> 2.1 access point: different names point to the same physical storage
> with different access policies.
>
>
> I think we mostly reached the consensus that multiple root locations are
> needed.
> Now we need to ask the question: can a user access the storage from the
> catalog to at least get the table metadata and read all the different root
> locations?
> I think the answer is Yes for 1.2 and 1.3, but No for 1.1 and 2.1.
> Therefore, we need to deal with those 2 situations separately.
>
> For 1.1 and 2.1, the metadata cannot be accessed by the user unless an
> alternative root bucket/storage endpoint is provided at the catalog level.
> So this information should be part of the catalog API.
> For 1.2 and 1.3, we need some more details about the characteristics of
> the root location, specifically:
> 1. Whether the location is read-only or can also be written to. Typically
> we should allow only a single "master" write location, to avoid data in
> different locations getting out of sync, unless the storage supports
> bi-directional sync.
> 2. How should a reader/writer choose which root to access? To achieve
> that, I think we need some root location resolver interface, with
> implementations like RegionBasedRootLocationResolver,
> AccessPolicyBasedRootLocationResolver, etc. (this is just my brainstorming)
>
> -Jack Ye
>
>
>
>
>
>
> On Mon, Aug 23, 2021 at 4:06 PM Yufei Gu <fl...@gmail.com> wrote:
>
>> @Ryan, how do these properties work with multiple table locations?
>>
>>    1. write.metadata.path
>>    2. write.folder-storage.path
>>    3. write.object-storage.path
>>
>> The current logic with a single table location is to honor these properties
>> on top of the table location. In the case of multiple roots (table
>> locations), we cannot mix them. For example, write.object-storage.path and
>> the table location should be in the same region for the DR use case.
>>
>>
>> One possible solution that keeps similar logic is to support all
>> properties for each root, like this:
>>
>>
>> 1. Root1: (table location1, metadata path1, folder-storage path1,
>> object-storage path1)
>>
>> 2. Root2: (table location2, metadata path2, folder-storage path2,
>> object-storage path2)
>>
>> 3. Root3: (table location3, null, null, null)
>> What do you think?
>>
>> @Anjali, not sure supporting federation by using multiple roots is a good
>> idea. Can we create a partition with a specific location to distinguish
>> data between regions?
>>
>> On Mon, Aug 23, 2021 at 1:14 PM Anjali Norwood
>> <an...@netflix.com.invalid> wrote:
>>
>>> Hi Ryan, All,
>>>
>>> *"The more I think about this, the more I like the solution to add
>>> multiple table roots to metadata, rather than removing table roots. Adding
>>> a way to plug in a root selector makes a lot of sense to me and it ensures
>>> that the metadata is complete (table location is set in metadata) and that
>>> multiple locations can be used. Are there any objections or arguments
>>> against doing it that way?"*
>>>
>>> In the context of your comment on multiple locations above, I am
>>> thinking about the following scenarios:
>>> 1) Disaster recovery or low-latency use case where clients connect to
>>> the region geographically closest to them: In this case, multiple table
>>> roots represent a copy of the table, the copies may or may not be in sync.
>>> (Most likely active-active replication would be set up in this case and the
>>> copies are near-identical). A root level selector works/makes sense.
>>> 2) Data residency requirement: data must not leave a country/region. In
>>> this case, federation of data from multiple roots constitutes the entirety
>>> of the table.
>>> 3) One can also imagine combinations of 1 and 2 above where some
>>> locations need to be federated and some locations have data replicated from
>>> other locations.
>>>
>>> Curious how the 2nd and 3rd scenarios would be supported with this
>>> design.
>>>
>>> regards,
>>> Anjali.
>>>
>>>
>>>
>>> On Sun, Aug 22, 2021 at 11:22 PM Peter Vary <pv...@cloudera.com.invalid>
>>> wrote:
>>>
>>>>
>>>> @Ryan: If I understand correctly, currently there is a possibility to
>>>> change the root location of the table, and it will not change/move the old
>>>> data/metadata files created before the change, only the new data/metadata
>>>> files will be created in the new location.
>>>>
>>>> Are we planning to make sure that the tables with relative paths will
>>>> always contain every data/metadata file in a single root folder?
>>>>
>>>> Here are a few scenarios which I am thinking about:
>>>> 1. Table T1 is created with relative path in HadoopCatalog/HiveCatalog
>>>> where the root location is L1, and this location is generated from the
>>>> TableIdentifier (at least in creation time)
>>>> 2. Data inserted to the table, so data files are created under L1, and
>>>> the metadata files contain R1 relative path to the current L1.
>>>> 3a. Table location is updated to L2 (Hive: ALTER TABLE T1 SET LOCATION
>>>> L2)
>>>> 3b. Table renamed to T2 (Hive: ALTER TABLE T1 RENAME TO T2) - Table
>>>> location is updated for Hive tables if the old location was the default
>>>> location.
>>>> 4. When we try to read this table we should read the old data/metadata
>>>> files as well.
>>>>
>>>> So in both cases we have to move the old data/metadata files around
>>>> like Hive does for the native tables, and for the tables with relative
>>>> paths we do not have to change the metadata other than the root path? Will
>>>> we do the same thing with other engines as well?
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> On Mon, 23 Aug 2021, 06:38 Yufei Gu, <fl...@gmail.com> wrote:
>>>>
>>>>> Steven, here is my understanding. It depends on whether you want to
>>>>> move the data. In the DR case, we do move the data; we expect the data
>>>>> to be in sync at points in time, but not always. In the case of S3
>>>>> aliases, different roots actually point to the same location, there is
>>>>> no data move, and the data is identical for sure.
>>>>>
>>>>> On Sun, Aug 22, 2021 at 8:19 PM Steven Wu <st...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> For the multiple table roots, do we expect or ensure that the data
>>>>>> are identical across the different roots? Or is this best-effort
>>>>>> background synchronization across the different roots?
>>>>>>
>>>>>> On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue <bl...@tabular.io> wrote:
>>>>>>
>>>>>>> Peter, I think that this feature would be useful when moving tables
>>>>>>> between root locations or when you want to maintain multiple root
>>>>>>> locations. Renames are orthogonal because a rename doesn't change the table
>>>>>>> location. You may want to move the table after a rename, and this would
>>>>>>> help in that case. But actually moving data is optional. That's why we put
>>>>>>> the table location in metadata.
>>>>>>>
>>>>>>> Anjali, DR is a big use case, but we also talked about directing
>>>>>>> accesses through other URLs, like S3 access points, table migration (like
>>>>>>> the rename case), and background data migration (e.g. lifting files between
>>>>>>> S3 regions). There are a few uses for it.
>>>>>>>
>>>>>>> The more I think about this, the more I like the solution to add
>>>>>>> multiple table roots to metadata, rather than removing table roots. Adding
>>>>>>> a way to plug in a root selector makes a lot of sense to me and it ensures
>>>>>>> that the metadata is complete (table location is set in metadata) and that
>>>>>>> multiple locations can be used. Are there any objections or arguments
>>>>>>> against doing it that way?
>>>>>>>
>>>>>>> On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood
>>>>>>> <an...@netflix.com.invalid> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This thread is about disaster recovery and relative paths, but I
>>>>>>>> wanted to ask an orthogonal but related question.
>>>>>>>> Do we see disaster recovery as the only (or main) use case for
>>>>>>>> multi-region?
>>>>>>>> Is a data residency requirement a use case for anybody? Is it
>>>>>>>> possible to shard an Iceberg table across regions? How is the location
>>>>>>>> managed in that case?
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Anjali.
>>>>>>>>
>>>>>>>> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary
>>>>>>>> <pv...@cloudera.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Sadly, I have missed the meeting :(
>>>>>>>>>
>>>>>>>>> Quick question:
>>>>>>>>> Was table rename / location change discussed for tables with
>>>>>>>>> relative paths?
>>>>>>>>>
>>>>>>>>> AFAIK when a table rename happens then we do not move old data /
>>>>>>>>> metadata files, we just change the root location of the new data / metadata
>>>>>>>>> files. If I am correct about this then we might need to handle this
>>>>>>>>> differently for tables with relative paths.
>>>>>>>>>
>>>>>>>>> Thanks, Peter
>>>>>>>>>
>>>>>>>>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood,
>>>>>>>>> <an...@netflix.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Perfect, thank you Yufei.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Anjali
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <fl...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Anjali,
>>>>>>>>>>>
>>>>>>>>>>> Inline...
>>>>>>>>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>>>>>>>>>>> <an...@netflix.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the summary Yufei.
>>>>>>>>>>>> Sorry, if this was already discussed, I missed the meeting
>>>>>>>>>>>> yesterday.
>>>>>>>>>>>> Is there anything in the design that would prevent multiple
>>>>>>>>>>>> roots from being in different aws regions?
>>>>>>>>>>>>
>>>>>>>>>>> No. DR is the major use case of relative paths, if not the only
>>>>>>>>>>> one. So, it will support roots in different regions.
>>>>>>>>>>>
>>>>>>>>>>> For disaster recovery in the case of an entire aws region down
>>>>>>>>>>>> or slow, is metastore still a point of failure or can metastore be stood up
>>>>>>>>>>>> in a different region and could select a different root?
>>>>>>>>>>>>
>>>>>>>>>>> Normally, DR also requires a backup metastore, besides the
>>>>>>>>>>> storage (S3 bucket). In that case, the backup metastore will be in a
>>>>>>>>>>> different region along with the table files. For example, the primary table
>>>>>>>>>>> is located in region A as well as its metastore, the backup table is
>>>>>>>>>>> located in region B as well as its metastore. The primary table root points
>>>>>>>>>>> to a path in region A, while backup table root points to a path in region B.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> regards,
>>>>>>>>>>>> Anjali.
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Tabular
>>>>>>>
>>>>>>

-- 
Ryan Blue
Tabular

Re: Iceberg disaster recovery and relative path sync-up

Posted by Yufei Gu <fl...@gmail.com>.
Jack and Ryan, one valid root at a time looks straightforward and good
enough for use cases like DR and certain table migration cases.


Here are questions for multiple valid roots at a time, which is needed by
the federation and multiple-storage-tier use cases.


   1. Metadata sync-up questions
      1. Do we keep multiple sets of version files under different roots? I
      assume yes, because otherwise the table will be invalid when switching
      roots.
      2. Do we sync all metadata files across roots? I assume we at least
      need to sync version files; otherwise, it would be better to consider
      them different tables.
      3. How do we sync metadata to prevent any inconsistency?
   2. Is there an alternative way to do this? Iceberg has a very nice
   feature that each individual data file can be in any location, e.g.
   different regions or tiers. But switching a root to read files from
   different storage layers is not flexible, since the root switch applies
   to all metadata files and data files of a table.


Best,

Yufei



Re: Iceberg disaster recovery and relative path sync-up

Posted by Jack Ye <ye...@gmail.com>.
Trying to catch up with the conversation here; just typing out some of my
thought process:

Based on my understanding, there are in general 2 use cases:

1. multiple data copies in different physical storages, which includes:

1.1 disaster recovery: if one storage is completely down, all access needs
to be redirected to another storage
1.2 multi-tier data access: a table is stored in multiple tiers, such as a
fast single-availability-zone (AZ) layer, a common multi-AZ layer, and a
slow but cheap archival layer
1.3 table migration: a user can manually migrate a table to point to a
completely different location with a completely different set of data files

2. different access points referring to the same storage, which includes:

2.1 access point: different names point to the same physical storage
with different access policies.


I think we mostly reached the consensus that multiple root locations are
needed.
Now we need to ask the question: can a user access the storage from the
catalog to at least get the table metadata and read all the different root
locations?
I think the answer is Yes for 1.2 and 1.3, but No for 1.1 and 2.1.
Therefore, we need to deal with those 2 situations separately.

For 1.1 and 2.1, the metadata cannot be accessed by the user unless an
alternative root bucket/storage endpoint is provided at the catalog level.
So this information should be part of the catalog API.
For 1.2 and 1.3, we need some more details about the characteristics of the
root location, specifically:
1. Whether the location is read-only or can also be written to. Typically
we should allow only a single "master" write location, to avoid data in
different locations getting out of sync, unless the storage supports
bi-directional sync.
2. How should a reader/writer choose which root to access? To achieve
that, I think we need some root location resolver interface, with
implementations like RegionBasedRootLocationResolver,
AccessPolicyBasedRootLocationResolver, etc. (this is just my brainstorming;
see the rough sketch below)
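
To make that concrete, here is a rough sketch of such an interface; every
name here is hypothetical brainstorming, not an agreed design:

import java.util.List;

interface RootLocationResolver {
  // Pick one root out of the roots listed in table metadata, given whatever
  // context the implementation was constructed with (region, policy, ...).
  String resolve(List<String> roots);
}

class RegionBasedRootLocationResolver implements RootLocationResolver {
  private final String region;

  RegionBasedRootLocationResolver(String region) {
    this.region = region;
  }

  @Override
  public String resolve(List<String> roots) {
    // Prefer a root whose URI mentions the reader's region; fall back to
    // the first (the "master") root otherwise. Crude matching, for
    // illustration only.
    return roots.stream()
        .filter(root -> root.contains(region))
        .findFirst()
        .orElse(roots.get(0));
  }
}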

-Jack Ye







Re: Iceberg disaster recovery and relative path sync-up

Posted by Yufei Gu <fl...@gmail.com>.
@Ryan, how do these properties work with multiple table locations?

   1. write.metadata.path
   2. write.folder-storage.path
   3. write.object-storage.path

The current logic with a single table location is to honor these properties
on top of the table location. In the case of multiple roots (table
locations), we cannot mix them. For example, write.object-storage.path and
the table location should be in the same region for the DR use case.


One possible solution that keeps similar logic is to support all
properties for each root, like this:


1. Root1: (table location1, metadata path1, folder-storage path1,
object-storage path1)

2. Root2: (table location2, metadata path2, folder-storage path2,
object-storage path2)

3. Root3: (table location3, null, null, null)
What do you think?
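
As a rough sketch, each root could be modeled as a small value type where a
null field means "derive from the table location" as today (purely
illustrative, not a proposed spec change):

record RootConfig(String tableLocation, String metadataPath,
                  String folderStoragePath, String objectStoragePath) {}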

@Anjali, not sure supporting federation by using multiple roots is a good
idea. Can we create a partition with a specific location to distinguish
data between regions?

>>>>>
>>>>> The more I think about this, the more I like the solution to add
>>>>> multiple table roots to metadata, rather than removing table roots. Adding
>>>>> a way to plug in a root selector makes a lot of sense to me and it ensures
>>>>> that the metadata is complete (table location is set in metadata) and that
>>>>> multiple locations can be used. Are there any objections or arguments
>>>>> against doing it that way?
>>>>>
>>>>> On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood
>>>>> <an...@netflix.com.invalid> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This thread is about disaster recovery and relative paths, but I
>>>>>> wanted to ask an orthogonal but related question.
>>>>>> Do we see disaster recovery as the only (or main) use case for
>>>>>> multi-region?
>>>>>> Is data residency requirement a use case for anybody? Is it possible
>>>>>> to shard an iceberg table across regions? How is the location managed in
>>>>>> that case?
>>>>>>
>>>>>> thanks,
>>>>>> Anjali.
>>>>>>
>>>>>> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary
>>>>>> <pv...@cloudera.com.invalid> wrote:
>>>>>>
>>>>>>> Sadly, I have missed the meeting :(
>>>>>>>
>>>>>>> Quick question:
>>>>>>> Was table rename / location change discussed for tables with
>>>>>>> relative paths?
>>>>>>>
>>>>>>> AFAIK when a table rename happens then we do not move old data /
>>>>>>> metadata files, we just change the root location of the new data / metadata
>>>>>>> files. If I am correct about this then we might need to handle this
>>>>>>> differently for tables with relative paths.
>>>>>>>
>>>>>>> Thanks, Peter
>>>>>>>
>>>>>>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood,
>>>>>>> <an...@netflix.com.invalid> wrote:
>>>>>>>
>>>>>>>> Perfect, thank you Yufei.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Anjali
>>>>>>>>
>>>>>>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <fl...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Anjali,
>>>>>>>>>
>>>>>>>>> Inline...
>>>>>>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>>>>>>>>> <an...@netflix.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the summary Yufei.
>>>>>>>>>> Sorry, if this was already discussed, I missed the meeting
>>>>>>>>>> yesterday.
>>>>>>>>>> Is there anything in the design that would prevent multiple roots
>>>>>>>>>> from being in different aws regions?
>>>>>>>>>>
>>>>>>>>> No. DR is the major use case of relative paths, if not the only
>>>>>>>>> one. So, it will support roots in different regions.
>>>>>>>>>
>>>>>>>>> For disaster recovery in the case of an entire aws region down or
>>>>>>>>>> slow, is metastore still a point of failure or can metastore be stood up in
>>>>>>>>>> a different region and could select a different root?
>>>>>>>>>>
>>>>>>>>> Normally, DR also requires a backup metastore, besides the
>>>>>>>>> storage(s3 bucket). In that case, the backup metastore will be in a
>>>>>>>>> different region along with the table files. For example, the primary table
>>>>>>>>> is located in region A as well as its metastore, the backup table is
>>>>>>>>> located in region B as well as its metastore. The primary table root points
>>>>>>>>> to a path in region A, while backup table root points to a path in region B.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> regards,
>>>>>>>>>> Anjali.
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <fl...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Here is a summary of yesterday's community sync-up.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yufei gave a brief update on disaster recovery requirements and
>>>>>>>>>>> the current progress of relative path approach.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Ryan: We all agreed that relative path is the way for disaster
>>>>>>>>>>> recovery.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *Multiple roots for the relative path*
>>>>>>>>>>>
>>>>>>>>>>> Ryan proposed an idea to enable multiple roots for a table,
>>>>>>>>>>> basically, we can add a list of roots in table metadata, and use a selector
>>>>>>>>>>> to choose different roots when we move the table from one place to another.
>>>>>>>>>>> The selector reads a property to decide which root to use. The property
>>>>>>>>>>> could be either from catalog or the table metadata, which is yet to be
>>>>>>>>>>> decided.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here is an example I’d image:
>>>>>>>>>>>
>>>>>>>>>>>    1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>>>>>>>    2. Root2: s3://bucket1/path/to/the/table
>>>>>>>>>>>    3. Root3: s3://bucket2/path/to/the/table
>>>>>>>>>>>
>>>>>>>>>>> *Relative path use case*
>>>>>>>>>>>
>>>>>>>>>>> We brainstormed use cases for relative paths. Please let us know
>>>>>>>>>>> if there are any other use cases.
>>>>>>>>>>>
>>>>>>>>>>>    1. Disaster Recovery
>>>>>>>>>>>    2. Jack: AWS s3 bucket alias
>>>>>>>>>>>    3. Ryan: fall-back use case. In case that the root1 doesn’t
>>>>>>>>>>>    work, the table falls back to root2, then root3. As Russell mentioned, it
>>>>>>>>>>>    is challenging to do snapshot expiration and other table maintenance
>>>>>>>>>>>    actions.
>>>>>>>>>>>
>>>>>>>>>>> *Timeline*
>>>>>>>>>>>
>>>>>>>>>>> In terms of timeline, relative path could be a feature in Spec
>>>>>>>>>>> V3, since Spec V1 and V2 assume absolute path in metadata.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *Misc*
>>>>>>>>>>>
>>>>>>>>>>>    1. Miao: How is the relative path compatible with the
>>>>>>>>>>>    absolute path?
>>>>>>>>>>>    2. How do we migrate an existing table? Build a tool for
>>>>>>>>>>>    that.
>>>>>>>>>>>
>>>>>>>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yufei
>>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>

Re: Iceberg disaster recovery and relative path sync-up

Posted by Anjali Norwood <an...@netflix.com.INVALID>.
Hi Ryan, All,

*"The more I think about this, the more I like the solution to add multiple
table roots to metadata, rather than removing table roots. Adding a way to
plug in a root selector makes a lot of sense to me and it ensures that the
metadata is complete (table location is set in metadata) and that multiple
locations can be used. Are there any objections or arguments against doing
it that way?"*

In the context of your comment on multiple locations above, I am thinking
about the following scenarios:
1) Disaster recovery or a low-latency use case where clients connect to the
region geographically closest to them: in this case, each table root
represents a copy of the table, and the copies may or may not be in sync.
(Most likely active-active replication would be set up in this case and the
copies are near-identical.) A root-level selector works/makes sense.
2) Data residency requirement: data must not leave a country/region. In
this case, federation of data from multiple roots constitutes the entirety
of the table.
3) One can also imagine combinations of 1 and 2 above where some locations
need to be federated and some locations have data replicated from other
locations.

Curious how the 2nd and 3rd scenarios would be supported with this design.

regards,
Anjali.



On Sun, Aug 22, 2021 at 11:22 PM Peter Vary <pv...@cloudera.com.invalid>
wrote:

>
> @Ryan: If I understand correctly, currently there is a possibility to
> change the root location of the table, and it will not change/move the old
> data/metadata files created before the change, only the new data/metadata
> files will be created in the new location.
>
> Are we planning to make sure that the tables with relative paths will
> always contain every data/metadata file in single root folder?
>
> Here is a few scenarios which I am thinking about:
> 1. Table T1 is created with relative path in HadoopCatalog/HiveCatalog
> where the root location is L1, and this location is generated from the
> TableIdentifier (at least in creation time)
> 2. Data inserted to the table, so data files are created under L1, and the
> metadata files contain R1 relative path to the current L1.
> 3a. Table location is updated to L2 (Hive: ALTER TABLE T1 SET LOCATION L2)
> 3b. Table renamed to T2 (Hive: ALTER TABLE T1 RENAME TO T2) - Table
> location is updated for Hive tables if the old location was the default
> location.
> 4. When we try to read this table we should read the old data/metadata
> files as well.
>
> So in both cases we have to move the old data/metadata files around like
> Hive does for the native tables, and for the tables with relative paths we
> do not have to change the metadata other than the root path? Will we do the
> same thing with other engines as well?
>
> Thanks,
> Peter
>
> On Mon, 23 Aug 2021, 06:38 Yufei Gu, <fl...@gmail.com> wrote:
>
>> Steven, here is my understanding. It depends on whether you want to move
>> the data. In the DR case, we do move the data, we expect data to be
>> identical from time to time, but not always be. In the case of S3 aliases,
>> different roots actually point to the same location, there is no data move,
>> and data is identical for sure.
>>
>> On Sun, Aug 22, 2021 at 8:19 PM Steven Wu <st...@gmail.com> wrote:
>>
>>> For the multiple table roots, do we expect or ensure that the data are
>>> identical across the different roots? or this is best-effort background
>>> synchronization across the different roots?
>>>
>>> On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue <bl...@tabular.io> wrote:
>>>
>>>> Peter, I think that this feature would be useful when moving tables
>>>> between root locations or when you want to maintain multiple root
>>>> locations. Renames are orthogonal because a rename doesn't change the table
>>>> location. You may want to move the table after a rename, and this would
>>>> help in that case. But actually moving data is optional. That's why we put
>>>> the table location in metadata.
>>>>
>>>> Anjali, DR is a big use case, but we also talked about directing
>>>> accesses through other URLs, like S3 access points, table migration (like
>>>> the rename case), and background data migration (e.g. lifting files between
>>>> S3 regions). There are a few uses for it.
>>>>
>>>> The more I think about this, the more I like the solution to add
>>>> multiple table roots to metadata, rather than removing table roots. Adding
>>>> a way to plug in a root selector makes a lot of sense to me and it ensures
>>>> that the metadata is complete (table location is set in metadata) and that
>>>> multiple locations can be used. Are there any objections or arguments
>>>> against doing it that way?
>>>>
>>>> On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood
>>>> <an...@netflix.com.invalid> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> This thread is about disaster recovery and relative paths, but I
>>>>> wanted to ask an orthogonal but related question.
>>>>> Do we see disaster recovery as the only (or main) use case for
>>>>> multi-region?
>>>>> Is data residency requirement a use case for anybody? Is it possible
>>>>> to shard an iceberg table across regions? How is the location managed in
>>>>> that case?
>>>>>
>>>>> thanks,
>>>>> Anjali.
>>>>>
>>>>> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> Sadly, I have missed the meeting :(
>>>>>>
>>>>>> Quick question:
>>>>>> Was table rename / location change discussed for tables with relative
>>>>>> paths?
>>>>>>
>>>>>> AFAIK when a table rename happens then we do not move old data /
>>>>>> metadata files, we just change the root location of the new data / metadata
>>>>>> files. If I am correct about this then we might need to handle this
>>>>>> differently for tables with relative paths.
>>>>>>
>>>>>> Thanks, Peter
>>>>>>
>>>>>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood,
>>>>>> <an...@netflix.com.invalid> wrote:
>>>>>>
>>>>>>> Perfect, thank you Yufei.
>>>>>>>
>>>>>>> Regards
>>>>>>> Anjali
>>>>>>>
>>>>>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <fl...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Anjali,
>>>>>>>>
>>>>>>>> Inline...
>>>>>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>>>>>>>> <an...@netflix.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the summary Yufei.
>>>>>>>>> Sorry, if this was already discussed, I missed the meeting
>>>>>>>>> yesterday.
>>>>>>>>> Is there anything in the design that would prevent multiple roots
>>>>>>>>> from being in different aws regions?
>>>>>>>>>
>>>>>>>> No. DR is the major use case of relative paths, if not the only
>>>>>>>> one. So, it will support roots in different regions.
>>>>>>>>
>>>>>>>> For disaster recovery in the case of an entire aws region down or
>>>>>>>>> slow, is metastore still a point of failure or can metastore be stood up in
>>>>>>>>> a different region and could select a different root?
>>>>>>>>>
>>>>>>>> Normally, DR also requires a backup metastore, besides the
>>>>>>>> storage(s3 bucket). In that case, the backup metastore will be in a
>>>>>>>> different region along with the table files. For example, the primary table
>>>>>>>> is located in region A as well as its metastore, the backup table is
>>>>>>>> located in region B as well as its metastore. The primary table root points
>>>>>>>> to a path in region A, while backup table root points to a path in region B.
>>>>>>>>
>>>>>>>>
>>>>>>>>> regards,
>>>>>>>>> Anjali.
>>>>>>>>>
>>>>>>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <fl...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Here is a summary of yesterday's community sync-up.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yufei gave a brief update on disaster recovery requirements and
>>>>>>>>>> the current progress of relative path approach.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ryan: We all agreed that relative path is the way for disaster
>>>>>>>>>> recovery.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Multiple roots for the relative path*
>>>>>>>>>>
>>>>>>>>>> Ryan proposed an idea to enable multiple roots for a table,
>>>>>>>>>> basically, we can add a list of roots in table metadata, and use a selector
>>>>>>>>>> to choose different roots when we move the table from one place to another.
>>>>>>>>>> The selector reads a property to decide which root to use. The property
>>>>>>>>>> could be either from catalog or the table metadata, which is yet to be
>>>>>>>>>> decided.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is an example I’d image:
>>>>>>>>>>
>>>>>>>>>>    1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>>>>>>    2. Root2: s3://bucket1/path/to/the/table
>>>>>>>>>>    3. Root3: s3://bucket2/path/to/the/table
>>>>>>>>>>
>>>>>>>>>> *Relative path use case*
>>>>>>>>>>
>>>>>>>>>> We brainstormed use cases for relative paths. Please let us know
>>>>>>>>>> if there are any other use cases.
>>>>>>>>>>
>>>>>>>>>>    1. Disaster Recovery
>>>>>>>>>>    2. Jack: AWS s3 bucket alias
>>>>>>>>>>    3. Ryan: fall-back use case. In case that the root1 doesn’t
>>>>>>>>>>    work, the table falls back to root2, then root3. As Russell mentioned, it
>>>>>>>>>>    is challenging to do snapshot expiration and other table maintenance
>>>>>>>>>>    actions.
>>>>>>>>>>
>>>>>>>>>> *Timeline*
>>>>>>>>>>
>>>>>>>>>> In terms of timeline, relative path could be a feature in Spec
>>>>>>>>>> V3, since Spec V1 and V2 assume absolute path in metadata.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Misc*
>>>>>>>>>>
>>>>>>>>>>    1. Miao: How is the relative path compatible with the
>>>>>>>>>>    absolute path?
>>>>>>>>>>    2. How do we migrate an existing table? Build a tool for that.
>>>>>>>>>>
>>>>>>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yufei
>>>>>>>>>>
>>>>>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>

Re: Iceberg disaster recovery and relative path sync-up

Posted by Ryan Blue <bl...@tabular.io>.
> Are we planning to make sure that tables with relative paths will always
> contain every data/metadata file in a single root folder?

This depends on the use case. For table mirroring, there would be a
back-end service making copies of all metadata, but not for other use
cases. For example, the rename use case would have a service that is moving
files, possibly deleting the original as it goes.

Many decisions depend on external services and the components that are
configured for a table. Whether to fall back to reading a different path
depends on the use case and what you’re trying to do. If you’re mirroring
tables across regions or clouds, then it’s unlikely you’d want an automatic
fallback. But if you’re renaming a table and moving the files then you
probably do. A location selector can select the initial path and FileIO
should be able to handle the fallback strategy. So this is customizable.
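
To sketch what pluggable selection could look like (the interface and the
property name below are invented for illustration; nothing like this exists
in Iceberg today):

import java.util.List;
import java.util.Map;

// Hypothetical: picks one of the table's roots based on a property that
// could come from the catalog or from table metadata.
interface RootSelector {
  String select(List<String> roots, Map<String, String> properties);
}

class PropertyRootSelector implements RootSelector {
  @Override
  public String select(List<String> roots, Map<String, String> properties) {
    // "current-root" is an invented property name for this example
    int index = Integer.parseInt(properties.getOrDefault("current-root", "0"));
    return roots.get(index);
  }
}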

There are a lot of choices and what I think we are aiming for is to provide
flexibility to get these tasks done. The current single location system is
not very flexible. And I don’t think that using a single table location
from the catalog level is flexible enough for these use cases, either. But
the trade-off is making it well-defined enough that you can make some
assumptions about file access.

I think the default should be that we assume that all files exist in the
location provided by a table location selector. That supports the DR use
case. Anything more complicated could be plugged in through catalog
customizations, with the understanding that if you change the assumption
about whether a file exists, you’d need to update your FileIO to have a
fallback strategy.
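
As a rough illustration of such a fallback strategy (FileIO and
InputFile.exists() are real Iceberg interfaces; the wrapper, its ordered
root list, and the relative-path convention are assumptions of this
sketch):

import java.util.List;
import org.apache.iceberg.io.FileIO;
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.OutputFile;

// Sketch: resolves a relative path against an ordered list of roots,
// reading from the first root where the file exists and always writing to
// the primary root.
class FallbackFileIO implements FileIO {
  private final FileIO io;           // delegate that does the actual I/O
  private final List<String> roots;  // ordered roots, primary first

  FallbackFileIO(FileIO io, List<String> roots) {
    this.io = io;
    this.roots = roots;
  }

  @Override
  public InputFile newInputFile(String relativePath) {
    for (String root : roots) {
      InputFile file = io.newInputFile(root + "/" + relativePath);
      if (file.exists()) {
        return file;
      }
    }
    // fall back to the primary root so the caller gets a clear read failure
    return io.newInputFile(roots.get(0) + "/" + relativePath);
  }

  @Override
  public OutputFile newOutputFile(String relativePath) {
    return io.newOutputFile(roots.get(0) + "/" + relativePath);
  }

  @Override
  public void deleteFile(String path) {
    io.deleteFile(path);
  }
}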

On Sun, Aug 22, 2021 at 11:22 PM Peter Vary <pvary@cloudera.com.invalid>
wrote:

> Here is a few scenarios which I am thinking about:
> 1. Table T1 is created with relative path in HadoopCatalog/HiveCatalog
> where the root location is L1, and this location is generated from the
> TableIdentifier (at least in creation time)
> 2. Data inserted to the table, so data files are created under L1, and the
> metadata files contain R1 relative path to the current L1.
> 3a. Table location is updated to L2 (Hive: ALTER TABLE T1 SET LOCATION L2)
> 3b. Table renamed to T2 (Hive: ALTER TABLE T1 RENAME TO T2) - Table
> location is updated for Hive tables if the old location was the default
> location.
> 4. When we try to read this table we should read the old data/metadata
> files as well.
>
> So in both cases we have to move the old data/metadata files around like
> Hive does for the native tables, and for the tables with relative paths we
> do not have to change the metadata other than the root path? Will we do the
> same thing with other engines as well?
>
I think this depends on how the engine handles it. If the engine doesn’t
move files, then moving the location breaks the table. But the engine might
move the files before returning from 3a. It may also choose to add a second
location to the table and move the files in the background, using a custom
FileIO that will fall back across table locations.

This is a good example of how Iceberg is always dependent on what engines
or services choose to do with it. There are other cases where engines or
services can do bad things. For example, deleting a data file and adding it
back later will cause Iceberg’s expireSnapshots operation to delete the
data file, because it doesn’t check whether the file is currently
referenced (that check is really expensive, and it is what the action does
instead).
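
For reference, this is roughly what the core expire call looks like (the
retention values here are arbitrary examples):

import java.util.concurrent.TimeUnit;
import org.apache.iceberg.Table;

// Expire snapshots older than 7 days, keeping at least the last 5. A file
// that was deleted and re-added by a later commit is not protected by this
// core operation.
public class ExpireExample {
  public static void expire(Table table) {
    table.expireSnapshots()
        .expireOlderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7))
        .retainLast(5)
        .commit();
  }
}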

Ryan
-- 
Ryan Blue
Tabular

Re: Iceberg disaster recovery and relative path sync-up

Posted by Peter Vary <pv...@cloudera.com.INVALID>.
@Ryan: If I understand correctly, it is currently possible to change the
root location of a table, and doing so will not change/move the old
data/metadata files created before the change; only the new data/metadata
files will be created in the new location.

Are we planning to make sure that tables with relative paths will always
contain every data/metadata file in a single root folder?

Here are a few scenarios I am thinking about:
1. Table T1 is created with relative paths in HadoopCatalog/HiveCatalog,
where the root location is L1, and this location is generated from the
TableIdentifier (at least at creation time)
2. Data is inserted into the table, so data files are created under L1, and
the metadata files contain the relative path R1 under the current L1.
3a. The table location is updated to L2 (Hive: ALTER TABLE T1 SET LOCATION L2)
3b. The table is renamed to T2 (Hive: ALTER TABLE T1 RENAME TO T2) - the
table location is updated for Hive tables if the old location was the
default location.
4. When we try to read this table, we should read the old data/metadata
files as well.

So in both cases we would have to move the old data/metadata files around,
as Hive does for native tables, but for tables with relative paths we would
not have to change anything in the metadata other than the root path? Will
we do the same thing with other engines as well?

Thanks,
Peter

On Mon, 23 Aug 2021, 06:38 Yufei Gu, <fl...@gmail.com> wrote:

> Steven, here is my understanding. It depends on whether you want to move
> the data. In the DR case, we do move the data, we expect data to be
> identical from time to time, but not always be. In the case of S3 aliases,
> different roots actually point to the same location, there is no data move,
> and data is identical for sure.
>
> On Sun, Aug 22, 2021 at 8:19 PM Steven Wu <st...@gmail.com> wrote:
>
>> For the multiple table roots, do we expect or ensure that the data are
>> identical across the different roots? or this is best-effort background
>> synchronization across the different roots?
>>
>> On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue <bl...@tabular.io> wrote:
>>
>>> Peter, I think that this feature would be useful when moving tables
>>> between root locations or when you want to maintain multiple root
>>> locations. Renames are orthogonal because a rename doesn't change the table
>>> location. You may want to move the table after a rename, and this would
>>> help in that case. But actually moving data is optional. That's why we put
>>> the table location in metadata.
>>>
>>> Anjali, DR is a big use case, but we also talked about directing
>>> accesses through other URLs, like S3 access points, table migration (like
>>> the rename case), and background data migration (e.g. lifting files between
>>> S3 regions). There are a few uses for it.
>>>
>>> The more I think about this, the more I like the solution to add
>>> multiple table roots to metadata, rather than removing table roots. Adding
>>> a way to plug in a root selector makes a lot of sense to me and it ensures
>>> that the metadata is complete (table location is set in metadata) and that
>>> multiple locations can be used. Are there any objections or arguments
>>> against doing it that way?
>>>
>>> On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood
>>> <an...@netflix.com.invalid> wrote:
>>>
>>>> Hi,
>>>>
>>>> This thread is about disaster recovery and relative paths, but I wanted
>>>> to ask an orthogonal but related question.
>>>> Do we see disaster recovery as the only (or main) use case for
>>>> multi-region?
>>>> Is data residency requirement a use case for anybody? Is it possible to
>>>> shard an iceberg table across regions? How is the location managed in that
>>>> case?
>>>>
>>>> thanks,
>>>> Anjali.
>>>>
>>>> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid>
>>>> wrote:
>>>>
>>>>> Sadly, I have missed the meeting :(
>>>>>
>>>>> Quick question:
>>>>> Was table rename / location change discussed for tables with relative
>>>>> paths?
>>>>>
>>>>> AFAIK when a table rename happens then we do not move old data /
>>>>> metadata files, we just change the root location of the new data / metadata
>>>>> files. If I am correct about this then we might need to handle this
>>>>> differently for tables with relative paths.
>>>>>
>>>>> Thanks, Peter
>>>>>
>>>>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood,
>>>>> <an...@netflix.com.invalid> wrote:
>>>>>
>>>>>> Perfect, thank you Yufei.
>>>>>>
>>>>>> Regards
>>>>>> Anjali
>>>>>>
>>>>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <fl...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Anjali,
>>>>>>>
>>>>>>> Inline...
>>>>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>>>>>>> <an...@netflix.com.invalid> wrote:
>>>>>>>
>>>>>>>> Thanks for the summary Yufei.
>>>>>>>> Sorry, if this was already discussed, I missed the meeting
>>>>>>>> yesterday.
>>>>>>>> Is there anything in the design that would prevent multiple roots
>>>>>>>> from being in different aws regions?
>>>>>>>>
>>>>>>> No. DR is the major use case of relative paths, if not the only one.
>>>>>>> So, it will support roots in different regions.
>>>>>>>
>>>>>>> For disaster recovery in the case of an entire aws region down or
>>>>>>>> slow, is metastore still a point of failure or can metastore be stood up in
>>>>>>>> a different region and could select a different root?
>>>>>>>>
>>>>>>> Normally, DR also requires a backup metastore, besides the
>>>>>>> storage(s3 bucket). In that case, the backup metastore will be in a
>>>>>>> different region along with the table files. For example, the primary table
>>>>>>> is located in region A as well as its metastore, the backup table is
>>>>>>> located in region B as well as its metastore. The primary table root points
>>>>>>> to a path in region A, while backup table root points to a path in region B.
>>>>>>>
>>>>>>>
>>>>>>>> regards,
>>>>>>>> Anjali.
>>>>>>>>
>>>>>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <fl...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Here is a summary of yesterday's community sync-up.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yufei gave a brief update on disaster recovery requirements and
>>>>>>>>> the current progress of relative path approach.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ryan: We all agreed that relative path is the way for disaster
>>>>>>>>> recovery.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Multiple roots for the relative path*
>>>>>>>>>
>>>>>>>>> Ryan proposed an idea to enable multiple roots for a table,
>>>>>>>>> basically, we can add a list of roots in table metadata, and use a selector
>>>>>>>>> to choose different roots when we move the table from one place to another.
>>>>>>>>> The selector reads a property to decide which root to use. The property
>>>>>>>>> could be either from catalog or the table metadata, which is yet to be
>>>>>>>>> decided.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is an example I’d image:
>>>>>>>>>
>>>>>>>>>    1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>>>>>    2. Root2: s3://bucket1/path/to/the/table
>>>>>>>>>    3. Root3: s3://bucket2/path/to/the/table
>>>>>>>>>
>>>>>>>>> *Relative path use case*
>>>>>>>>>
>>>>>>>>> We brainstormed use cases for relative paths. Please let us know
>>>>>>>>> if there are any other use cases.
>>>>>>>>>
>>>>>>>>>    1. Disaster Recovery
>>>>>>>>>    2. Jack: AWS s3 bucket alias
>>>>>>>>>    3. Ryan: fall-back use case. In case that the root1 doesn’t
>>>>>>>>>    work, the table falls back to root2, then root3. As Russell mentioned, it
>>>>>>>>>    is challenging to do snapshot expiration and other table maintenance
>>>>>>>>>    actions.
>>>>>>>>>
>>>>>>>>> *Timeline*
>>>>>>>>>
>>>>>>>>> In terms of timeline, relative path could be a feature in Spec V3,
>>>>>>>>> since Spec V1 and V2 assume absolute path in metadata.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Misc*
>>>>>>>>>
>>>>>>>>>    1. Miao: How is the relative path compatible with the absolute
>>>>>>>>>    path?
>>>>>>>>>    2. How do we migrate an existing table? Build a tool for that.
>>>>>>>>>
>>>>>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yufei
>>>>>>>>>
>>>>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>

Re: Iceberg disaster recovery and relative path sync-up

Posted by Yufei Gu <fl...@gmail.com>.
Steven, here is my understanding: it depends on whether you want to move
the data. In the DR case, we do move the data; we expect the copies to be
identical from time to time, but not always. In the case of S3 aliases,
different roots actually point to the same location, so there is no data
movement, and the data is identical for sure.

On Sun, Aug 22, 2021 at 8:19 PM Steven Wu <st...@gmail.com> wrote:

> For the multiple table roots, do we expect or ensure that the data are
> identical across the different roots? or this is best-effort background
> synchronization across the different roots?
>
> On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue <bl...@tabular.io> wrote:
>
>> Peter, I think that this feature would be useful when moving tables
>> between root locations or when you want to maintain multiple root
>> locations. Renames are orthogonal because a rename doesn't change the table
>> location. You may want to move the table after a rename, and this would
>> help in that case. But actually moving data is optional. That's why we put
>> the table location in metadata.
>>
>> Anjali, DR is a big use case, but we also talked about directing accesses
>> through other URLs, like S3 access points, table migration (like the rename
>> case), and background data migration (e.g. lifting files between S3
>> regions). There are a few uses for it.
>>
>> The more I think about this, the more I like the solution to add multiple
>> table roots to metadata, rather than removing table roots. Adding a way to
>> plug in a root selector makes a lot of sense to me and it ensures that the
>> metadata is complete (table location is set in metadata) and that multiple
>> locations can be used. Are there any objections or arguments against doing
>> it that way?
>>
>> On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood
>> <an...@netflix.com.invalid> wrote:
>>
>>> Hi,
>>>
>>> This thread is about disaster recovery and relative paths, but I wanted
>>> to ask an orthogonal but related question.
>>> Do we see disaster recovery as the only (or main) use case for
>>> multi-region?
>>> Is data residency requirement a use case for anybody? Is it possible to
>>> shard an iceberg table across regions? How is the location managed in that
>>> case?
>>>
>>> thanks,
>>> Anjali.
>>>
>>> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid>
>>> wrote:
>>>
>>>> Sadly, I have missed the meeting :(
>>>>
>>>> Quick question:
>>>> Was table rename / location change discussed for tables with relative
>>>> paths?
>>>>
>>>> AFAIK when a table rename happens then we do not move old data /
>>>> metadata files, we just change the root location of the new data / metadata
>>>> files. If I am correct about this then we might need to handle this
>>>> differently for tables with relative paths.
>>>>
>>>> Thanks, Peter
>>>>
>>>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood, <an...@netflix.com.invalid>
>>>> wrote:
>>>>
>>>>> Perfect, thank you Yufei.
>>>>>
>>>>> Regards
>>>>> Anjali
>>>>>
>>>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <fl...@gmail.com> wrote:
>>>>>
>>>>>> Hi Anjali,
>>>>>>
>>>>>> Inline...
>>>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>>>>>> <an...@netflix.com.invalid> wrote:
>>>>>>
>>>>>>> Thanks for the summary Yufei.
>>>>>>> Sorry, if this was already discussed, I missed the meeting
>>>>>>> yesterday.
>>>>>>> Is there anything in the design that would prevent multiple roots
>>>>>>> from being in different aws regions?
>>>>>>>
>>>>>> No. DR is the major use case of relative paths, if not the only one.
>>>>>> So, it will support roots in different regions.
>>>>>>
>>>>>> For disaster recovery in the case of an entire aws region down or
>>>>>>> slow, is metastore still a point of failure or can metastore be stood up in
>>>>>>> a different region and could select a different root?
>>>>>>>
>>>>>> Normally, DR also requires a backup metastore, besides the storage(s3
>>>>>> bucket). In that case, the backup metastore will be in a different region
>>>>>> along with the table files. For example, the primary table is located in
>>>>>> region A as well as its metastore, the backup table is located in region B
>>>>>> as well as its metastore. The primary table root points to a path in region
>>>>>> A, while backup table root points to a path in region B.
>>>>>>
>>>>>>
>>>>>>> regards,
>>>>>>> Anjali.
>>>>>>>
>>>>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <fl...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Here is a summary of yesterday's community sync-up.
>>>>>>>>
>>>>>>>>
>>>>>>>> Yufei gave a brief update on disaster recovery requirements and the
>>>>>>>> current progress of relative path approach.
>>>>>>>>
>>>>>>>>
>>>>>>>> Ryan: We all agreed that relative path is the way for disaster
>>>>>>>> recovery.
>>>>>>>>
>>>>>>>>
>>>>>>>> *Multiple roots for the relative path*
>>>>>>>>
>>>>>>>> Ryan proposed an idea to enable multiple roots for a table,
>>>>>>>> basically, we can add a list of roots in table metadata, and use a selector
>>>>>>>> to choose different roots when we move the table from one place to another.
>>>>>>>> The selector reads a property to decide which root to use. The property
>>>>>>>> could be either from catalog or the table metadata, which is yet to be
>>>>>>>> decided.
>>>>>>>>
>>>>>>>>
>>>>>>>> Here is an example I’d image:
>>>>>>>>
>>>>>>>>    1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>>>>    2. Root2: s3://bucket1/path/to/the/table
>>>>>>>>    3. Root3: s3://bucket2/path/to/the/table
>>>>>>>>
>>>>>>>> *Relative path use case*
>>>>>>>>
>>>>>>>> We brainstormed use cases for relative paths. Please let us know if
>>>>>>>> there are any other use cases.
>>>>>>>>
>>>>>>>>    1. Disaster Recovery
>>>>>>>>    2. Jack: AWS s3 bucket alias
>>>>>>>>    3. Ryan: fall-back use case. In case that the root1 doesn’t
>>>>>>>>    work, the table falls back to root2, then root3. As Russell mentioned, it
>>>>>>>>    is challenging to do snapshot expiration and other table maintenance
>>>>>>>>    actions.
>>>>>>>>
>>>>>>>> *Timeline*
>>>>>>>>
>>>>>>>> In terms of timeline, relative path could be a feature in Spec V3,
>>>>>>>> since Spec V1 and V2 assume absolute path in metadata.
>>>>>>>>
>>>>>>>>
>>>>>>>> *Misc*
>>>>>>>>
>>>>>>>>    1. Miao: How is the relative path compatible with the absolute
>>>>>>>>    path?
>>>>>>>>    2. How do we migrate an existing table? Build a tool for that.
>>>>>>>>
>>>>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Yufei
>>>>>>>>
>>>>>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

Re: Iceberg disaster recovery and relative path sync-up

Posted by Steven Wu <st...@gmail.com>.
For the multiple table roots, do we expect or ensure that the data are
identical across the different roots? Or is this best-effort background
synchronization across the different roots?

On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue <bl...@tabular.io> wrote:

> Peter, I think that this feature would be useful when moving tables
> between root locations or when you want to maintain multiple root
> locations. Renames are orthogonal because a rename doesn't change the table
> location. You may want to move the table after a rename, and this would
> help in that case. But actually moving data is optional. That's why we put
> the table location in metadata.
>
> Anjali, DR is a big use case, but we also talked about directing accesses
> through other URLs, like S3 access points, table migration (like the rename
> case), and background data migration (e.g. lifting files between S3
> regions). There are a few uses for it.
>
> The more I think about this, the more I like the solution to add multiple
> table roots to metadata, rather than removing table roots. Adding a way to
> plug in a root selector makes a lot of sense to me and it ensures that the
> metadata is complete (table location is set in metadata) and that multiple
> locations can be used. Are there any objections or arguments against doing
> it that way?
>
> On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood
> <an...@netflix.com.invalid> wrote:
>
>> Hi,
>>
>> This thread is about disaster recovery and relative paths, but I wanted
>> to ask an orthogonal but related question.
>> Do we see disaster recovery as the only (or main) use case for
>> multi-region?
>> Is data residency requirement a use case for anybody? Is it possible to
>> shard an iceberg table across regions? How is the location managed in that
>> case?
>>
>> thanks,
>> Anjali.
>>
>> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid>
>> wrote:
>>
>>> Sadly, I have missed the meeting :(
>>>
>>> Quick question:
>>> Was table rename / location change discussed for tables with relative
>>> paths?
>>>
>>> AFAIK when a table rename happens then we do not move old data /
>>> metadata files, we just change the root location of the new data / metadata
>>> files. If I am correct about this then we might need to handle this
>>> differently for tables with relative paths.
>>>
>>> Thanks, Peter
>>>
>>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood, <an...@netflix.com.invalid>
>>> wrote:
>>>
>>>> Perfect, thank you Yufei.
>>>>
>>>> Regards
>>>> Anjali
>>>>
>>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <fl...@gmail.com> wrote:
>>>>
>>>>> Hi Anjali,
>>>>>
>>>>> Inline...
>>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>>>>> <an...@netflix.com.invalid> wrote:
>>>>>
>>>>>> Thanks for the summary Yufei.
>>>>>> Sorry, if this was already discussed, I missed the meeting yesterday.
>>>>>> Is there anything in the design that would prevent multiple roots
>>>>>> from being in different aws regions?
>>>>>>
>>>>> No. DR is the major use case of relative paths, if not the only one.
>>>>> So, it will support roots in different regions.
>>>>>
>>>>> For disaster recovery in the case of an entire aws region down or
>>>>>> slow, is metastore still a point of failure or can metastore be stood up in
>>>>>> a different region and could select a different root?
>>>>>>
>>>>> Normally, DR also requires a backup metastore, besides the storage(s3
>>>>> bucket). In that case, the backup metastore will be in a different region
>>>>> along with the table files. For example, the primary table is located in
>>>>> region A as well as its metastore, the backup table is located in region B
>>>>> as well as its metastore. The primary table root points to a path in region
>>>>> A, while backup table root points to a path in region B.
>>>>>
>>>>>
>>>>>> regards,
>>>>>> Anjali.
>>>>>>
>>>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <fl...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Here is a summary of yesterday's community sync-up.
>>>>>>>
>>>>>>>
>>>>>>> Yufei gave a brief update on disaster recovery requirements and the
>>>>>>> current progress of relative path approach.
>>>>>>>
>>>>>>>
>>>>>>> Ryan: We all agreed that relative path is the way for disaster
>>>>>>> recovery.
>>>>>>>
>>>>>>>
>>>>>>> *Multiple roots for the relative path*
>>>>>>>
>>>>>>> Ryan proposed an idea to enable multiple roots for a table,
>>>>>>> basically, we can add a list of roots in table metadata, and use a selector
>>>>>>> to choose different roots when we move the table from one place to another.
>>>>>>> The selector reads a property to decide which root to use. The property
>>>>>>> could be either from catalog or the table metadata, which is yet to be
>>>>>>> decided.
>>>>>>>
>>>>>>>
>>>>>>> Here is an example I’d image:
>>>>>>>
>>>>>>>    1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>>>    2. Root2: s3://bucket1/path/to/the/table
>>>>>>>    3. Root3: s3://bucket2/path/to/the/table
>>>>>>>
>>>>>>> *Relative path use case*
>>>>>>>
>>>>>>> We brainstormed use cases for relative paths. Please let us know if
>>>>>>> there are any other use cases.
>>>>>>>
>>>>>>>    1. Disaster Recovery
>>>>>>>    2. Jack: AWS s3 bucket alias
>>>>>>>    3. Ryan: fall-back use case. In case that the root1 doesn’t
>>>>>>>    work, the table falls back to root2, then root3. As Russell mentioned, it
>>>>>>>    is challenging to do snapshot expiration and other table maintenance
>>>>>>>    actions.
>>>>>>>
>>>>>>> *Timeline*
>>>>>>>
>>>>>>> In terms of timeline, relative path could be a feature in Spec V3,
>>>>>>> since Spec V1 and V2 assume absolute path in metadata.
>>>>>>>
>>>>>>>
>>>>>>> *Misc*
>>>>>>>
>>>>>>>    1. Miao: How is the relative path compatible with the absolute
>>>>>>>    path?
>>>>>>>    2. How do we migrate an existing table? Build a tool for that.
>>>>>>>
>>>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yufei
>>>>>>>
>>>>>>
>
> --
> Ryan Blue
> Tabular
>

Re: Iceberg disaster recovery and relative path sync-up

Posted by Ryan Blue <bl...@tabular.io>.
Peter, I think that this feature would be useful when moving tables between
root locations or when you want to maintain multiple root locations.
Renames are orthogonal because a rename doesn't change the table location.
You may want to move the table after a rename, and this would help in that
case. But actually moving data is optional. That's why we put the table
location in metadata.

Anjali, DR is a big use case, but we also talked about directing accesses
through other URLs, like S3 access points, table migration (like the rename
case), and background data migration (e.g. lifting files between S3
regions). There are a few uses for it.

The more I think about this, the more I like the solution to add multiple
table roots to metadata, rather than removing table roots. Adding a way to
plug in a root selector makes a lot of sense to me and it ensures that the
metadata is complete (table location is set in metadata) and that multiple
locations can be used. Are there any objections or arguments against doing
it that way?

On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood <an...@netflix.com.invalid>
wrote:

> Hi,
>
> This thread is about disaster recovery and relative paths, but I wanted to
> ask an orthogonal but related question.
> Do we see disaster recovery as the only (or main) use case for
> multi-region?
> Is data residency requirement a use case for anybody? Is it possible to
> shard an iceberg table across regions? How is the location managed in that
> case?
>
> thanks,
> Anjali.
>
> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid>
> wrote:
>
>> Sadly, I have missed the meeting :(
>>
>> Quick question:
>> Was table rename / location change discussed for tables with relative
>> paths?
>>
>> AFAIK when a table rename happens then we do not move old data / metadata
>> files, we just change the root location of the new data / metadata files.
>> If I am correct about this then we might need to handle this differently
>> for tables with relative paths.
>>
>> Thanks, Peter
>>
>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood, <an...@netflix.com.invalid>
>> wrote:
>>
>>> Perfect, thank you Yufei.
>>>
>>> Regards
>>> Anjali
>>>
>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <fl...@gmail.com> wrote:
>>>
>>>> Hi Anjali,
>>>>
>>>> Inline...
>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>>>> <an...@netflix.com.invalid> wrote:
>>>>
>>>>> Thanks for the summary Yufei.
>>>>> Sorry, if this was already discussed, I missed the meeting yesterday.
>>>>> Is there anything in the design that would prevent multiple roots from
>>>>> being in different aws regions?
>>>>>
>>>> No. DR is the major use case of relative paths, if not the only one.
>>>> So, it will support roots in different regions.
>>>>
>>>> For disaster recovery in the case of an entire aws region down or slow,
>>>>> is metastore still a point of failure or can metastore be stood up in a
>>>>> different region and could select a different root?
>>>>>
>>>> Normally, DR also requires a backup metastore, besides the storage(s3
>>>> bucket). In that case, the backup metastore will be in a different region
>>>> along with the table files. For example, the primary table is located in
>>>> region A as well as its metastore, the backup table is located in region B
>>>> as well as its metastore. The primary table root points to a path in region
>>>> A, while backup table root points to a path in region B.
>>>>
>>>>
>>>>> regards,
>>>>> Anjali.
>>>>>
>>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <fl...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Here is a summary of yesterday's community sync-up.
>>>>>>
>>>>>>
>>>>>> Yufei gave a brief update on disaster recovery requirements and the
>>>>>> current progress of relative path approach.
>>>>>>
>>>>>>
>>>>>> Ryan: We all agreed that relative path is the way for disaster
>>>>>> recovery.
>>>>>>
>>>>>>
>>>>>> *Multiple roots for the relative path*
>>>>>>
>>>>>> Ryan proposed an idea to enable multiple roots for a table,
>>>>>> basically, we can add a list of roots in table metadata, and use a selector
>>>>>> to choose different roots when we move the table from one place to another.
>>>>>> The selector reads a property to decide which root to use. The property
>>>>>> could be either from catalog or the table metadata, which is yet to be
>>>>>> decided.
>>>>>>
>>>>>>
>>>>>> Here is an example I’d image:
>>>>>>
>>>>>>    1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>>    2. Root2: s3://bucket1/path/to/the/table
>>>>>>    3. Root3: s3://bucket2/path/to/the/table
>>>>>>
>>>>>> *Relative path use case*
>>>>>>
>>>>>> We brainstormed use cases for relative paths. Please let us know if
>>>>>> there are any other use cases.
>>>>>>
>>>>>>    1. Disaster Recovery
>>>>>>    2. Jack: AWS s3 bucket alias
>>>>>>    3. Ryan: fall-back use case. In case that the root1 doesn’t work,
>>>>>>    the table falls back to root2, then root3. As Russell mentioned, it is
>>>>>>    challenging to do snapshot expiration and other table maintenance actions.
>>>>>>
>>>>>>
>>>>>> *Timeline*
>>>>>>
>>>>>> In terms of timeline, relative path could be a feature in Spec V3,
>>>>>> since Spec V1 and V2 assume absolute path in metadata.
>>>>>>
>>>>>>
>>>>>> *Misc*
>>>>>>
>>>>>>    1. Miao: How is the relative path compatible with the absolute
>>>>>>    path?
>>>>>>    2. How do we migrate an existing table? Build a tool for that.
>>>>>>
>>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Yufei
>>>>>>
>>>>>

-- 
Ryan Blue
Tabular

Re: Iceberg disaster recovery and relative path sync-up

Posted by Anjali Norwood <an...@netflix.com.INVALID>.
Hi,

This thread is about disaster recovery and relative paths, but I wanted to
ask an orthogonal but related question.
Do we see disaster recovery as the only (or main) use case for
multi-region?
Are data residency requirements a use case for anybody? Is it possible to
shard an Iceberg table across regions? How is the location managed in that
case?

thanks,
Anjali.

On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid>
wrote:

> Sadly, I have missed the meeting :(
>
> Quick question:
> Was table rename / location change discussed for tables with relative
> paths?
>
> AFAIK when a table rename happens then we do not move old data / metadata
> files, we just change the root location of the new data / metadata files.
> If I am correct about this then we might need to handle this differently
> for tables with relative paths.
>
> Thanks, Peter
>
> On Fri, 13 Aug 2021, 15:12 Anjali Norwood, <an...@netflix.com.invalid>
> wrote:
>
>> Perfect, thank you Yufei.
>>
>> Regards
>> Anjali
>>
>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <fl...@gmail.com> wrote:
>>
>>> Hi Anjali,
>>>
>>> Inline...
>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>>> <an...@netflix.com.invalid> wrote:
>>>
>>>> Thanks for the summary Yufei.
>>>> Sorry, if this was already discussed, I missed the meeting yesterday.
>>>> Is there anything in the design that would prevent multiple roots from
>>>> being in different aws regions?
>>>>
>>> No. DR is the major use case of relative paths, if not the only one. So,
>>> it will support roots in different regions.
>>>
>>> For disaster recovery in the case of an entire aws region down or slow,
>>>> is metastore still a point of failure or can metastore be stood up in a
>>>> different region and could select a different root?
>>>>
>>> Normally, DR also requires a backup metastore, besides the storage(s3
>>> bucket). In that case, the backup metastore will be in a different region
>>> along with the table files. For example, the primary table is located in
>>> region A as well as its metastore, the backup table is located in region B
>>> as well as its metastore. The primary table root points to a path in region
>>> A, while backup table root points to a path in region B.
>>>
>>>
>>>> regards,
>>>> Anjali.
>>>>
>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <fl...@gmail.com> wrote:
>>>>
>>>>> Here is a summary of yesterday's community sync-up.
>>>>>
>>>>>
>>>>> Yufei gave a brief update on disaster recovery requirements and the
>>>>> current progress of relative path approach.
>>>>>
>>>>>
>>>>> Ryan: We all agreed that relative path is the way for disaster
>>>>> recovery.
>>>>>
>>>>>
>>>>> *Multiple roots for the relative path*
>>>>>
>>>>> Ryan proposed an idea to enable multiple roots for a table, basically,
>>>>> we can add a list of roots in table metadata, and use a selector to choose
>>>>> different roots when we move the table from one place to another. The
>>>>> selector reads a property to decide which root to use. The property could
>>>>> be either from catalog or the table metadata, which is yet to be decided.
>>>>>
>>>>>
>>>>> Here is an example I’d image:
>>>>>
>>>>>    1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>    2. Root2: s3://bucket1/path/to/the/table
>>>>>    3. Root3: s3://bucket2/path/to/the/table
>>>>>
>>>>> *Relative path use case*
>>>>>
>>>>> We brainstormed use cases for relative paths. Please let us know if
>>>>> there are any other use cases.
>>>>>
>>>>>    1. Disaster Recovery
>>>>>    2. Jack: AWS s3 bucket alias
>>>>>    3. Ryan: fall-back use case. In case that the root1 doesn’t work,
>>>>>    the table falls back to root2, then root3. As Russell mentioned, it is
>>>>>    challenging to do snapshot expiration and other table maintenance actions.
>>>>>
>>>>>
>>>>> *Timeline*
>>>>>
>>>>> In terms of timeline, relative path could be a feature in Spec V3,
>>>>> since Spec V1 and V2 assume absolute path in metadata.
>>>>>
>>>>>
>>>>> *Misc*
>>>>>
>>>>>    1. Miao: How is the relative path compatible with the absolute
>>>>>    path?
>>>>>    2. How do we migrate an existing table? Build a tool for that.
>>>>>
>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>
>>>>>
>>>>>
>>>>> Yufei
>>>>>
>>>>

Re: Iceberg disaster recovery and relative path sync-up

Posted by Peter Vary <pv...@cloudera.com.INVALID>.
Sadly, I have missed the meeting :(

Quick question:
Was table rename / location change discussed for tables with relative paths?

AFAIK when a table rename happens, we do not move the old data/metadata
files; we just change the root location for the new data/metadata files.
If I am correct about this, then we might need to handle renames
differently for tables with relative paths.

Thanks, Peter


Re: Iceberg disaster recovery and relative path sync-up

Posted by Anjali Norwood <an...@netflix.com.INVALID>.
Perfect, thank you Yufei.

Regards
Anjali


Re: Iceberg disaster recovery and relative path sync-up

Posted by Yufei Gu <fl...@gmail.com>.
Hi Anjali,

Inline...
On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood <an...@netflix.com.invalid>
wrote:

> Thanks for the summary, Yufei.
> Sorry if this was already discussed; I missed the meeting yesterday.
> Is there anything in the design that would prevent multiple roots from
> being in different AWS regions?
>
No. DR is the major use case for relative paths, if not the only one, so the
design will support roots in different regions.

> For disaster recovery, in the case of an entire AWS region being down or
> slow, is the metastore still a single point of failure, or can the metastore
> be stood up in a different region and select a different root?
>
Normally, DR also requires a backup metastore in addition to the storage (S3
bucket). In that case, the backup metastore lives in a different region along
with the table files. For example, the primary table and its metastore are
located in region A, while the backup table and its metastore are located in
region B. The primary table root then points to a path in region A, while the
backup table root points to a path in region B.
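
To make this concrete, here is roughly how I imagine root selection working
(a sketch with hypothetical property names and classes; the actual mechanism
is yet to be decided):

    import java.util.List;
    import java.util.Map;

    public class RootSelectorExample {

        // Pick a root from the table's root list based on a property that each
        // deployment sets, e.g. on the catalog in its own region.
        static String selectRoot(List<String> roots, Map<String, String> props) {
            String selected = props.get("root.selector.index"); // hypothetical key
            int index = selected == null ? 0 : Integer.parseInt(selected);
            return roots.get(index);
        }

        public static void main(String[] args) {
            List<String> roots = List.of(
                "s3://region-a-bucket/path/to/table",  // primary, region A
                "s3://region-b-bucket/path/to/table"); // backup, region B

            // The catalog in region A selects root 0; the backup catalog in
            // region B, next to the replicated metastore, selects root 1.
            Map<String, String> regionBProps = Map.of("root.selector.index", "1");
            System.out.println(selectRoot(roots, regionBProps));
            // prints: s3://region-b-bucket/path/to/table
        }
    }

The point is that failing over means flipping one property on the backup
catalog; none of the table's metadata files need to be rewritten.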


> regards,
> Anjali.
>
> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <fl...@gmail.com> wrote:
>
>> Here is a summary of yesterday's community sync-up.
>>
>>
>> Yufei gave a brief update on disaster recovery requirements and the
>> current progress of relative path approach.
>>
>>
>> Ryan: We all agreed that relative path is the way for disaster recovery.
>>
>>
>> *Multiple roots for the relative path*
>>
>> Ryan proposed an idea to enable multiple roots for a table, basically, we
>> can add a list of roots in table metadata, and use a selector to choose
>> different roots when we move the table from one place to another. The
>> selector reads a property to decide which root to use. The property could
>> be either from catalog or the table metadata, which is yet to be decided.
>>
>>
>> Here is an example I’d image:
>>
>>    1. Root1: hdfs://nn:8020/path/to/the/table
>>    2. Root2: s3://bucket1/path/to/the/table
>>    3. Root3: s3://bucket2/path/to/the/table
>>
>> *Relative path use case*
>>
>> We brainstormed use cases for relative paths. Please let us know if there
>> are any other use cases.
>>
>>    1. Disaster Recovery
>>    2. Jack: AWS s3 bucket alias
>>    3. Ryan: fall-back use case. In case that the root1 doesn’t work, the
>>    table falls back to root2, then root3. As Russell mentioned, it is
>>    challenging to do snapshot expiration and other table maintenance actions.
>>
>>
>> *Timeline*
>>
>> In terms of timeline, relative path could be a feature in Spec V3, since
>> Spec V1 and V2 assume absolute path in metadata.
>>
>>
>> *Misc*
>>
>>    1. Miao: How is the relative path compatible with the absolute path?
>>    2. How do we migrate an existing table? Build a tool for that.
>>
>> Please let us know if you have any ideas, questions, or concerns.
>>
>>
>>
>> Yufei
>>
>

Re: Iceberg disaster recovery and relative path sync-up

Posted by Anjali Norwood <an...@netflix.com.INVALID>.
Thanks for the summary, Yufei.
Sorry if this was already discussed; I missed the meeting yesterday.
Is there anything in the design that would prevent multiple roots from
being in different AWS regions? For disaster recovery, in the case of an
entire AWS region being down or slow, is the metastore still a single point
of failure, or can the metastore be stood up in a different region and
select a different root?

regards,
Anjali.
