Posted to dev@iceberg.apache.org by chong luo <lu...@gmail.com> on 2021/01/28 07:02:02 UTC

How to get table/partition creation time/update time in iceberg

Hi Iceberg Devs


I’m currently working on deleting expired tables and partitions in Iceberg. However, I cannot find the table/partition creation time; it seems Iceberg only stores snapshot creation time. In Hive, transient_lastDdlTime, createTime and lastAccessTime are stored in the metastore. With this time metadata, we can tell when a table was changed and track the related jobs.


Is there any way to get the time metadata mentioned above in the current implementation of Iceberg?



Thanks.
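For the question above, note that Iceberg does keep some time information in every table metadata file. The table spec defines a top-level last-updated-ms field, and each snapshot still listed under snapshots carries a timestamp-ms. Neither is a true creation time. The following is a minimal sketch, assuming you have already loaded the current metadata JSON as a string (how you locate that file depends on your catalog). The function name is hypothetical.

```python
import json
from typing import Optional, Tuple


def table_times_from_metadata(metadata_json: str) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: return (earliest_retained_snapshot_ms, last_updated_ms).

    The earliest retained snapshot time is only an upper bound on creation
    time: the table existed at least since then, but older snapshots may
    already have been expired.
    """
    meta = json.loads(metadata_json)
    # Refreshed whenever a new table metadata file is written.
    last_updated = meta.get("last-updated-ms")
    # Only snapshots that have not been expired are listed here.
    stamps = [s["timestamp-ms"] for s in meta.get("snapshots", [])]
    earliest = min(stamps) if stamps else None
    return earliest, last_updated


sample = json.dumps({
    "format-version": 1,
    "last-updated-ms": 1611817322000,
    "snapshots": [
        {"snapshot-id": 1, "timestamp-ms": 1611730922000},
        {"snapshot-id": 2, "timestamp-ms": 1611817322000},
    ],
})
print(table_times_from_metadata(sample))  # (1611730922000, 1611817322000)
```

Once old snapshots are expired, the earliest value moves forward. This is why snapshot times alone cannot recover the original creation time.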

Re: How to get table/partition creation time/update time in iceberg

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Sure, you could implement that in a catalog: either a custom catalog that you
plug in, or by contributing it to the Iceberg HiveCatalog.
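To illustrate the catalog-based approach, here is a language-agnostic sketch in Python (real Iceberg catalogs are Java). It assumes that times are kept as string table properties. The class, its methods, and the property keys created-at-ms and last-ddl-at-ms are all hypothetical; they are not part of Iceberg.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical property keys; Iceberg itself does not define these.
CREATED_AT_KEY = "created-at-ms"
LAST_DDL_KEY = "last-ddl-at-ms"


class TimestampingCatalog:
    """Hypothetical in-memory stand-in for a catalog that stamps creation
    and last-DDL times into each table's string properties."""

    def __init__(self, clock: Callable[[], int] = lambda: int(time.time() * 1000)):
        self._clock = clock  # returns epoch milliseconds
        self._tables: Dict[str, Dict[str, str]] = {}

    def create_table(self, name: str, properties: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        if name in self._tables:
            raise ValueError(f"table already exists: {name}")
        now = str(self._clock())
        props = dict(properties or {})
        # Keep a caller-supplied creation time (e.g. one carried over from Hive).
        props.setdefault(CREATED_AT_KEY, now)
        props[LAST_DDL_KEY] = now
        self._tables[name] = props
        return dict(props)

    def alter_table(self, name: str, updates: Dict[str, str]) -> Dict[str, str]:
        props = self._tables[name]
        # The creation time is write-once; ignore attempts to overwrite it.
        props.update({k: v for k, v in updates.items() if k != CREATED_AT_KEY})
        # Each change refreshes the last-DDL time, similar to transient_lastDdlTime.
        props[LAST_DDL_KEY] = str(self._clock())
        return dict(props)

    def properties(self, name: str) -> Dict[str, str]:
        return dict(self._tables[name])
```

Injecting the clock keeps the sketch deterministic in tests. Unlike snapshot timestamps, these values survive snapshot expiration because they live in table properties.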

On Mon, Feb 1, 2021 at 3:01 AM luochong.lxf <lu...@gmail.com> wrote:

> Hi Ryan,
>
> Generally, we add createTime and modifiedTime columns to the table schema.
> However, for historical reasons, some Hive tables do not have createTime
> and modifiedTime. When these Hive tables are converted to Iceberg tables,
> we hope createTime and transient_lastDdlTime can be retained so that we can
> still do data expiration and track table activity. Once snapshots expire,
> we cannot get this time information from Iceberg. It seems the only way to
> solve this problem is to modify these Hive tables' schemas and rewrite
> them. Do you agree?
>
> Thanks
>
> luochong.lxf
> luochong.lxf@gmail.com
>
>
> On 01/29/2021 02:40, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>
> Chong,
>
> Once snapshots expire, I don't think that there is a way to recover the
> time that a given partition was created.
>
> Can you explain more about what you're trying to do? When we age off data,
> we use the age of the records themselves, not the age from metadata. In
> other words, we use the logical timestamp from a row to expire it, not the
> timestamp when it was added to the table. You might consider doing that as
> well. I think it is probably a better way to ensure compliance.
>
> rb
>
> On Thu, Jan 28, 2021 at 9:42 AM chong luo <lu...@gmail.com> wrote:
>
>> Hi Iceberg Devs
>>
>>
>> I’m currently working on deleting expired tables and partitions in
>> Iceberg. However, I cannot find the table/partition creation time; it
>> seems Iceberg only stores snapshot creation time. In Hive,
>> transient_lastDdlTime, createTime and lastAccessTime are stored in the
>> metastore. With this time metadata, we can tell when a table was changed
>> and track the related jobs.
>>
>>
>> Is there any way to get the time metadata mentioned above in the current
>> implementation of Iceberg?
>>
>>
>>
>> Thanks.
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: How to get table/partition creation time/update time in iceberg

Posted by "luochong.lxf" <lu...@gmail.com>.
Hi Ryan,

  

Generally, we add createTime and modifiedTime columns to the table schema.
However, for historical reasons, some Hive tables do not have createTime and
modifiedTime. When these Hive tables are converted to Iceberg tables, we hope
createTime and transient_lastDdlTime can be retained so that we can still do
data expiration and track table activity. Once snapshots expire, we cannot get
this time information from Iceberg. It seems the only way to solve this
problem is to modify these Hive tables' schemas and rewrite them. Do you
agree?

  

Thanks
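The table-level Hive times mentioned above could also be carried over as metadata during migration, without rewriting any data. This is a minimal sketch under stated assumptions. It assumes the migration job can read the metastore's createTime and the transient_lastDdlTime table parameter (both epoch seconds in Hive), and that it can set string properties on the new Iceberg table. The helper name and the target keys created-at-ms and last-ddl-at-ms are hypothetical. This approach preserves table-level times only, not partition-level ones.

```python
from typing import Dict, Optional


def hive_times_to_properties(create_time_s: Optional[int],
                             hive_params: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: convert Hive metastore times (epoch seconds) into
    millisecond string properties for the migrated Iceberg table."""
    props: Dict[str, str] = {}
    # Treat missing or non-positive create times as unknown.
    if create_time_s is not None and create_time_s > 0:
        props["created-at-ms"] = str(create_time_s * 1000)
    # transient_lastDdlTime is stored as a string of epoch seconds.
    ddl = hive_params.get("transient_lastDdlTime")
    if ddl is not None and ddl.isdigit():
        props["last-ddl-at-ms"] = str(int(ddl) * 1000)
    return props


print(hive_times_to_properties(1600000000, {"transient_lastDdlTime": "1611817322"}))
```

Values that cannot be parsed are simply skipped. The migration then records nothing rather than a misleading time.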

  



Re: How to get table/partition creation time/update time in iceberg

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Chong,

Once snapshots expire, I don't think that there is a way to recover the
time that a given partition was created.

Can you explain more about what you're trying to do? When we age off data,
we use the age of the records themselves, not the age from metadata. In
other words, we use the logical timestamp from a row to expire it, not the
timestamp when it was added to the table. You might consider doing that as
well. I think it is probably a better way to ensure compliance.
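Ryan's suggestion to expire data by its logical timestamp can be sketched as follows. The sketch assumes each row has an event-time column, called event_ts here (a hypothetical name), and that the table is partitioned by the day of that column. Rows are expired based on their own timestamps. A whole day partition can be dropped only once the entire day falls before the cutoff; the boundary day still needs a row-level filter. All function names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List


def expiration_cutoff(now: datetime, retention_days: int) -> datetime:
    # Rows whose event time is strictly before this instant are expired.
    return now - timedelta(days=retention_days)


def expired_rows(rows: Iterable[Dict[str, Any]], now: datetime, retention_days: int,
                 ts_field: str = "event_ts") -> List[Dict[str, Any]]:
    # Judge each row by its own logical timestamp, not by when it was written.
    cutoff = expiration_cutoff(now, retention_days)
    return [r for r in rows if r[ts_field] < cutoff]


def fully_expired_days(days: Iterable[date], now: datetime, retention_days: int) -> List[date]:
    # A day partition holds event times in [day, day + 1); it is entirely
    # expired only if that whole interval ends at or before the cutoff.
    cutoff = expiration_cutoff(now, retention_days)
    return sorted(
        d for d in days
        if datetime(d.year, d.month, d.day, tzinfo=cutoff.tzinfo) + timedelta(days=1) <= cutoff
    )


now = datetime(2021, 2, 1, 12, tzinfo=timezone.utc)
days = [date(2021, 1, 27), date(2021, 1, 28), date(2021, 1, 29), date(2021, 1, 30)]
print(fully_expired_days(days, now, retention_days=3))
```

Because the rule depends only on data values, it gives the same answer after snapshots expire or after tables are migrated. No creation-time metadata is required.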

rb

On Thu, Jan 28, 2021 at 9:42 AM chong luo <lu...@gmail.com> wrote:

> Hi Iceberg Devs
>
>
> I’m currently working on deleting expired tables and partitions in
> Iceberg. However, I cannot find the table/partition creation time; it
> seems Iceberg only stores snapshot creation time. In Hive,
> transient_lastDdlTime, createTime and lastAccessTime are stored in the
> metastore. With this time metadata, we can tell when a table was changed
> and track the related jobs.
>
>
> Is there any way to get the time metadata mentioned above in the current
> implementation of Iceberg?
>
>
>
> Thanks.
>


-- 
Ryan Blue
Software Engineer
Netflix