Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/08/29 22:24:25 UTC

[GitHub] [incubator-iceberg] rdblue commented on issue #417: Adding support for time-based partitioning on long column type

rdblue commented on issue #417: Adding support for time-based partitioning on long column type
URL: https://github.com/apache/incubator-iceberg/issues/417#issuecomment-526384694
 
 
   @shardulm94, do you intend to use this for tables with existing data?
   
   If you do intend to use this with existing tables, then I'm not sure that you want to use the time-based hidden partitioning transforms. The problem with changing the partitioning for existing tables that use identity partitioning is that your queries may start to fail because you're no longer producing the old partition columns (e.g., `ts_date=cast(cast(ts as date) as string)`). And if you are still producing the old partition columns, then there's not much point in adding extra time-based partitioning (splits will also be pruned using time ranges from min/max metadata).
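
To illustrate the min/max pruning mentioned above, here is a minimal sketch in Python. It is not Iceberg's implementation: `FileStats`, `prune`, and the file names are hypothetical, and the `ts` bounds are assumed to be plain long values in a single, known unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: lower and upper bound of the ts column."""
    path: str
    ts_min: int
    ts_max: int


def prune(files, lo, hi):
    """Keep only files whose [ts_min, ts_max] range overlaps the query range [lo, hi]."""
    return [f for f in files if f.ts_max >= lo and f.ts_min <= hi]


files = [
    FileStats("a.parquet", 0, 99),
    FileStats("b.parquet", 100, 199),
    FileStats("c.parquet", 200, 299),
]

# A query for 150 <= ts <= 250 can skip a.parquet without reading it.
selected = prune(files, 150, 250)
print([f.path for f in selected])
```

In this sketch a time-range filter already skips non-matching files from the bounds alone, with no partition column involved.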
   
   If you don't intend to use existing data, then do normal timestamps work?
   
   I guess there's another case, where you want to rebuild the table metadata, but use old data files. In that case, is there anything to distinguish the data in these columns from timestamps with a different format, like long values that store microseconds from epoch?
   
   The problem is correctness when other people start using this. If Iceberg supports interpreting a long column as an instant, then it must be obvious what the unit of the long type is. Maybe we could allow this if the column name includes some clue, like `timestamp_millis` vs `timestamp_micros`, but that sounds hacky to me.
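
The unit ambiguity can be made concrete with a small Python sketch. The helper names `as_millis` and `as_micros` are made up for illustration; the point is only that the same long value decodes to very different instants depending on the assumed unit.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def as_millis(value):
    """Interpret a long as milliseconds from epoch."""
    return EPOCH + timedelta(milliseconds=value)


def as_micros(value):
    """Interpret the same long as microseconds from epoch."""
    return EPOCH + timedelta(microseconds=value)


value = 1_567_117_465_000

print(as_millis(value))  # 2019-08-29 22:24:25+00:00
print(as_micros(value))  # 1970-01-19 03:18:37.465000+00:00
```

Nothing in the value itself says which reading is correct, which is why the unit has to come from somewhere outside the data.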
   
   Another solution is to add a way to promote from long to timestamp type and store the units of the long in metadata somewhere. Then you would be able to use old data as real timestamp columns.
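
A rough sketch of what such a promotion could look like, in Python. Everything here is hypothetical: the `metadata` layout, the `"unit"` key, the column names, and the `promote` function are assumptions for illustration and do not correspond to any existing Iceberg API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Microseconds per stored unit; the unit names are assumptions.
MICROS_PER_UNIT = {"seconds": 1_000_000, "millis": 1_000, "micros": 1}


def promote(value, column, metadata):
    """Convert a long to a timestamp using the unit recorded for the column."""
    unit = metadata[column]["unit"]  # hypothetical metadata layout
    return EPOCH + timedelta(microseconds=value * MICROS_PER_UNIT[unit])


metadata = {
    "event_time": {"unit": "millis"},
    "ingest_time": {"unit": "micros"},
}

# Both columns describe the same instant once the stored unit is applied.
a = promote(1_567_117_465_000, "event_time", metadata)
b = promote(1_567_117_465_000_000, "ingest_time", metadata)
print(a, a == b)
```

With the unit recorded once per column, readers no longer have to guess it from the column name.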
