Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/27 21:12:34 UTC

[GitHub] [iceberg] cccs-jc opened a new issue #1671: Support timestamp partition using truncate

cccs-jc opened a new issue #1671:
URL: https://github.com/apache/iceberg/issues/1671


   Timestamps can be partitioned using the built-in transforms
   year, month, day, and hour.
   
   However, you cannot partition by quarter hour.
   
   The truncate transform can be applied to int, long, decimal, and string values, but not to date or timestamp values.
   
   If that were possible, you could partition timestamps by quarter hour (15 min x 60 seconds = 900 seconds):
   
   `truncate(timestamp, 900)`
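
The truncation proposed above can be illustrated with a minimal sketch. It assumes timestamps are represented as microseconds since the Unix epoch and that the width is given in seconds, as in the example above. The names `truncate_timestamp_micros` and `to_micros` are hypothetical helpers for illustration only; they are not part of Iceberg's API.

```python
from datetime import datetime, timedelta, timezone

MICROS_PER_SECOND = 1_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(dt: datetime) -> int:
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def truncate_timestamp_micros(ts_micros: int, width_seconds: int) -> int:
    """Floor a timestamp to the start of its width_seconds bucket (hypothetical helper)."""
    if width_seconds <= 0:
        raise ValueError("width_seconds must be positive")
    width_micros = width_seconds * MICROS_PER_SECOND
    # Python's % is non-negative for a positive divisor, so this floors
    # correctly for timestamps before 1970 as well.
    return ts_micros - (ts_micros % width_micros)


# Example: a timestamp at 21:12:34 UTC falls in the bucket starting at 21:00:00 UTC.
ts = to_micros(datetime(2020, 10, 27, 21, 12, 34, tzinfo=timezone.utc))
bucket_start = truncate_timestamp_micros(ts, 900)
```

Because 900 seconds divides an hour evenly and the epoch starts on an hour boundary, the resulting buckets line up with :00, :15, :30, and :45 in UTC.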
   
   Is there a reason the org.apache.iceberg.transforms.Truncate class does not support the timestamp type?
   
   Would this be a welcome addition to Iceberg?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on issue #1671: Support timestamp partition using truncate

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #1671:
URL: https://github.com/apache/iceberg/issues/1671#issuecomment-718266990


   The set of partition transforms needs to be a standard set so that they are portable across implementations of the table format. So far, we haven't had any use cases that required finer than hourly partitioning. We can add more partition transforms.




[GitHub] [iceberg] cccs-jc commented on issue #1671: Support timestamp partition using truncate

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on issue #1671:
URL: https://github.com/apache/iceberg/issues/1671#issuecomment-718714708


   For many years we have been partitioning our RDBMS data into 15-minute intervals. Because of ever-growing dataset sizes, we are transitioning to a mixed architecture where most of the data will reside in a data lake and some curated datasets will be stored in an RDBMS. Ideally, we would like the partitioning of our data lake to mirror the partitioning of our RDBMS.
   
   I agree that dividing time into minute granularity is too fine-grained, but I believe there is a case to be made for quarter-hour partitioning.
   
   If the community agrees to this, I can contribute the patch for it.
   




[GitHub] [iceberg] zhangdove commented on issue #1671: Support timestamp partition using truncate

Posted by GitBox <gi...@apache.org>.
zhangdove commented on issue #1671:
URL: https://github.com/apache/iceberg/issues/1671#issuecomment-717664290


   In most scenarios, I think partitioning time down to the hour should be enough. If timestamps are partitioned into 15-minute intervals, the following issues may arise:
   1. There will be more file directories.
   2. There will be more small files than with the other partition granularities (hour/day/month/year).
   3. In a production system, is it common to divide time into minute granularity?
   
   The above is just my opinion; others are welcome to share their suggestions and opinions.
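
To put the first point in numbers, here is a small sketch of how many time partitions each granularity produces per day. It only counts time buckets; actual directory and file counts depend on the data volume and any other partition fields, which are not known here. The function name `partitions_per_day` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def partitions_per_day(width_seconds: int) -> int:
    """Number of time buckets of the given width in one day (hypothetical helper)."""
    if width_seconds <= 0 or SECONDS_PER_DAY % width_seconds != 0:
        raise ValueError("width_seconds must be positive and divide a day evenly")
    return SECONDS_PER_DAY // width_seconds


hourly = partitions_per_day(3600)         # 24 buckets per day
quarter_hourly = partitions_per_day(900)  # 96 buckets per day
```

Quarter-hour partitioning therefore yields four times as many time partitions as hourly partitioning for the same time range.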

