You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/04 06:39:24 UTC

[GitHub] [iceberg] ddrinka opened a new issue, #6120: [Python] The structure of a partition definition and partition instance should be consistent

ddrinka opened a new issue, #6120:
URL: https://github.com/apache/iceberg/issues/6120

   ### Feature Request / Improvement
   
   Consider the `PartitionSummary` in a `ManifestFile`.  The lower and upper bounds can be resolved with `conversions.from_bytes`, resulting in types according to the definitions in `conversions.py`.  For instance, a date of 2022-04-01 results in a `long(19083)`.
   
   Now consider `ManifestEntry.data_file.partition`.  The instance of a partition definition should resolve into the same datatypes as the `PartitionSummary`.  But here, since the Avro reader is used, the date above is resolved to `date(2022, 04, 01)`.
   
   I like the work being done in `expressions/literals.py`.  For matching up partitions of different datatypes, I'm testing out converting the partition values to literals.  A workaround to the disconnect above would be to allow `DateLiteral`s to be created directly from `date`s.
   
   #### literals.py
   ```python
   @literal.register(date)
   def _(value: date) -> Literal[int]:
       # We should not convert to and from a string here but I'm trying to reduce the surface area of this example code
       return DateLiteral(date_to_days(value.isoformat()))
   ```
   
   With this change in place I can create a `Literal` from both resulting representations of a date above and check equality and gt/lt.
   
   ### Query engine
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed issue #6120: [Python] The structure of a partition definition and partition instance should be consistent

Posted by GitBox <gi...@apache.org>.
rdblue closed issue #6120: [Python] The structure of a partition definition and partition instance should be consistent
URL: https://github.com/apache/iceberg/issues/6120


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on issue #6120: [Python] The structure of a partition definition and partition instance should be consistent

Posted by GitBox <gi...@apache.org>.
Fokko commented on issue #6120:
URL: https://github.com/apache/iceberg/issues/6120#issuecomment-1303064851

   Hey @ddrinka Thanks for opening the PR. The date is a so-called logical type that most storage formats store internally as the days since 1970-01-01. After reading, this should be converted into a date again. For the PartitionSummary, we convert it into the physical type (int) and compare it with the partition.
   
   It seems that the `DateLiteral` is missing here: https://github.com/apache/iceberg/blob/master/python/pyiceberg/expressions/literals.py#L108-L164 It would be great if you can open up a PR to fix this 👍🏻 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ddrinka commented on issue #6120: [Python] The structure of a partition definition and partition instance should be consistent

Posted by GitBox <gi...@apache.org>.
ddrinka commented on issue #6120:
URL: https://github.com/apache/iceberg/issues/6120#issuecomment-1304291834

   Ok @Fokko, check out the PR and let me know what you think.
   
   I think there's still room for improvement on aligning the handling of partitions between `PartitionSummary` and `DataFile.partition` so I propose leaving this feature request open, but having the ability to use Literals to convert between will let me keep toying with this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org