You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/08 03:17:29 UTC

[GitHub] [iceberg] jun-he opened a new pull request, #5462: [Python] add time (year, month, day, hour) transforms

jun-he opened a new pull request, #5462:
URL: https://github.com/apache/iceberg/pull/5462

   To split the PR https://github.com/apache/iceberg/pull/3450, open a new PR to add time (year, month, day, hour) transforms.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951044641


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -116,3 +131,33 @@ def to_human_timestamptz(timestamp_micros: int) -> str:
 def to_human_timestamp(timestamp_micros: int) -> str:
     """Converts a TimestampType value to human string"""
     return (EPOCH_TIMESTAMP + timedelta(microseconds=timestamp_micros)).isoformat()
+
+
+def micros_to_hours(timestamp: int) -> int:
+    """Converts a timestamp in microseconds to a date in hours"""
+    return int((datetime.utcfromtimestamp(timestamp / 1_000_000) - EPOCH_TIMESTAMP).total_seconds() / 3600)

Review Comment:
   SG, but we still have to cast it to int as total_seconds returns a float.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942783211


##########
python/tests/test_transforms.py:
##########
@@ -318,3 +420,31 @@ def test_void_transform_str():
 
 def test_void_transform_repr():
     assert repr(VoidTransform()) == "VoidTransform()"
+
+
+@pytest.mark.parametrize(
+    "transform,json",
+    [
+        (YearTransform(), '"year"'),
+        (MonthTransform(), '"month"'),
+        (DayTransform(), '"day"'),
+        (HourTransform(), '"hour"'),
+    ],
+)
+def test_datetime_transform_serde(transform, json):
+    assert transform.json() == json
+    assert TestType.parse_raw(json).__root__ == transform
+
+
+@pytest.mark.parametrize(

Review Comment:
   +1 for splitting out cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951034033


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None

Review Comment:
   Thanks for the comments. Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #5462: [Python] add time (year, month, day, hour) transforms

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r940066857


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 3
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_year(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "YearTransform()"
+
+
+class MonthTransform(Transform[S, int]):
+    """Transforms a datetime value into a month value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        575
+    """
+
+    __root__: Literal["month"] = Field(default="month")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def month_func(v):
+                return datetime.days_to_months(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def month_func(v):
+                return datetime.micros_to_months(v)
+
+        else:
+            raise ValueError(f"Cannot apply month transform for type: {source}")
+
+        return lambda v: month_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 2
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_month(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "MonthTransform()"
+
+
+class DayTransform(Transform[S, int]):
+    """Transforms a datetime value into a day value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        17501
+    """
+
+    __root__: Literal["day"] = Field(default="day")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def day_func(v):
+                return v
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def day_func(v):
+                return datetime.micros_to_days(v)
+
+        else:
+            raise ValueError(f"Cannot apply day transform for type: {source}")
+
+        return lambda v: day_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return DateType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 1
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_day(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "DayTransform()"
+
+
+class HourTransform(Transform[S, int]):
+    """Transforms a datetime value into a hour value.
+
+    Example:
+        >>> transform = HourTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        420042
+    """
+
+    __root__: Literal["hour"] = Field(default="hour")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        if type(source) in {TimestampType, TimestamptzType}:
+
+            def hour_func(v):
+                return datetime.micros_to_hours(v)
+
+        else:
+            raise ValueError(f"Cannot apply hour transform for type: {source}")
+
+        return lambda v: hour_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 0

Review Comment:
   Instead of using these numbers, what do you think of adding an `IntEnum` to make it a bit more readable?
   ```
   ➜  python git:(fd-add-http-code-to-error-message) ✗ python3
   Python 3.9.13 (main, May 24 2022, 21:13:51) 
   [Clang 13.1.6 (clang-1316.0.21.2)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> from enum import IntEnum
   >>> 
   >>> class TimeResolution(IntEnum):
   ...     YEAR = 6
   ...     MONTH = 5
   ...     WEEK = 4
   ...     DAY = 3
   ...     HOUR = 2
   ...     MINUTE = 1
   ...     SECOND = 0
   ... 
   >>> 
   >>> TimeResolution.YEAR <= TimeResolution.MONTH
   False
   ```



##########
python/tests/test_transforms.py:
##########
@@ -318,3 +420,31 @@ def test_void_transform_str():
 
 def test_void_transform_repr():
     assert repr(VoidTransform()) == "VoidTransform()"
+
+
+@pytest.mark.parametrize(
+    "transform,json",
+    [
+        (YearTransform(), '"year"'),
+        (MonthTransform(), '"month"'),
+        (DayTransform(), '"day"'),
+        (HourTransform(), '"hour"'),
+    ],
+)
+def test_datetime_transform_serde(transform, json):
+    assert transform.json() == json
+    assert TestType.parse_raw(json).__root__ == transform
+
+
+@pytest.mark.parametrize(

Review Comment:
   A bit of a nit, but I think this test both tests the str and repr, which I think makes sense to split into two tests (one for `repr` and one for `str`.
   
   We haven't discussed this, but I'm not a fan of the whole `parameterize` functionality in pytest mostly because the decorator initializes before the actual tests ran. For example, if there is an error in the `YearTransform()` constructor, it will fail directly. This conflicts a bit with my personal TDD approach to development.
   
   Therefore I just split everything into separate tests: https://github.com/apache/iceberg/blob/master/python/tests/test_types.py#L546
   This way you don't have to check on which input the test is failing.
   
   I've also added this to the list of Python topics for the next community sync.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951063978


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -98,11 +98,26 @@ def micros_to_timestamptz(micros: int):
     return EPOCH_TIMESTAMPTZ + dt
 
 
+def to_human_year(year_ordinal: int) -> str:
+    """Converts a DateType value to human string"""
+    return f"{EPOCH_TIMESTAMP.year + year_ordinal:0=4d}"
+
+
+def to_human_month(month_ordinal: int) -> str:
+    """Converts a DateType value to human string"""
+    return f"{EPOCH_TIMESTAMP.year + int(month_ordinal / 12):0=4d}-{1 + int(month_ordinal % 12):0=2d}"

Review Comment:
   👌 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951040928


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 3
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_year(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "YearTransform()"
+
+
+class MonthTransform(Transform[S, int]):
+    """Transforms a datetime value into a month value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        575
+    """
+
+    __root__: Literal["month"] = Field(default="month")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def month_func(v):
+                return datetime.days_to_months(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def month_func(v):
+                return datetime.micros_to_months(v)
+
+        else:
+            raise ValueError(f"Cannot apply month transform for type: {source}")
+
+        return lambda v: month_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 2
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_month(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "MonthTransform()"
+
+
+class DayTransform(Transform[S, int]):
+    """Transforms a datetime value into a day value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        17501
+    """
+
+    __root__: Literal["day"] = Field(default="day")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def day_func(v):
+                return v
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def day_func(v):
+                return datetime.micros_to_days(v)
+
+        else:
+            raise ValueError(f"Cannot apply day transform for type: {source}")
+
+        return lambda v: day_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return DateType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 1
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_day(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "DayTransform()"
+
+
+class HourTransform(Transform[S, int]):
+    """Transforms a datetime value into a hour value.
+
+    Example:
+        >>> transform = HourTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        420042
+    """
+
+    __root__: Literal["hour"] = Field(default="hour")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        if type(source) in {TimestampType, TimestamptzType}:
+
+            def hour_func(v):
+                return datetime.micros_to_hours(v)
+
+        else:
+            raise ValueError(f"Cannot apply hour transform for type: {source}")
+
+        return lambda v: hour_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()

Review Comment:
   👌 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r952943730


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +228,221 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class TimeResolution(IntEnum):
+    YEAR = 6
+    MONTH = 5
+    WEEK = 4
+    DAY = 3
+    HOUR = 2
+    MINUTE = 1
+    SECOND = 0
+
+
+class TimeTransform(Transform[S, int]):
+    @property
+    @abstractmethod
+    def granularity(self) -> TimeResolution:
+        ...
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.granularity <= other.granularity if hasattr(other, "granularity") else False
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+
+class YearTransform(TimeTransform):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()

Review Comment:
   Looks like these can all be removed because they aren't used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r956889012


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +228,221 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class TimeResolution(IntEnum):
+    YEAR = 6
+    MONTH = 5
+    WEEK = 4
+    DAY = 3
+    HOUR = 2
+    MINUTE = 1
+    SECOND = 0
+
+
+class TimeTransform(Transform[S, int]):
+    @property
+    @abstractmethod
+    def granularity(self) -> TimeResolution:
+        ...
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.granularity <= other.granularity if hasattr(other, "granularity") else False
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+
+class YearTransform(TimeTransform):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()

Review Comment:
   Yep, removed `_source_type` from most of the transforms.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r956891107


##########
python/tests/test_transforms.py:
##########
@@ -318,3 +441,43 @@ def test_void_transform_str():
 
 def test_void_transform_repr():
     assert repr(VoidTransform()) == "VoidTransform()"
+
+
+@pytest.mark.parametrize(
+    "transform,json",
+    [
+        (YearTransform(), '"year"'),
+        (MonthTransform(), '"month"'),
+        (DayTransform(), '"day"'),
+        (HourTransform(), '"hour"'),
+    ],
+)
+def test_datetime_transform_serde(transform, json):
+    assert transform.json() == json
+    assert TestType.parse_raw(json).__root__ == transform

Review Comment:
   Split this into different serialize and deserialize tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951041061


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:

Review Comment:
   👌 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951030747


##########
python/tests/test_transforms.py:
##########
@@ -318,3 +420,31 @@ def test_void_transform_str():
 
 def test_void_transform_repr():
     assert repr(VoidTransform()) == "VoidTransform()"
+
+
+@pytest.mark.parametrize(
+    "transform,json",
+    [
+        (YearTransform(), '"year"'),
+        (MonthTransform(), '"month"'),
+        (DayTransform(), '"day"'),
+        (HourTransform(), '"hour"'),
+    ],
+)
+def test_datetime_transform_serde(transform, json):
+    assert transform.json() == json
+    assert TestType.parse_raw(json).__root__ == transform
+
+
+@pytest.mark.parametrize(

Review Comment:
   👌  Split it into two tests. 
   Let's chat in the sync meeting to see if we should avoid parameterize at all. I can have followup changes if needed.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951033588


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 3
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_year(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "YearTransform()"
+
+
+class MonthTransform(Transform[S, int]):
+    """Transforms a datetime value into a month value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        575
+    """
+
+    __root__: Literal["month"] = Field(default="month")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def month_func(v):
+                return datetime.days_to_months(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def month_func(v):
+                return datetime.micros_to_months(v)
+
+        else:
+            raise ValueError(f"Cannot apply month transform for type: {source}")
+
+        return lambda v: month_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 2
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_month(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "MonthTransform()"
+
+
+class DayTransform(Transform[S, int]):
+    """Transforms a datetime value into a day value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        17501
+    """
+
+    __root__: Literal["day"] = Field(default="day")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def day_func(v):
+                return v
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def day_func(v):
+                return datetime.micros_to_days(v)
+
+        else:
+            raise ValueError(f"Cannot apply day transform for type: {source}")
+
+        return lambda v: day_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return DateType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 1
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_day(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "DayTransform()"
+
+
+class HourTransform(Transform[S, int]):
+    """Transforms a datetime value into a hour value.
+
+    Example:
+        >>> transform = HourTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        420042
+    """
+
+    __root__: Literal["hour"] = Field(default="hour")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        if type(source) in {TimestampType, TimestamptzType}:
+
+            def hour_func(v):
+                return datetime.micros_to_hours(v)
+
+        else:
+            raise ValueError(f"Cannot apply hour transform for type: {source}")
+
+        return lambda v: hour_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 0

Review Comment:
   👌  updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951070745


##########
python/tests/test_transforms.py:
##########
@@ -139,6 +143,104 @@ def test_string_with_surrogate_pair():
     assert bucket_transform(string_with_surrogate_pair) == mmh3.hash(as_bytes)
 
 
+@pytest.mark.parametrize(
+    "date,date_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+    ],
+)
+def test_date_to_human_string(date, date_transform, expected):
+    assert date_transform.to_human_string(DateType(), None) == "null"
+    assert date_transform.to_human_string(DateType(), date) == expected
+
+
+@pytest.mark.parametrize(
+    "timestamp,time_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+        (420042, HourTransform(), "2017-12-01-18"),
+    ],
+)
+def test_ts_to_human_string(timestamp, time_transform, expected):
+    assert time_transform.to_human_string(TimestampType(), None) == "null"
+    assert time_transform.to_human_string(TimestampType(), timestamp) == expected
+
+
+@pytest.mark.parametrize(
+    "type_var",
+    [
+        DateType(),
+        TimestampType(),
+        TimestamptzType(),
+    ],
+)
+def test_time_methods(type_var):
+    assert YearTransform().can_transform(type_var)
+    assert MonthTransform().can_transform(type_var)
+    assert DayTransform().can_transform(type_var)
+    assert YearTransform().preserves_order
+    assert MonthTransform().preserves_order
+    assert DayTransform().preserves_order
+    assert YearTransform().result_type(type_var) == IntegerType()
+    assert MonthTransform().result_type(type_var) == IntegerType()
+    assert DayTransform().result_type(type_var) == DateType()
+    assert YearTransform().dedup_name == "time"
+    assert MonthTransform().dedup_name == "time"
+    assert DayTransform().dedup_name == "time"
+
+
+@pytest.mark.parametrize(
+    "transform,type_var,value,expected",
+    [
+        (DayTransform(), DateType(), 17501, 17501),
+        (MonthTransform(), DateType(), 17501, 575),
+        (YearTransform(), DateType(), 17501, 47),
+        (YearTransform(), TimestampType(), 1512151975038194, 47),
+        (MonthTransform(), TimestamptzType(), 1512151975038194, 575),
+        (DayTransform(), TimestampType(), 1512151975038194, 17501),

Review Comment:
   The hour transform has been tested separately in `test_hour_method`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951069993


##########
python/tests/test_transforms.py:
##########
@@ -139,6 +143,104 @@ def test_string_with_surrogate_pair():
     assert bucket_transform(string_with_surrogate_pair) == mmh3.hash(as_bytes)
 
 
+@pytest.mark.parametrize(
+    "date,date_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+    ],
+)
+def test_date_to_human_string(date, date_transform, expected):
+    assert date_transform.to_human_string(DateType(), None) == "null"
+    assert date_transform.to_human_string(DateType(), date) == expected
+
+
+@pytest.mark.parametrize(
+    "timestamp,time_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+        (420042, HourTransform(), "2017-12-01-18"),

Review Comment:
   SG, thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r952957521


##########
python/tests/test_transforms.py:
##########
@@ -318,3 +420,31 @@ def test_void_transform_str():
 
 def test_void_transform_repr():
     assert repr(VoidTransform()) == "VoidTransform()"
+
+
+@pytest.mark.parametrize(
+    "transform,json",
+    [
+        (YearTransform(), '"year"'),
+        (MonthTransform(), '"month"'),
+        (DayTransform(), '"day"'),
+        (HourTransform(), '"hour"'),
+    ],
+)
+def test_datetime_transform_serde(transform, json):
+    assert transform.json() == json
+    assert TestType.parse_raw(json).__root__ == transform
+
+
+@pytest.mark.parametrize(

Review Comment:
   @jun-he, when in doubt, do not parameterize. If you need to do anything special whatsoever, like updating an assert to use `__root__` instead of `isinstance(..., YearTransform)` then parameters are not a good idea.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r952955973


##########
python/tests/test_transforms.py:
##########
@@ -318,3 +441,43 @@ def test_void_transform_str():
 
 def test_void_transform_repr():
     assert repr(VoidTransform()) == "VoidTransform()"
+
+
+@pytest.mark.parametrize(
+    "transform,json",
+    [
+        (YearTransform(), '"year"'),
+        (MonthTransform(), '"month"'),
+        (DayTransform(), '"day"'),
+        (HourTransform(), '"hour"'),
+    ],
+)
+def test_datetime_transform_serde(transform, json):
+    assert transform.json() == json
+    assert TestType.parse_raw(json).__root__ == transform

Review Comment:
   This is too generic. Each class should be tested individual to validate that parsing the string results in an instance of the correct transform.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko merged pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
Fokko merged PR #5462:
URL: https://github.com/apache/iceberg/pull/5462


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942801391


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -116,3 +131,33 @@ def to_human_timestamptz(timestamp_micros: int) -> str:
 def to_human_timestamp(timestamp_micros: int) -> str:
     """Converts a TimestampType value to human string"""
     return (EPOCH_TIMESTAMP + timedelta(microseconds=timestamp_micros)).isoformat()
+
+
+def micros_to_hours(timestamp: int) -> int:
+    """Converts a timestamp in microseconds to a date in hours"""
+    return int((datetime.utcfromtimestamp(timestamp / 1_000_000) - EPOCH_TIMESTAMP).total_seconds() / 3600)
+
+
+def days_to_months(days: int) -> int:
+    """Creates a date from the number of days from 1970-01-01"""
+    dt = datetime.utcfromtimestamp(days * 86400)
+    return (dt.year - EPOCH_TIMESTAMP.year) * 12 + (dt.month - EPOCH_TIMESTAMP.month) - (1 if dt.day < EPOCH_TIMESTAMP.day else 0)
+
+
+def micros_to_months(timestamp: int) -> int:
+    dt = datetime.utcfromtimestamp(timestamp / 1_000_000)
+    return (dt.year - EPOCH_TIMESTAMP.year) * 12 + (dt.month - EPOCH_TIMESTAMP.month) - (1 if dt.day < EPOCH_TIMESTAMP.day else 0)
+
+
+def days_to_years(days: int) -> int:
+    dt = datetime.utcfromtimestamp(days * 86400)
+    return (dt.year - EPOCH_TIMESTAMP.year) - (
+        1 if dt.month < EPOCH_TIMESTAMP.month or (dt.month == EPOCH_TIMESTAMP.month and dt.day < EPOCH_TIMESTAMP.day) else 0
+    )

Review Comment:
   I think it may be easier if you use dates?
   
   ```python
   return days_to_date(days).year - EPOCH_DATE.year
   ```
   
   That seems to have the right behavior to me: `days_to_years(-1)` is -1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942777166


##########
python/tests/test_transforms.py:
##########
@@ -139,6 +143,104 @@ def test_string_with_surrogate_pair():
     assert bucket_transform(string_with_surrogate_pair) == mmh3.hash(as_bytes)
 
 
+@pytest.mark.parametrize(
+    "date,date_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+    ],
+)
+def test_date_to_human_string(date, date_transform, expected):
+    assert date_transform.to_human_string(DateType(), None) == "null"
+    assert date_transform.to_human_string(DateType(), date) == expected
+
+
+@pytest.mark.parametrize(
+    "timestamp,time_transform,expected",

Review Comment:
   The first value is not a timestamp, it is the result of applying a transform. The first 3 cases here are identical to the cases for "date" because of that. Can you remove the `date` version of this and rename this argument to "value"? The test name should also not include "timestamp" or "ts" because this is converting output values to strings.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942773851


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -98,11 +98,26 @@ def micros_to_timestamptz(micros: int):
     return EPOCH_TIMESTAMPTZ + dt
 
 
+def to_human_year(year_ordinal: int) -> str:
+    """Converts a DateType value to human string"""
+    return f"{EPOCH_TIMESTAMP.year + year_ordinal:0=4d}"
+
+
+def to_human_month(month_ordinal: int) -> str:
+    """Converts a DateType value to human string"""
+    return f"{EPOCH_TIMESTAMP.year + int(month_ordinal / 12):0=4d}-{1 + int(month_ordinal % 12):0=2d}"

Review Comment:
   Instead of using `int` to cast, I think this should use floor division, `//`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942784005


##########
python/tests/test_transforms.py:
##########
@@ -139,6 +143,104 @@ def test_string_with_surrogate_pair():
     assert bucket_transform(string_with_surrogate_pair) == mmh3.hash(as_bytes)
 
 
+@pytest.mark.parametrize(
+    "date,date_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+    ],
+)
+def test_date_to_human_string(date, date_transform, expected):
+    assert date_transform.to_human_string(DateType(), None) == "null"
+    assert date_transform.to_human_string(DateType(), date) == expected
+
+
+@pytest.mark.parametrize(
+    "timestamp,time_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+        (420042, HourTransform(), "2017-12-01-18"),
+    ],
+)
+def test_ts_to_human_string(timestamp, time_transform, expected):
+    assert time_transform.to_human_string(TimestampType(), None) == "null"
+    assert time_transform.to_human_string(TimestampType(), timestamp) == expected
+
+
+@pytest.mark.parametrize(
+    "type_var",
+    [
+        DateType(),
+        TimestampType(),
+        TimestamptzType(),
+    ],
+)
+def test_time_methods(type_var):
+    assert YearTransform().can_transform(type_var)
+    assert MonthTransform().can_transform(type_var)
+    assert DayTransform().can_transform(type_var)
+    assert YearTransform().preserves_order
+    assert MonthTransform().preserves_order
+    assert DayTransform().preserves_order
+    assert YearTransform().result_type(type_var) == IntegerType()
+    assert MonthTransform().result_type(type_var) == IntegerType()
+    assert DayTransform().result_type(type_var) == DateType()
+    assert YearTransform().dedup_name == "time"
+    assert MonthTransform().dedup_name == "time"
+    assert DayTransform().dedup_name == "time"
+
+
+@pytest.mark.parametrize(
+    "transform,type_var,value,expected",
+    [
+        (DayTransform(), DateType(), 17501, 17501),

Review Comment:
   We also need tests for negative date and timestamp values. Can you add those?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942790116


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -116,3 +131,33 @@ def to_human_timestamptz(timestamp_micros: int) -> str:
 def to_human_timestamp(timestamp_micros: int) -> str:
     """Converts a TimestampType value to human string"""
     return (EPOCH_TIMESTAMP + timedelta(microseconds=timestamp_micros)).isoformat()
+
+
+def micros_to_hours(timestamp: int) -> int:
+    """Converts a timestamp in microseconds to a date in hours"""
+    return int((datetime.utcfromtimestamp(timestamp / 1_000_000) - EPOCH_TIMESTAMP).total_seconds() / 3600)
+
+
+def days_to_months(days: int) -> int:
+    """Creates a date from the number of days from 1970-01-01"""
+    dt = datetime.utcfromtimestamp(days * 86400)
+    return (dt.year - EPOCH_TIMESTAMP.year) * 12 + (dt.month - EPOCH_TIMESTAMP.month) - (1 if dt.day < EPOCH_TIMESTAMP.day else 0)
+
+
+def micros_to_months(timestamp: int) -> int:
+    dt = datetime.utcfromtimestamp(timestamp / 1_000_000)

Review Comment:
   To convert, please use `micros_to_timestamp`. We don't want to have duplicated or different logic in multiple places.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942788555


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -116,3 +131,33 @@ def to_human_timestamptz(timestamp_micros: int) -> str:
 def to_human_timestamp(timestamp_micros: int) -> str:
     """Converts a TimestampType value to human string"""
     return (EPOCH_TIMESTAMP + timedelta(microseconds=timestamp_micros)).isoformat()
+
+
+def micros_to_hours(timestamp: int) -> int:
+    """Converts a timestamp in microseconds to a date in hours"""
+    return int((datetime.utcfromtimestamp(timestamp / 1_000_000) - EPOCH_TIMESTAMP).total_seconds() / 3600)
+
+
+def days_to_months(days: int) -> int:
+    """Creates a date from the number of days from 1970-01-01"""
+    dt = datetime.utcfromtimestamp(days * 86400)

Review Comment:
   This should not use `datetime`. Instead, use `date` for date calculations. Also, use `days_to_date` for conversion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951071634


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -116,3 +131,33 @@ def to_human_timestamptz(timestamp_micros: int) -> str:
 def to_human_timestamp(timestamp_micros: int) -> str:
     """Converts a TimestampType value to human string"""
     return (EPOCH_TIMESTAMP + timedelta(microseconds=timestamp_micros)).isoformat()
+
+
+def micros_to_hours(timestamp: int) -> int:
+    """Converts a timestamp in microseconds to a date in hours"""
+    return int((datetime.utcfromtimestamp(timestamp / 1_000_000) - EPOCH_TIMESTAMP).total_seconds() / 3600)
+
+
+def days_to_months(days: int) -> int:
+    """Creates a date from the number of days from 1970-01-01"""
+    dt = datetime.utcfromtimestamp(days * 86400)
+    return (dt.year - EPOCH_TIMESTAMP.year) * 12 + (dt.month - EPOCH_TIMESTAMP.month) - (1 if dt.day < EPOCH_TIMESTAMP.day else 0)
+
+
+def micros_to_months(timestamp: int) -> int:
+    dt = datetime.utcfromtimestamp(timestamp / 1_000_000)
+    return (dt.year - EPOCH_TIMESTAMP.year) * 12 + (dt.month - EPOCH_TIMESTAMP.month) - (1 if dt.day < EPOCH_TIMESTAMP.day else 0)
+
+
+def days_to_years(days: int) -> int:
+    dt = datetime.utcfromtimestamp(days * 86400)
+    return (dt.year - EPOCH_TIMESTAMP.year) - (
+        1 if dt.month < EPOCH_TIMESTAMP.month or (dt.month == EPOCH_TIMESTAMP.month and dt.day < EPOCH_TIMESTAMP.day) else 0
+    )

Review Comment:
   Thanks, updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942765559


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 3
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_year(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "YearTransform()"
+
+
+class MonthTransform(Transform[S, int]):
+    """Transforms a datetime value into a month value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        575
+    """
+
+    __root__: Literal["month"] = Field(default="month")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def month_func(v):
+                return datetime.days_to_months(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def month_func(v):
+                return datetime.micros_to_months(v)
+
+        else:
+            raise ValueError(f"Cannot apply month transform for type: {source}")
+
+        return lambda v: month_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 2
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_month(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "MonthTransform()"
+
+
+class DayTransform(Transform[S, int]):
+    """Transforms a datetime value into a day value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        17501
+    """
+
+    __root__: Literal["day"] = Field(default="day")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def day_func(v):
+                return v
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def day_func(v):
+                return datetime.micros_to_days(v)
+
+        else:
+            raise ValueError(f"Cannot apply day transform for type: {source}")
+
+        return lambda v: day_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return DateType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 1
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_day(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "DayTransform()"
+
+
+class HourTransform(Transform[S, int]):
+    """Transforms a datetime value into a hour value.
+
+    Example:
+        >>> transform = HourTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        420042
+    """
+
+    __root__: Literal["hour"] = Field(default="hour")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        if type(source) in {TimestampType, TimestamptzType}:
+
+            def hour_func(v):
+                return datetime.micros_to_hours(v)
+
+        else:
+            raise ValueError(f"Cannot apply hour transform for type: {source}")
+
+        return lambda v: hour_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()

Review Comment:
   Can be moved to a superclass.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942781144


##########
python/tests/test_transforms.py:
##########
@@ -139,6 +143,104 @@ def test_string_with_surrogate_pair():
     assert bucket_transform(string_with_surrogate_pair) == mmh3.hash(as_bytes)
 
 
+@pytest.mark.parametrize(
+    "date,date_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+    ],
+)
+def test_date_to_human_string(date, date_transform, expected):
+    assert date_transform.to_human_string(DateType(), None) == "null"
+    assert date_transform.to_human_string(DateType(), date) == expected
+
+
+@pytest.mark.parametrize(
+    "timestamp,time_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+        (420042, HourTransform(), "2017-12-01-18"),
+    ],
+)
+def test_ts_to_human_string(timestamp, time_transform, expected):
+    assert time_transform.to_human_string(TimestampType(), None) == "null"
+    assert time_transform.to_human_string(TimestampType(), timestamp) == expected
+
+
+@pytest.mark.parametrize(
+    "type_var",
+    [
+        DateType(),
+        TimestampType(),
+        TimestamptzType(),
+    ],
+)
+def test_time_methods(type_var):
+    assert YearTransform().can_transform(type_var)
+    assert MonthTransform().can_transform(type_var)
+    assert DayTransform().can_transform(type_var)
+    assert YearTransform().preserves_order
+    assert MonthTransform().preserves_order
+    assert DayTransform().preserves_order
+    assert YearTransform().result_type(type_var) == IntegerType()
+    assert MonthTransform().result_type(type_var) == IntegerType()
+    assert DayTransform().result_type(type_var) == DateType()
+    assert YearTransform().dedup_name == "time"
+    assert MonthTransform().dedup_name == "time"
+    assert DayTransform().dedup_name == "time"
+
+
+@pytest.mark.parametrize(
+    "transform,type_var,value,expected",
+    [
+        (DayTransform(), DateType(), 17501, 17501),
+        (MonthTransform(), DateType(), 17501, 575),
+        (YearTransform(), DateType(), 17501, 47),
+        (YearTransform(), TimestampType(), 1512151975038194, 47),
+        (MonthTransform(), TimestamptzType(), 1512151975038194, 575),
+        (DayTransform(), TimestampType(), 1512151975038194, 17501),

Review Comment:
   Can you add an hour test here as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942779370


##########
python/tests/test_transforms.py:
##########
@@ -139,6 +143,104 @@ def test_string_with_surrogate_pair():
     assert bucket_transform(string_with_surrogate_pair) == mmh3.hash(as_bytes)
 
 
+@pytest.mark.parametrize(
+    "date,date_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+    ],
+)
+def test_date_to_human_string(date, date_transform, expected):
+    assert date_transform.to_human_string(DateType(), None) == "null"
+    assert date_transform.to_human_string(DateType(), date) == expected
+
+
+@pytest.mark.parametrize(
+    "timestamp,time_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+        (420042, HourTransform(), "2017-12-01-18"),

Review Comment:
   You also need test cases for negative values:
   
   ```python
   (-1, YearTransform(), "1969"),
   (-1, MonthTransform(), "1969-12"),
   (-1, DayTransform(), "1969-12-31"),
   (-1, HourTransform(), "1969-12-31-23")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#issuecomment-1229771970

   @Fokko @rdblue  Updated the PR accordingly. Can you please take another look? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951076268


##########
python/tests/test_transforms.py:
##########
@@ -139,6 +143,104 @@ def test_string_with_surrogate_pair():
     assert bucket_transform(string_with_surrogate_pair) == mmh3.hash(as_bytes)
 
 
+@pytest.mark.parametrize(
+    "date,date_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+    ],
+)
+def test_date_to_human_string(date, date_transform, expected):
+    assert date_transform.to_human_string(DateType(), None) == "null"
+    assert date_transform.to_human_string(DateType(), date) == expected
+
+
+@pytest.mark.parametrize(
+    "timestamp,time_transform,expected",

Review Comment:
   👌  updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951066003


##########
python/tests/test_transforms.py:
##########
@@ -139,6 +143,104 @@ def test_string_with_surrogate_pair():
     assert bucket_transform(string_with_surrogate_pair) == mmh3.hash(as_bytes)
 
 
+@pytest.mark.parametrize(
+    "date,date_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+    ],
+)
+def test_date_to_human_string(date, date_transform, expected):
+    assert date_transform.to_human_string(DateType(), None) == "null"

Review Comment:
   👌 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942765103


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 3
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_year(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "YearTransform()"
+
+
+class MonthTransform(Transform[S, int]):
+    """Transforms a datetime value into a month value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        575
+    """
+
+    __root__: Literal["month"] = Field(default="month")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def month_func(v):
+                return datetime.days_to_months(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def month_func(v):
+                return datetime.micros_to_months(v)
+
+        else:
+            raise ValueError(f"Cannot apply month transform for type: {source}")
+
+        return lambda v: month_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 2
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_month(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "MonthTransform()"
+
+
+class DayTransform(Transform[S, int]):
+    """Transforms a datetime value into a day value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        17501
+    """
+
+    __root__: Literal["day"] = Field(default="day")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def day_func(v):
+                return v
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def day_func(v):
+                return datetime.micros_to_days(v)
+
+        else:
+            raise ValueError(f"Cannot apply day transform for type: {source}")
+
+        return lambda v: day_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return DateType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 1
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_day(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "DayTransform()"
+
+
+class HourTransform(Transform[S, int]):
+    """Transforms a datetime value into a hour value.
+
+    Example:
+        >>> transform = HourTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        420042
+    """
+
+    __root__: Literal["hour"] = Field(default="hour")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        if type(source) in {TimestampType, TimestamptzType}:
+
+            def hour_func(v):
+                return datetime.micros_to_hours(v)
+
+        else:
+            raise ValueError(f"Cannot apply hour transform for type: {source}")
+
+        return lambda v: hour_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 0
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False

Review Comment:
   Can this go in a superclass? I'd rather not have so many copies of it.



##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 3
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_year(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "YearTransform()"
+
+
+class MonthTransform(Transform[S, int]):
+    """Transforms a datetime value into a month value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        575
+    """
+
+    __root__: Literal["month"] = Field(default="month")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def month_func(v):
+                return datetime.days_to_months(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def month_func(v):
+                return datetime.micros_to_months(v)
+
+        else:
+            raise ValueError(f"Cannot apply month transform for type: {source}")
+
+        return lambda v: month_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 2
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_month(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "MonthTransform()"
+
+
+class DayTransform(Transform[S, int]):
+    """Transforms a datetime value into a day value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        17501
+    """
+
+    __root__: Literal["day"] = Field(default="day")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def day_func(v):
+                return v
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def day_func(v):
+                return datetime.micros_to_days(v)
+
+        else:
+            raise ValueError(f"Cannot apply day transform for type: {source}")
+
+        return lambda v: day_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return DateType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 1
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_day(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "DayTransform()"
+
+
+class HourTransform(Transform[S, int]):
+    """Transforms a datetime value into a hour value.
+
+    Example:
+        >>> transform = HourTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        420042
+    """
+
+    __root__: Literal["hour"] = Field(default="hour")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        if type(source) in {TimestampType, TimestamptzType}:
+
+            def hour_func(v):
+                return datetime.micros_to_hours(v)
+
+        else:
+            raise ValueError(f"Cannot apply hour transform for type: {source}")
+
+        return lambda v: hour_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 0
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_hour(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"

Review Comment:
   Same with this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942764509


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 3
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_year(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "YearTransform()"
+
+
+class MonthTransform(Transform[S, int]):
+    """Transforms a datetime value into a month value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        575
+    """
+
+    __root__: Literal["month"] = Field(default="month")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def month_func(v):
+                return datetime.days_to_months(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def month_func(v):
+                return datetime.micros_to_months(v)
+
+        else:
+            raise ValueError(f"Cannot apply month transform for type: {source}")
+
+        return lambda v: month_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 2
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_month(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "MonthTransform()"
+
+
+class DayTransform(Transform[S, int]):
+    """Transforms a datetime value into a day value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        17501
+    """
+
+    __root__: Literal["day"] = Field(default="day")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def day_func(v):
+                return v
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def day_func(v):
+                return datetime.micros_to_days(v)
+
+        else:
+            raise ValueError(f"Cannot apply day transform for type: {source}")
+
+        return lambda v: day_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return DateType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 1
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_day(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "DayTransform()"
+
+
+class HourTransform(Transform[S, int]):
+    """Transforms a datetime value into a hour value.
+
+    Example:
+        >>> transform = HourTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        420042
+    """
+
+    __root__: Literal["hour"] = Field(default="hour")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        if type(source) in {TimestampType, TimestamptzType}:
+
+            def hour_func(v):
+                return datetime.micros_to_hours(v)
+
+        else:
+            raise ValueError(f"Cannot apply hour transform for type: {source}")
+
+        return lambda v: hour_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 0

Review Comment:
   I agree. It would be much better to have an enum for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942766042


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:

Review Comment:
   Java uses the term "granularity" instead of order. Would that work here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942763280


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None

Review Comment:
   This should be `is not None` because `if 0` is `False`:
   
   ```python
   In [1]: if 0:
      ...:     print("true")
      ...: 
   
   In [2]:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r952956370


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +228,221 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class TimeResolution(IntEnum):
+    YEAR = 6
+    MONTH = 5
+    WEEK = 4
+    DAY = 3
+    HOUR = 2
+    MINUTE = 1
+    SECOND = 0
+
+
+class TimeTransform(Transform[S, int]):

Review Comment:
   Can these also be `Singleton`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951071193


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -116,3 +131,33 @@ def to_human_timestamptz(timestamp_micros: int) -> str:
 def to_human_timestamp(timestamp_micros: int) -> str:
     """Converts a TimestampType value to human string"""
     return (EPOCH_TIMESTAMP + timedelta(microseconds=timestamp_micros)).isoformat()
+
+
+def micros_to_hours(timestamp: int) -> int:
+    """Converts a timestamp in microseconds to a date in hours"""
+    return int((datetime.utcfromtimestamp(timestamp / 1_000_000) - EPOCH_TIMESTAMP).total_seconds() / 3600)
+
+
+def days_to_months(days: int) -> int:
+    """Creates a date from the number of days from 1970-01-01"""
+    dt = datetime.utcfromtimestamp(days * 86400)

Review Comment:
   👌 



##########
python/pyiceberg/utils/datetime.py:
##########
@@ -116,3 +131,33 @@ def to_human_timestamptz(timestamp_micros: int) -> str:
 def to_human_timestamp(timestamp_micros: int) -> str:
     """Converts a TimestampType value to human string"""
     return (EPOCH_TIMESTAMP + timedelta(microseconds=timestamp_micros)).isoformat()
+
+
+def micros_to_hours(timestamp: int) -> int:
+    """Converts a timestamp in microseconds to a date in hours"""
+    return int((datetime.utcfromtimestamp(timestamp / 1_000_000) - EPOCH_TIMESTAMP).total_seconds() / 3600)
+
+
+def days_to_months(days: int) -> int:
+    """Creates a date from the number of days from 1970-01-01"""
+    dt = datetime.utcfromtimestamp(days * 86400)
+    return (dt.year - EPOCH_TIMESTAMP.year) * 12 + (dt.month - EPOCH_TIMESTAMP.month) - (1 if dt.day < EPOCH_TIMESTAMP.day else 0)
+
+
+def micros_to_months(timestamp: int) -> int:
+    dt = datetime.utcfromtimestamp(timestamp / 1_000_000)

Review Comment:
   👌 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951074589


##########
python/tests/test_transforms.py:
##########
@@ -139,6 +143,104 @@ def test_string_with_surrogate_pair():
     assert bucket_transform(string_with_surrogate_pair) == mmh3.hash(as_bytes)
 
 
+@pytest.mark.parametrize(
+    "date,date_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+    ],
+)
+def test_date_to_human_string(date, date_transform, expected):
+    assert date_transform.to_human_string(DateType(), None) == "null"
+    assert date_transform.to_human_string(DateType(), date) == expected
+
+
+@pytest.mark.parametrize(
+    "timestamp,time_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+        (420042, HourTransform(), "2017-12-01-18"),
+    ],
+)
+def test_ts_to_human_string(timestamp, time_transform, expected):
+    assert time_transform.to_human_string(TimestampType(), None) == "null"
+    assert time_transform.to_human_string(TimestampType(), timestamp) == expected
+
+
+@pytest.mark.parametrize(
+    "type_var",
+    [
+        DateType(),
+        TimestampType(),
+        TimestamptzType(),
+    ],
+)
+def test_time_methods(type_var):
+    assert YearTransform().can_transform(type_var)
+    assert MonthTransform().can_transform(type_var)
+    assert DayTransform().can_transform(type_var)
+    assert YearTransform().preserves_order
+    assert MonthTransform().preserves_order
+    assert DayTransform().preserves_order
+    assert YearTransform().result_type(type_var) == IntegerType()
+    assert MonthTransform().result_type(type_var) == IntegerType()
+    assert DayTransform().result_type(type_var) == DateType()
+    assert YearTransform().dedup_name == "time"
+    assert MonthTransform().dedup_name == "time"
+    assert DayTransform().dedup_name == "time"
+
+
+@pytest.mark.parametrize(
+    "transform,type_var,value,expected",
+    [
+        (DayTransform(), DateType(), 17501, 17501),

Review Comment:
   👌  added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942788555


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -116,3 +131,33 @@ def to_human_timestamptz(timestamp_micros: int) -> str:
 def to_human_timestamp(timestamp_micros: int) -> str:
     """Converts a TimestampType value to human string"""
     return (EPOCH_TIMESTAMP + timedelta(microseconds=timestamp_micros)).isoformat()
+
+
+def micros_to_hours(timestamp: int) -> int:
+    """Converts a timestamp in microseconds to a date in hours"""
+    return int((datetime.utcfromtimestamp(timestamp / 1_000_000) - EPOCH_TIMESTAMP).total_seconds() / 3600)
+
+
+def days_to_months(days: int) -> int:
+    """Creates a date from the number of days from 1970-01-01"""
+    dt = datetime.utcfromtimestamp(days * 86400)

Review Comment:
   This should not use `datetime`. Instead, use `date` for date calculations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942787781


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -116,3 +131,33 @@ def to_human_timestamptz(timestamp_micros: int) -> str:
 def to_human_timestamp(timestamp_micros: int) -> str:
     """Converts a TimestampType value to human string"""
     return (EPOCH_TIMESTAMP + timedelta(microseconds=timestamp_micros)).isoformat()
+
+
+def micros_to_hours(timestamp: int) -> int:
+    """Converts a timestamp in microseconds to a date in hours"""
+    return int((datetime.utcfromtimestamp(timestamp / 1_000_000) - EPOCH_TIMESTAMP).total_seconds() / 3600)

Review Comment:
   Should this use floor division rather than converting to double?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942775639


##########
python/tests/test_transforms.py:
##########
@@ -139,6 +143,104 @@ def test_string_with_surrogate_pair():
     assert bucket_transform(string_with_surrogate_pair) == mmh3.hash(as_bytes)
 
 
+@pytest.mark.parametrize(
+    "date,date_transform,expected",
+    [
+        (47, YearTransform(), "2017"),
+        (575, MonthTransform(), "2017-12"),
+        (17501, DayTransform(), "2017-12-01"),
+    ],
+)
+def test_date_to_human_string(date, date_transform, expected):
+    assert date_transform.to_human_string(DateType(), None) == "null"

Review Comment:
   Please separate this into its own test case. There's no need to run this assertion for each case above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r942809011


##########
python/pyiceberg/utils/datetime.py:
##########
@@ -116,3 +131,33 @@ def to_human_timestamptz(timestamp_micros: int) -> str:
 def to_human_timestamp(timestamp_micros: int) -> str:
     """Converts a TimestampType value to human string"""
     return (EPOCH_TIMESTAMP + timedelta(microseconds=timestamp_micros)).isoformat()
+
+
+def micros_to_hours(timestamp: int) -> int:
+    """Converts a timestamp in microseconds to a date in hours"""
+    return int((datetime.utcfromtimestamp(timestamp / 1_000_000) - EPOCH_TIMESTAMP).total_seconds() / 3600)
+
+
+def days_to_months(days: int) -> int:
+    """Creates a date from the number of days from 1970-01-01"""
+    dt = datetime.utcfromtimestamp(days * 86400)
+    return (dt.year - EPOCH_TIMESTAMP.year) * 12 + (dt.month - EPOCH_TIMESTAMP.month) - (1 if dt.day < EPOCH_TIMESTAMP.day else 0)

Review Comment:
   Is the final adjustment needed? This seems to work just fine for me:
   
   ```python
   d = days_to_date(days)
   return (d.year - EPOCH_DATE.year) * 12 + (d.month - EPOCH_DATE.month)
   ```
   
   Also, I don't think that the value's `day` will ever be less than `EPOCH_TIMESTAMP.day` or `EPOCH_DATE.day`. Both of those are 1 because Python never returns 0 for a day value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r951039017


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +227,243 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class YearTransform(Transform[S, int]):
+    """Transforms a datetime value into a year value.
+
+    Example:
+        >>> transform = YearTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        47
+    """
+
+    __root__: Literal["year"] = Field(default="year")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def year_func(v):
+                return datetime.days_to_years(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def year_func(v):
+                return datetime.micros_to_years(v)
+
+        else:
+            raise ValueError(f"Cannot apply year transform for type: {source}")
+
+        return lambda v: year_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 3
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_year(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "YearTransform()"
+
+
+class MonthTransform(Transform[S, int]):
+    """Transforms a datetime value into a month value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        575
+    """
+
+    __root__: Literal["month"] = Field(default="month")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def month_func(v):
+                return datetime.days_to_months(v)
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def month_func(v):
+                return datetime.micros_to_months(v)
+
+        else:
+            raise ValueError(f"Cannot apply month transform for type: {source}")
+
+        return lambda v: month_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 2
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_month(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "MonthTransform()"
+
+
+class DayTransform(Transform[S, int]):
+    """Transforms a datetime value into a day value.
+
+    Example:
+        >>> transform = MonthTransform()
+        >>> transform.transform(DateType())(17501)
+        17501
+    """
+
+    __root__: Literal["day"] = Field(default="day")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        source_type = type(source)
+        if source_type == DateType:
+
+            def day_func(v):
+                return v
+
+        elif source_type in {TimestampType, TimestamptzType}:
+
+            def day_func(v):
+                return datetime.micros_to_days(v)
+
+        else:
+            raise ValueError(f"Cannot apply day transform for type: {source}")
+
+        return lambda v: day_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            DateType,
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return DateType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 1
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False
+
+    def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
+        return datetime.to_human_day(value) if isinstance(value, int) else "null"
+
+    @property
+    def dedup_name(self) -> str:
+        return "time"
+
+    def __repr__(self) -> str:
+        return "DayTransform()"
+
+
+class HourTransform(Transform[S, int]):
+    """Transforms a datetime value into a hour value.
+
+    Example:
+        >>> transform = HourTransform()
+        >>> transform.transform(TimestampType())(1512151975038194)
+        420042
+    """
+
+    __root__: Literal["hour"] = Field(default="hour")
+    _source_type: IcebergType = PrivateAttr()
+
+    def transform(self, source: IcebergType) -> Callable[[Optional[S]], Optional[int]]:
+        if type(source) in {TimestampType, TimestamptzType}:
+
+            def hour_func(v):
+                return datetime.micros_to_hours(v)
+
+        else:
+            raise ValueError(f"Cannot apply hour transform for type: {source}")
+
+        return lambda v: hour_func(v) if v else None
+
+    def can_transform(self, source: IcebergType) -> bool:
+        return type(source) in {
+            TimestampType,
+            TimestamptzType,
+        }
+
+    def result_type(self, source: IcebergType) -> IcebergType:
+        return IntegerType()
+
+    @property
+    def preserves_order(self) -> bool:
+        return True
+
+    @property
+    def time_order(self) -> int:
+        return 0
+
+    def satisfies_order_of(self, other: Transform) -> bool:
+        return self.time_order <= other.time_order if hasattr(other, "time_order") else False

Review Comment:
   👌 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#issuecomment-1219753561

   Hey @jun-he We're thinking of doing a first Python release, and I would love to have this in as well. Would you have some time to dig into the comments? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a diff in pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on code in PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#discussion_r956891603


##########
python/pyiceberg/transforms.py:
##########
@@ -215,6 +228,221 @@ def __repr__(self) -> str:
         return f"BucketTransform(num_buckets={self._num_buckets})"
 
 
+class TimeResolution(IntEnum):
+    YEAR = 6
+    MONTH = 5
+    WEEK = 4
+    DAY = 3
+    HOUR = 2
+    MINUTE = 1
+    SECOND = 0
+
+
+class TimeTransform(Transform[S, int]):

Review Comment:
   👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #5462: Python: Add year, month, day, and hour transforms

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#issuecomment-1230310637

   Thanks @jun-he looks great! Thanks! 🚀 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on pull request #5462: [Python] add time (year, month, day, hour) transforms

Posted by GitBox <gi...@apache.org>.
jun-he commented on PR #5462:
URL: https://github.com/apache/iceberg/pull/5462#issuecomment-1207631150

   @Fokko can you help to review it? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org