You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/19 23:45:43 UTC

[GitHub] [iceberg] huaxingao opened a new pull request #3933: Support timestamp in partition path

huaxingao opened a new pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933


   Partition path containing timestamp such as `ts=2021-01-01 00:00:00.999` is not supported and iceberg throws Exception [here](https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Conversions.java#L74). It seems to me that timestamp in partition path should be supported and this PR lifts the restriction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#issuecomment-1018192717


   @huaxingao to be clear here, this is a fix mostly for MigrationTableUtils right? I was looking for any other consumers of the function but I only found that and test code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r789373751



##########
File path: api/src/main/java/org/apache/iceberg/types/Conversions.java
##########
@@ -71,6 +71,12 @@ public static Object fromPartitionString(Type type, String asString) {
         return new BigDecimal(asString);
       case DATE:
         return Literal.of(asString).to(Types.DateType.get()).value();
+      case TIMESTAMP:
+        if (!asString.contains("T")) {
+          return java.sql.Timestamp.valueOf(asString).getTime() * 1000;

Review comment:
       I was wondering here if we can find do the same inversion that's in the hive code for parsing this? Or does hive really allow producing either path?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r792231929



##########
File path: api/src/main/java/org/apache/iceberg/types/Conversions.java
##########
@@ -71,6 +73,14 @@ public static Object fromPartitionString(Type type, String asString) {
         return new BigDecimal(asString);
       case DATE:
         return Literal.of(asString).to(Types.DateType.get()).value();
+      case TIMESTAMP:
+        if (!asString.contains("T")) {
+          Instant instant = java.sql.Timestamp.valueOf(asString).toInstant();

Review comment:
       Is it possible to do this conversion without using `java.sql.Timestamp`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r788240499



##########
File path: api/src/main/java/org/apache/iceberg/types/Conversions.java
##########
@@ -71,6 +71,12 @@ public static Object fromPartitionString(Type type, String asString) {
         return new BigDecimal(asString);
       case DATE:
         return Literal.of(asString).to(Types.DateType.get()).value();
+      case TIMESTAMP:
+        if (!asString.contains("T")) {
+          return java.sql.Timestamp.valueOf(asString).getTime() * 1000;

Review comment:
       The path is something like this `ts=2021-01-01 00:00:00.999`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
flyrain commented on pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#issuecomment-1017950666


   +1 for the solution. It should be pretty useful in the following scenario. The table only have a timestamp column, but want to partitioned by year/date.
   ```
   CREATE TABLE %s (id Integer, name String, dept String, ts Timestamp) USING iceberg PARTITIONED BY year(ts)
   CREATE TABLE %s (id Integer, name String, dept String, ts Timestamp) USING iceberg PARTITIONED BY date(ts)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#issuecomment-1019041517


   @RussellSpitzer 
   Sorry for the late reply. Yes, the fix is mostly for MigrationTableUtils. It is for this path `DataFiles.withPartitionPath` -> `DataFiles.fillFromPath` -> `Conversions.fromPartitionString`. 
   
   After I took a look at `PartitionSpec`, I feel that timestamp is intentionally blocked in  `Conversions.fromPartitionString`, because in `PartitionSpec Builder`, it has year, month, day, time, but no timestamp. I guess timestamp as a partition column is not supported because it doesn't have any real usages. However, creating table using timestamp as a partition column 
   `CREATE TABLE test (id Integer, name String, dept String, ts Timestamp) USING iceberg PARTITIONED BY (ts)` 
   can create the table successfully, but users don't know that the partition column `ts` is not working, so I feel that we probably should support partitioned by timestamp. If not, we probably should block creating table partitioned by timestamp.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r791090930



##########
File path: api/src/main/java/org/apache/iceberg/types/Conversions.java
##########
@@ -71,6 +71,12 @@ public static Object fromPartitionString(Type type, String asString) {
         return new BigDecimal(asString);
       case DATE:
         return Literal.of(asString).to(Types.DateType.get()).value();
+      case TIMESTAMP:
+        if (!asString.contains("T")) {
+          return java.sql.Timestamp.valueOf(asString).getTime() * 1000;

Review comment:
       We need to be _very_ strict with what we parse and accept here and delegating this work to `java.sql.Timestamp` seems like a good way to introduce formats that we don't intend to support. I would rather use what is built into `StringLiteral` and supply different formats. Can you look at using `OffsetDateTime.parse` and a format for this instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain edited a comment on pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
flyrain edited a comment on pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#issuecomment-1017950666


   +1 for the solution. It should be pretty useful in the following scenario. The table only have a timestamp column, but want to partitioned by year/date.
   ```
   CREATE TABLE %s (id Integer, name String, dept String, ts Timestamp) USING iceberg PARTITIONED BY (years(ts))
   CREATE TABLE %s (id Integer, name String, dept String, ts Timestamp) USING iceberg PARTITIONED BY (date(ts))
   ```
   
   The Iceberg doc gives an example about it, https://iceberg.apache.org/#spark-ddl/#partitioned-by.
   ```
   CREATE TABLE prod.db.sample (
       id bigint,
       data string,
       category string,
       ts timestamp)
   USING iceberg
   PARTITIONED BY (bucket(16, id), days(ts), category)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r791092054



##########
File path: spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java
##########
@@ -819,6 +839,26 @@ public void testPartitionedImportFromEmptyPartitionDoesNotThrow() {
               RowFactory.create(4, "Will Doe", "facilities", toDate("2021-01-02"))),
           new StructType(dateStruct)).repartition(2);
 
+  private static final StructField[] timeStampstruct = {
+      new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+      new StructField("name", DataTypes.StringType, true, Metadata.empty()),
+      new StructField("dept", DataTypes.StringType, true, Metadata.empty()),
+      new StructField("ts", DataTypes.TimestampType, true, Metadata.empty())
+  };
+
+  private static Timestamp toTimestamp(String value) {
+    return new Timestamp(DateTime.parse(value).getMillis());

Review comment:
       This is not a reliable way to convert to a timestamp because Java Date and SQL Timestamp have internal time zone logic. Instead, please use Literal or parse values directly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r788971851



##########
File path: api/src/main/java/org/apache/iceberg/types/Conversions.java
##########
@@ -71,6 +71,12 @@ public static Object fromPartitionString(Type type, String asString) {
         return new BigDecimal(asString);
       case DATE:
         return Literal.of(asString).to(Types.DateType.get()).value();
+      case TIMESTAMP:
+        if (!asString.contains("T")) {
+          return java.sql.Timestamp.valueOf(asString).getTime() * 1000;

Review comment:
       My understanding is that the timezone has already been taken care of. For example, if I have row
   ```
   1, "John Doe", "hr", toTimestamp("2021-01-01T00:00:00.999999999Z"
   ```
   The partition path is actually `ts=2020-12-31 16:00:00.999`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#issuecomment-1020465126


   https://github.com/apache/iceberg/blob/12bf61dace38d7468d80a6d8e71a3cc42ffdd363/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L173-L186 // Iceberg Code (which escapes string value)
   
   https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java#L[…]6 // Hive Code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r788240499



##########
File path: api/src/main/java/org/apache/iceberg/types/Conversions.java
##########
@@ -71,6 +71,12 @@ public static Object fromPartitionString(Type type, String asString) {
         return new BigDecimal(asString);
       case DATE:
         return Literal.of(asString).to(Types.DateType.get()).value();
+      case TIMESTAMP:
+        if (!asString.contains("T")) {
+          return java.sql.Timestamp.valueOf(asString).getTime() * 1000;

Review comment:
       The path is something like this `ts=2021-01-01 00:00:00.999`. It doesn't contain `T` or `Z`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain edited a comment on pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
flyrain edited a comment on pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#issuecomment-1017950666


   +1 for the solution. It should be pretty useful in the following scenario. The table only have a timestamp column, but want to partitioned by year/date.
   ```
   CREATE TABLE %s (id Integer, name String, dept String, ts Timestamp) USING iceberg PARTITIONED BY (years(ts))
   CREATE TABLE %s (id Integer, name String, dept String, ts Timestamp) USING iceberg PARTITIONED BY (date(ts))
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#issuecomment-1020465126


   https://github.com/apache/iceberg/blob/12bf61dace38d7468d80a6d8e71a3cc42ffdd363/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L173-L186 // Iceberg Code (which escapes string value)
   
   https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java#L[…]6 // Hive Code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r789373751



##########
File path: api/src/main/java/org/apache/iceberg/types/Conversions.java
##########
@@ -71,6 +71,12 @@ public static Object fromPartitionString(Type type, String asString) {
         return new BigDecimal(asString);
       case DATE:
         return Literal.of(asString).to(Types.DateType.get()).value();
+      case TIMESTAMP:
+        if (!asString.contains("T")) {
+          return java.sql.Timestamp.valueOf(asString).getTime() * 1000;

Review comment:
       I was wondering here if we can find do the same inversion that's in the hive code for parsing this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#issuecomment-1020123274


   > @RussellSpitzer Sorry for the late reply. Yes, the fix is mostly for MigrationTableUtils. It is for this path `DataFiles.withPartitionPath` -> `DataFiles.fillFromPath` -> `Conversions.fromPartitionString`.
   > 
   > After I took a look at `PartitionSpec`, I feel that timestamp is intentionally blocked in `Conversions.fromPartitionString`, because in `PartitionSpec Builder`, it has year, month, day, time, but no timestamp. I guess timestamp as a partition column is not supported because it doesn't have any real usages. However, creating table using timestamp as a partition column `CREATE TABLE test (id Integer, name String, dept String, ts Timestamp) USING iceberg PARTITIONED BY (ts)` can create the table successfully, but users don't know that the partition column `ts` is not working, so I feel that we probably should support partitioned by timestamp. If not, we probably should block creating table partitioned by timestamp.
   
   Yep makes sense to me! Yufei and I were just discussing another bug which we thought might be related (it was not). I think this is fine to fix. I just want to make sure our string parsing is correct, from what I could tell the string that is generated for "timestamp" should be environment dependent since in the Hive code it just uses "value.toString". If we are confident that this covers most use cases I'm ok with merging.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hililiwei commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
hililiwei commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r788787787



##########
File path: api/src/main/java/org/apache/iceberg/types/Conversions.java
##########
@@ -71,6 +71,12 @@ public static Object fromPartitionString(Type type, String asString) {
         return new BigDecimal(asString);
       case DATE:
         return Literal.of(asString).to(Types.DateType.get()).value();
+      case TIMESTAMP:
+        if (!asString.contains("T")) {
+          return java.sql.Timestamp.valueOf(asString).getTime() * 1000;

Review comment:
       Will time zones be confused?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r789376739



##########
File path: api/src/main/java/org/apache/iceberg/types/Conversions.java
##########
@@ -71,6 +71,12 @@ public static Object fromPartitionString(Type type, String asString) {
         return new BigDecimal(asString);
       case DATE:
         return Literal.of(asString).to(Types.DateType.get()).value();
+      case TIMESTAMP:
+        if (!asString.contains("T")) {
+          return java.sql.Timestamp.valueOf(asString).getTime() * 1000;

Review comment:
       I notice that the test only creates one type of timestamp string so do we only need to cover that?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r792232443



##########
File path: spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java
##########
@@ -819,6 +842,27 @@ public void testPartitionedImportFromEmptyPartitionDoesNotThrow() {
               RowFactory.create(4, "Will Doe", "facilities", toDate("2021-01-02"))),
           new StructType(dateStruct)).repartition(2);
 
+  private static final StructField[] timeStampstruct = {
+      new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+      new StructField("name", DataTypes.StringType, true, Metadata.empty()),
+      new StructField("dept", DataTypes.StringType, true, Metadata.empty()),
+      new StructField("ts", DataTypes.TimestampType, true, Metadata.empty())
+  };
+
+  private static Timestamp toTimestamp(String value) {

Review comment:
       Okay, I see. I was wrong before. This is actually producing a Timestamp because that's what Spark's public row API uses. Instead, could you use SparkSQL and literal values to avoid doing this conversion manually? I think that will make for a much more reliable test.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3933: Support timestamp in partition path

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3933:
URL: https://github.com/apache/iceberg/pull/3933#discussion_r791090930



##########
File path: api/src/main/java/org/apache/iceberg/types/Conversions.java
##########
@@ -71,6 +71,12 @@ public static Object fromPartitionString(Type type, String asString) {
         return new BigDecimal(asString);
       case DATE:
         return Literal.of(asString).to(Types.DateType.get()).value();
+      case TIMESTAMP:
+        if (!asString.contains("T")) {
+          return java.sql.Timestamp.valueOf(asString).getTime() * 1000;

Review comment:
       We need to be _very_ strict with what we parse and accept here and delegating this work to `java.sql.Timestamp` seems like a good way to introduce formats that we don't intend to support. I would rather use what is built into `StringLiteral` and supply different formats. Can you look at using `OffsetDateTime.parse` and a format for this instead?

##########
File path: spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java
##########
@@ -819,6 +839,26 @@ public void testPartitionedImportFromEmptyPartitionDoesNotThrow() {
               RowFactory.create(4, "Will Doe", "facilities", toDate("2021-01-02"))),
           new StructType(dateStruct)).repartition(2);
 
+  private static final StructField[] timeStampstruct = {
+      new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+      new StructField("name", DataTypes.StringType, true, Metadata.empty()),
+      new StructField("dept", DataTypes.StringType, true, Metadata.empty()),
+      new StructField("ts", DataTypes.TimestampType, true, Metadata.empty())
+  };
+
+  private static Timestamp toTimestamp(String value) {
+    return new Timestamp(DateTime.parse(value).getMillis());

Review comment:
       This is not a reliable way to convert to a timestamp because Java Date and SQL Timestamp have internal time zone logic. Instead, please use Literal or parse values directly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org