Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/25 08:53:47 UTC

[GitHub] [iceberg] nastra opened a new pull request #2739: Arrow: Add support for TimeType / UUIDType

nastra opened a new pull request #2739:
URL: https://github.com/apache/iceberg/pull/2739


   This partially fixes https://github.com/apache/iceberg/issues/2486 and https://github.com/apache/iceberg/issues/2485. I didn't include all types because the PR would otherwise become too large. Adding support for new types has been somewhat painful, so after this PR is merged I plan to refactor the code in the `arrow` project to reduce duplication and complexity before adding support for more types.
   
   Note that this PR currently depends on https://github.com/apache/iceberg/pull/2732, so it might make sense to review only commit 92a551b.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a change in pull request #2739: Arrow: Add support for TimeType / UUIDType

Posted by GitBox <gi...@apache.org>.
nastra commented on a change in pull request #2739:
URL: https://github.com/apache/iceberg/pull/2739#discussion_r659783420



##########
File path: arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java
##########
@@ -240,6 +247,12 @@ private void allocateFieldVector(boolean dictionaryEncodedVector) {
             this.readType = ReadType.LONG;
             this.typeWidth = (int) BigIntVector.TYPE_WIDTH;
             break;
+          case TIME_MICROS:

Review comment:
       I will look into supporting `TIME_MILLIS` tomorrow as a follow-up.

##########
File path: arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java
##########
@@ -240,6 +247,12 @@ private void allocateFieldVector(boolean dictionaryEncodedVector) {
             this.readType = ReadType.LONG;
             this.typeWidth = (int) BigIntVector.TYPE_WIDTH;
             break;
+          case TIME_MICROS:

Review comment:
       After some investigation, supporting `TIME_MILLIS` might be a bit more involved, so I opened https://github.com/apache/iceberg/issues/2755.
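The thread does not say what makes `TIME_MILLIS` harder. One relevant detail from the Parquet format spec: a `TIME` column with the `MILLIS` unit is stored as INT32, while `MICROS` is stored as INT64, so millis values cannot simply reuse the 8-byte `LONG` read path. Below is a minimal sketch of the widening step only, assuming PLAIN-encoded, non-null input and a micros-of-day `long` target like Iceberg's time type. `TimeMillisSketch` and `millisToMicros` are hypothetical names, not Iceberg APIs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class TimeMillisSketch {
  // Hypothetical helper: reads `count` PLAIN-encoded INT32 millis-of-day values
  // (little-endian, 4 bytes each) and widens them to micros-of-day longs.
  static long[] millisToMicros(ByteBuffer plainInt32Page, int count) {
    ByteBuffer buf = plainInt32Page.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    long[] micros = new long[count];
    for (int i = 0; i < count; i++) {
      int millisOfDay = buf.getInt();  // 4-byte width, unlike the 8-byte LONG path
      micros[i] = millisOfDay * 1000L; // widen before multiplying to avoid int overflow
    }
    return micros;
  }
}
```

Multiplying by `1000L` rather than `1000` matters: the largest millis-of-day value (86,399,999) times 1000 does not fit in an `int`.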






[GitHub] [iceberg] rymurr commented on a change in pull request #2739: Arrow: Add support for TimeType / UUIDType

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2739:
URL: https://github.com/apache/iceberg/pull/2739#discussion_r659596566



##########
File path: arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java
##########
@@ -169,6 +172,9 @@ public VectorHolder read(VectorHolder reuse, int numValsToRead) {
           case TIMESTAMP_MILLIS:
             vectorizedColumnIterator.nextBatchTimestampMillis(vec, typeWidth, nullabilityHolder);
             break;
+          case UUID:

Review comment:
       How come only UUID was added to this switch?

##########
File path: arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java
##########
@@ -240,6 +247,12 @@ private void allocateFieldVector(boolean dictionaryEncodedVector) {
             this.readType = ReadType.LONG;
             this.typeWidth = (int) BigIntVector.TYPE_WIDTH;
             break;
+          case TIME_MICROS:

Review comment:
       As discussed offline, do we want to add support for `TIME_MILLIS`?






[GitHub] [iceberg] nastra commented on a change in pull request #2739: Arrow: Add support for TimeType / UUIDType

Posted by GitBox <gi...@apache.org>.
nastra commented on a change in pull request #2739:
URL: https://github.com/apache/iceberg/pull/2739#discussion_r659627659



##########
File path: arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java
##########
@@ -240,6 +247,12 @@ private void allocateFieldVector(boolean dictionaryEncodedVector) {
             this.readType = ReadType.LONG;
             this.typeWidth = (int) BigIntVector.TYPE_WIDTH;
             break;
+          case TIME_MICROS:

Review comment:
       It looks like Parquet's `TimeWriter` only writes micros (see https://github.com/nastra/iceberg/blob/50f4ecca7711e69f63589fea828d26230fac8d59/parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetWriter.java#L256), so I think we don't need to handle `TIME_MILLIS`.
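To illustrate what "only writes micros" means: Iceberg's time type is a time of day with microsecond precision, stored as a 64-bit count of microseconds since midnight. The sketch below shows that conversion with `java.time`. It is written for illustration, assuming it matches the linked writer's behavior; it is not copied from `BaseParquetWriter`, and `TimeMicrosSketch` is a hypothetical name.

```java
import java.time.LocalTime;

public class TimeMicrosSketch {
  // Hypothetical helper: LocalTime -> microseconds since midnight (64-bit).
  static long toMicrosOfDay(LocalTime time) {
    return time.toNanoOfDay() / 1_000L; // truncates sub-microsecond precision
  }

  // Hypothetical helper: microseconds since midnight -> LocalTime.
  static LocalTime fromMicrosOfDay(long micros) {
    return LocalTime.ofNanoOfDay(micros * 1_000L);
  }
}
```

If every writer produces values this way, a reader only ever sees INT64 micros for time columns, which is consistent with the conclusion that `TIME_MILLIS` need not be handled here.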







[GitHub] [iceberg] nastra commented on a change in pull request #2739: Arrow: Add support for TimeType / UUIDType

Posted by GitBox <gi...@apache.org>.
nastra commented on a change in pull request #2739:
URL: https://github.com/apache/iceberg/pull/2739#discussion_r660417216



##########
File path: arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java
##########
@@ -240,6 +247,12 @@ private void allocateFieldVector(boolean dictionaryEncodedVector) {
             this.readType = ReadType.LONG;
             this.typeWidth = (int) BigIntVector.TYPE_WIDTH;
             break;
+          case TIME_MICROS:

Review comment:
       After some investigation, supporting `TIME_MILLIS` might be a bit more involved, so I opened https://github.com/apache/iceberg/issues/2755.






[GitHub] [iceberg] rymurr merged pull request #2739: Arrow: Add support for TimeType / UUIDType

Posted by GitBox <gi...@apache.org>.
rymurr merged pull request #2739:
URL: https://github.com/apache/iceberg/pull/2739


   







[GitHub] [iceberg] nastra commented on a change in pull request #2739: Arrow: Add support for TimeType / UUIDType

Posted by GitBox <gi...@apache.org>.
nastra commented on a change in pull request #2739:
URL: https://github.com/apache/iceberg/pull/2739#discussion_r659604909



##########
File path: arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java
##########
@@ -169,6 +172,9 @@ public VectorHolder read(VectorHolder reuse, int numValsToRead) {
           case TIMESTAMP_MILLIS:
             vectorizedColumnIterator.nextBatchTimestampMillis(vec, typeWidth, nullabilityHolder);
             break;
+          case UUID:

Review comment:
       `TIME_MICROS` and `TIMESTAMP_MICROS` are both read as `LONG` (see https://github.com/nastra/iceberg/blob/21af7bcd73f450e997d1af085634567a734a16b9/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java#L240-L255), so they are handled by the `LONG` branch of that switch statement. Only `UUID` needs to be handled differently.
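A self-contained sketch of that switch shape follows. `Logical`, `ReadKind`, `Allocation`, `allocate`, and `toUuid` are hypothetical stand-ins, not Iceberg's real `ReadType` or reader classes. The 8-byte width matches Arrow's `BigIntVector.TYPE_WIDTH`. The 16-byte width assumes Parquet's UUID encoding as a big-endian `FIXED_LEN_BYTE_ARRAY(16)`.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class ReadTypeSketch {
  // Hypothetical stand-ins for the logical types seen in the diff.
  enum Logical { TIME_MICROS, TIMESTAMP_MICROS, INT64, UUID }

  // Hypothetical read strategies.
  enum ReadKind { LONG, FIXED_WIDTH_BINARY }

  // Pairs a read strategy with a per-value width in bytes.
  static final class Allocation {
    final ReadKind kind;
    final int typeWidth;

    Allocation(ReadKind kind, int typeWidth) {
      this.kind = kind;
      this.typeWidth = typeWidth;
    }
  }

  static Allocation allocate(Logical type) {
    switch (type) {
      case TIME_MICROS:
      case TIMESTAMP_MICROS:
      case INT64:
        // All three are plain 64-bit longs on disk, so they share the LONG branch.
        return new Allocation(ReadKind.LONG, 8);
      case UUID:
        // 16 raw bytes per value: needs its own read path.
        return new Allocation(ReadKind.FIXED_WIDTH_BINARY, 16);
      default:
        throw new UnsupportedOperationException("Unsupported type: " + type);
    }
  }

  // Decodes one 16-byte big-endian value into a java.util.UUID.
  static UUID toUuid(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes); // ByteBuffer defaults to big-endian
    return new UUID(buf.getLong(), buf.getLong());
  }
}
```

The fall-through cases make the point from the comment explicit: adding `TIME_MICROS` needs no new read path, while `UUID` does.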



