Posted to issues@iceberg.apache.org by "nastra (via GitHub)" <gi...@apache.org> on 2023/04/26 08:58:40 UTC

[GitHub] [iceberg] nastra opened a new pull request, #7399: Spark: Add read/write support for UUIDs

nastra opened a new pull request, #7399:
URL: https://github.com/apache/iceberg/pull/7399

   fixes https://github.com/apache/iceberg/issues/4581


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177135907


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -316,6 +323,35 @@ public void write(int repetitionLevel, Decimal decimal) {
     }
   }
 
+  private static PrimitiveWriter<UTF8String> uuids(ColumnDescriptor desc) {
+    return new UUIDWriter(desc);
+  }
+
+  private static class UUIDWriter extends PrimitiveWriter<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);
+              return buffer;
+            });
+
+    private UUIDWriter(ColumnDescriptor desc) {
+      super(desc);
+    }
+
+    @Override
+    public void write(int repetitionLevel, UTF8String string) {
+      UUID uuid = UUID.fromString(string.toString());
+      ByteBuffer buffer = BUFFER.get();
+      buffer.rewind();
+      buffer.putLong(uuid.getMostSignificantBits());
+      buffer.putLong(uuid.getLeastSignificantBits());

Review Comment:
   In other places, like `UUIDUtil`, we use `putLong(int offset, long value)` instead of `putLong(long value)` so that the position is not updated and we don't need to worry about the buffer's internal state. I think that's usually a better approach.
   
   Also, we might want to update `UUIDUtil` to share this code:
   
   ```java
     public static ByteBuffer convertToByteBuffer(UUID value) {
       return convertToByteBuffer(value, null);
     }
   
     public static ByteBuffer convertToByteBuffer(UUID value, ByteBuffer reuse) {
       ByteBuffer buffer;
       if (reuse != null) {
         buffer = reuse;
       } else {
         buffer = ByteBuffer.allocate(16);
       }
   
       buffer.order(ByteOrder.BIG_ENDIAN);
       buffer.putLong(0, value.getMostSignificantBits());
       buffer.putLong(8, value.getLeastSignificantBits());
       return buffer;
     }
   ```





[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177090287


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);
+              return buffer;
+            });
+
+    private static final UUIDReader INSTANCE = new UUIDReader();
+
+    private UUIDReader() {}
+
+    @Override
+    public UTF8String nonNullRead(ColumnVector vector, int row) {
+      BytesColumnVector bytesVector = (BytesColumnVector) vector;
+      ByteBuffer buffer = BUFFER.get();
+      buffer.rewind();
+      buffer.put(bytesVector.vector[row], bytesVector.start[row], bytesVector.length[row]);

Review Comment:
   Length is always 16, right?





[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177137576


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/data/RandomData.java:
##########
@@ -329,6 +329,8 @@ public Object primitive(Type.PrimitiveType primitive) {
           return UTF8String.fromString((String) obj);
         case DECIMAL:
           return Decimal.apply((BigDecimal) obj);
+        case UUID:
+          return UTF8String.fromString(UUID.nameUUIDFromBytes((byte[]) obj).toString());

Review Comment:
   Why does `generatePrimitive` provide `byte[]`? Shouldn't it create a String for Spark already?





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177410076


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -176,6 +180,9 @@ public ParquetValueWriter<?> primitive(DataType sType, PrimitiveType primitive)
       switch (primitive.getPrimitiveTypeName()) {
         case FIXED_LEN_BYTE_ARRAY:
         case BINARY:
+          if (LogicalTypeAnnotation.uuidType().equals(primitive.getLogicalTypeAnnotation())) {

Review Comment:
   yes correct, it is being set in https://github.com/apache/iceberg/blob/master/parquet/src/main/java/org/apache/iceberg/parquet/TypeToMessageType.java#L186





[GitHub] [iceberg] rdblue commented on pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#issuecomment-1530599986

   Thanks, @nastra!




[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177091707


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcWriter.java:
##########
@@ -173,7 +176,13 @@ static FieldGetter<?> createFieldGetter(TypeDescription fieldType) {
         fieldGetter = SpecializedGetters::getDouble;
         break;
       case BINARY:
-        fieldGetter = SpecializedGetters::getBinary;
+        if (ORCSchemaUtil.BinaryType.UUID
+            .toString()
+            .equals(fieldType.getAttributeValue(ORCSchemaUtil.ICEBERG_BINARY_TYPE_ATTRIBUTE))) {

Review Comment:
   `equalsIgnoreCase`?





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177406276


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ArrowVectorAccessorFactory.java:
##########
@@ -74,6 +76,11 @@ public UTF8String ofRow(VarCharVector vector, int rowId) {
           null, vector.getDataBuffer().memoryAddress() + start, end - start);
     }
 
+    @Override
+    public UTF8String ofRow(FixedSizeBinaryVector vector, int rowId) {
+      return UTF8String.fromString(UUIDUtil.convert(vector.get(rowId)).toString());

Review Comment:
   `vector.get(rowId)` will return the byte[] with a length of 16 for the given `rowId`. I think we could get the underlying array and offset from the `ArrowBuf`, but we would still need to read it into a new byte[], which is what `vector.get(rowId)` does underneath.





[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177091236


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);

Review Comment:
   I think it's usually good to be explicit. Are we sure this is the default, or is it only the default on certain architectures?





[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177133537


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -176,6 +180,9 @@ public ParquetValueWriter<?> primitive(DataType sType, PrimitiveType primitive)
       switch (primitive.getPrimitiveTypeName()) {
         case FIXED_LEN_BYTE_ARRAY:
         case BINARY:
+          if (LogicalTypeAnnotation.uuidType().equals(primitive.getLogicalTypeAnnotation())) {

Review Comment:
   The logical type annotation will be set by Parquet conversion?





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177361153


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);
+              return buffer;
+            });
+
+    private static final UUIDReader INSTANCE = new UUIDReader();
+
+    private UUIDReader() {}
+
+    @Override
+    public UTF8String nonNullRead(ColumnVector vector, int row) {
+      BytesColumnVector bytesVector = (BytesColumnVector) vector;

Review Comment:
   done





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177361899


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);
+              return buffer;
+            });
+
+    private static final UUIDReader INSTANCE = new UUIDReader();
+
+    private UUIDReader() {}
+
+    @Override
+    public UTF8String nonNullRead(ColumnVector vector, int row) {
+      BytesColumnVector bytesVector = (BytesColumnVector) vector;
+      ByteBuffer buffer = BUFFER.get();
+      buffer.rewind();
+      buffer.put(bytesVector.vector[row], bytesVector.start[row], bytesVector.length[row]);

Review Comment:
   yes, the length here is always 16 since it's the UUID byte array





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1184785254


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -316,6 +323,35 @@ public void write(int repetitionLevel, Decimal decimal) {
     }
   }
 
+  private static PrimitiveWriter<UTF8String> uuids(ColumnDescriptor desc) {
+    return new UUIDWriter(desc);
+  }
+
+  private static class UUIDWriter extends PrimitiveWriter<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);
+              return buffer;
+            });
+
+    private UUIDWriter(ColumnDescriptor desc) {
+      super(desc);
+    }
+
+    @Override
+    public void write(int repetitionLevel, UTF8String string) {
+      UUID uuid = UUID.fromString(string.toString());
+      ByteBuffer buffer = BUFFER.get();
+      buffer.rewind();
+      buffer.putLong(uuid.getMostSignificantBits());
+      buffer.putLong(uuid.getLeastSignificantBits());

Review Comment:
   I've opened https://github.com/apache/iceberg/pull/7525 to address those other places in Spark





[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177136358


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ArrowVectorAccessorFactory.java:
##########
@@ -74,6 +76,11 @@ public UTF8String ofRow(VarCharVector vector, int rowId) {
           null, vector.getDataBuffer().memoryAddress() + start, end - start);
     }
 
+    @Override
+    public UTF8String ofRow(FixedSizeBinaryVector vector, int rowId) {
+      return UTF8String.fromString(UUIDUtil.convert(vector.get(rowId)).toString());

Review Comment:
   Is there a way to get the underlying array and offset?






[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1182149652


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueWriters.java:
##########
@@ -73,6 +80,16 @@ public void nonNullWrite(int rowId, UTF8String data, ColumnVector output) {
     }
   }
 
+  private static class UUIDWriter implements OrcValueWriter<UTF8String> {
+    private static final UUIDWriter INSTANCE = new UUIDWriter();
+
+    @Override
+    public void nonNullWrite(int rowId, UTF8String data, ColumnVector output) {
+      ByteBuffer buffer = UUIDUtil.convertToByteBuffer(UUID.fromString(data.toString()));

Review Comment:
   makes sense, I've added a comment to this as part of https://github.com/apache/iceberg/pull/7496





[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1174017002


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);
+              return buffer;
+            });
+
+    private static final UUIDReader INSTANCE = new UUIDReader();
+
+    private UUIDReader() {}
+
+    @Override
+    public UTF8String nonNullRead(ColumnVector vector, int row) {
+      BytesColumnVector bytesVector = (BytesColumnVector) vector;

Review Comment:
   Do we need to put this into a buffer? Just wondering if we should use the array itself instead.





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1175364019


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);
+              return buffer;
+            });
+
+    private static final UUIDReader INSTANCE = new UUIDReader();
+
+    private UUIDReader() {}
+
+    @Override
+    public UTF8String nonNullRead(ColumnVector vector, int row) {
+      BytesColumnVector bytesVector = (BytesColumnVector) vector;

Review Comment:
   I was mainly aligning with other places in the code that also use a thread-local ByteBuffer when reading UUIDs, but we might be able to use the array itself (need to double-check that)





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177397388


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -316,6 +323,35 @@ public void write(int repetitionLevel, Decimal decimal) {
     }
   }
 
+  private static PrimitiveWriter<UTF8String> uuids(ColumnDescriptor desc) {
+    return new UUIDWriter(desc);
+  }
+
+  private static class UUIDWriter extends PrimitiveWriter<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);
+              return buffer;
+            });
+
+    private UUIDWriter(ColumnDescriptor desc) {
+      super(desc);
+    }
+
+    @Override
+    public void write(int repetitionLevel, UTF8String string) {
+      UUID uuid = UUID.fromString(string.toString());
+      ByteBuffer buffer = BUFFER.get();
+      buffer.rewind();
+      buffer.putLong(uuid.getMostSignificantBits());
+      buffer.putLong(uuid.getLeastSignificantBits());

Review Comment:
   that makes sense, I've updated that. 
   We still have a few places in the code that do `buffer.putLong(uuid.getMostSignificantBits());`. I'll follow up on those and update them independently.
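
For illustration only (this is not code taken from the PR), a minimal sketch of what the updated Parquet `UUIDWriter.write` could look like once it delegates to the `UUIDUtil.convertToByteBuffer(UUID, ByteBuffer)` reuse overload proposed earlier in the thread. Because that helper uses absolute `putLong(offset, value)` writes, the thread-local buffer's position never moves and no `rewind()` is needed; the final `column.writeBinary(...)` call is an assumption here, since the diff excerpt above cuts off before it.

```java
@Override
public void write(int repetitionLevel, UTF8String string) {
  UUID uuid = UUID.fromString(string.toString());
  // Fill the reusable thread-local 16-byte buffer with absolute puts; the helper
  // never changes the buffer's position, so no rewind() is needed.
  ByteBuffer buffer = UUIDUtil.convertToByteBuffer(uuid, BUFFER.get());
  // Assumed final step (not shown in the excerpt): hand the bytes to the Parquet
  // column writer without copying them into a new array.
  column.writeBinary(repetitionLevel, Binary.fromReusedByteBuffer(buffer));
}
```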





[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177089702


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueWriters.java:
##########
@@ -73,6 +80,16 @@ public void nonNullWrite(int rowId, UTF8String data, ColumnVector output) {
     }
   }
 
+  private static class UUIDWriter implements OrcValueWriter<UTF8String> {
+    private static final UUIDWriter INSTANCE = new UUIDWriter();
+
+    @Override
+    public void nonNullWrite(int rowId, UTF8String data, ColumnVector output) {
+      ByteBuffer buffer = UUIDUtil.convertToByteBuffer(UUID.fromString(data.toString()));

Review Comment:
   This allocates a buffer. We may want to have a buffer here as a thread-local or a field to avoid allocation in a tight loop.





[GitHub] [iceberg] rdblue merged pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue merged PR #7399:
URL: https://github.com/apache/iceberg/pull/7399




[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177133107


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetReaders.java:
##########
@@ -282,6 +284,9 @@ public ParquetValueReader<?> primitive(
       switch (primitive.getPrimitiveTypeName()) {
         case FIXED_LEN_BYTE_ARRAY:
         case BINARY:
+          if (expected != null && expected.typeId() == TypeID.UUID) {

Review Comment:
   Do we need to check the logical type in the switch above as well?





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177420757


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/data/RandomData.java:
##########
@@ -329,6 +329,8 @@ public Object primitive(Type.PrimitiveType primitive) {
           return UTF8String.fromString((String) obj);
         case DECIMAL:
           return Decimal.apply((BigDecimal) obj);
+        case UUID:
+          return UTF8String.fromString(UUID.nameUUIDFromBytes((byte[]) obj).toString());

Review Comment:
   my guess would be because `RandomUtil.generatePrimitive(..)` is used in other places where UUIDs are expected to be `byte[]`







[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1174008034


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);

Review Comment:
   This is the default, right? Just setting it to be sure?





[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177093061


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);

Review Comment:
   https://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html#order()
   
   ```
   The byte order is used when reading or writing multibyte values, and when creating buffers that are views of this byte buffer. The order of a newly-created byte buffer is always [BIG_ENDIAN](https://docs.oracle.com/javase/7/docs/api/java/nio/ByteOrder.html#BIG_ENDIAN).
   ```
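
A tiny standalone check of that statement, for anyone who wants to verify it locally (plain JDK, no Iceberg dependencies; names here are illustrative only):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderCheck {
  public static void main(String[] args) {
    // A newly allocated buffer is big-endian by default, independent of the platform.
    ByteBuffer fresh = ByteBuffer.allocate(16);
    System.out.println(fresh.order());           // BIG_ENDIAN
    System.out.println(ByteOrder.nativeOrder()); // platform-dependent, e.g. LITTLE_ENDIAN on x86
  }
}
```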





[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177133923


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -316,6 +323,35 @@ public void write(int repetitionLevel, Decimal decimal) {
     }
   }
 
+  private static PrimitiveWriter<UTF8String> uuids(ColumnDescriptor desc) {

Review Comment:
   The other factory methods are grouped at the top just under the visitor. Should we move this there as well?





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177438860


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueWriters.java:
##########
@@ -73,6 +80,16 @@ public void nonNullWrite(int rowId, UTF8String data, ColumnVector output) {
     }
   }
 
+  private static class UUIDWriter implements OrcValueWriter<UTF8String> {
+    private static final UUIDWriter INSTANCE = new UUIDWriter();
+
+    @Override
+    public void nonNullWrite(int rowId, UTF8String data, ColumnVector output) {
+      ByteBuffer buffer = UUIDUtil.convertToByteBuffer(UUID.fromString(data.toString()));

Review Comment:
   I agree with that observation. I initially used a thread-local to reduce byte[] allocations, but couldn't get it to work because `((BytesColumnVector) output).setRef(..)` only stores a reference to the passed byte[], so subsequent writes would overwrite previously written values.
   Worth mentioning that `GenericOrcWriters` does the same thing when [writing UUIDs](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/orc/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java#L298-L302).
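
To make the constraint concrete, a small sketch of the write path being described, based on the `setRef` call mentioned above (an illustration, not necessarily the exact merged code):

```java
@Override
public void nonNullWrite(int rowId, UTF8String data, ColumnVector output) {
  // convertToByteBuffer allocates a fresh 16-byte array for every value.
  ByteBuffer buffer = UUIDUtil.convertToByteBuffer(UUID.fromString(data.toString()));
  // setRef only stores a reference to the array, so a shared or thread-local array
  // would be overwritten by later rows before the batch is flushed.
  ((BytesColumnVector) output).setRef(rowId, buffer.array(), 0, buffer.array().length);
}
```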
   





[GitHub] [iceberg] nastra closed pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra closed pull request #7399: Spark: Add read/write support for UUIDs
URL: https://github.com/apache/iceberg/pull/7399




[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177090660


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);
+              return buffer;
+            });
+
+    private static final UUIDReader INSTANCE = new UUIDReader();
+
+    private UUIDReader() {}
+
+    @Override
+    public UTF8String nonNullRead(ColumnVector vector, int row) {
+      BytesColumnVector bytesVector = (BytesColumnVector) vector;

Review Comment:
   Using the array directly would be ideal.
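
As a rough sketch of that idea (an illustration, not the merged change), the reader could decode the UUID straight from the vector's backing array, assuming the stored value is always the 16-byte big-endian UUID encoding discussed above:

```java
@Override
public UTF8String nonNullRead(ColumnVector vector, int row) {
  BytesColumnVector bytesVector = (BytesColumnVector) vector;
  // Wrap the backing array in place instead of copying into a shared thread-local buffer.
  ByteBuffer wrapped =
      ByteBuffer.wrap(bytesVector.vector[row], bytesVector.start[row], 16)
          .order(ByteOrder.BIG_ENDIAN);
  long mostSigBits = wrapped.getLong();
  long leastSigBits = wrapped.getLong();
  return UTF8String.fromString(new UUID(mostSigBits, leastSigBits).toString());
}
```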





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177407741


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcWriter.java:
##########
@@ -173,7 +176,13 @@ static FieldGetter<?> createFieldGetter(TypeDescription fieldType) {
         fieldGetter = SpecializedGetters::getDouble;
         break;
       case BINARY:
-        fieldGetter = SpecializedGetters::getBinary;
+        if (ORCSchemaUtil.BinaryType.UUID
+            .toString()
+            .equals(fieldType.getAttributeValue(ORCSchemaUtil.ICEBERG_BINARY_TYPE_ATTRIBUTE))) {

Review Comment:
   done





[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1176731882


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);

Review Comment:
   Not that it's incorrect; it's just the default for all new ByteBuffers. Just wondering why we were setting it explicitly.





[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1176728148


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);

Review Comment:
   Big endian is correct. See `UUIDUtil` for another implementation.





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1177362544


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -316,6 +323,35 @@ public void write(int repetitionLevel, Decimal decimal) {
     }
   }
 
+  private static PrimitiveWriter<UTF8String> uuids(ColumnDescriptor desc) {

Review Comment:
   yep, done





[GitHub] [iceberg] nastra commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1175489779


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java:
##########
@@ -170,6 +177,31 @@ public UTF8String nonNullRead(ColumnVector vector, int row) {
     }
   }
 
+  private static class UUIDReader implements OrcValueReader<UTF8String> {
+    private static final ThreadLocal<ByteBuffer> BUFFER =
+        ThreadLocal.withInitial(
+            () -> {
+              ByteBuffer buffer = ByteBuffer.allocate(16);
+              buffer.order(ByteOrder.BIG_ENDIAN);

Review Comment:
   I was mainly aligning with other places in the code that also use a thread-local ByteBuffer when reading UUIDs





[GitHub] [iceberg] rdblue commented on a diff in pull request #7399: Spark: Add read/write support for UUIDs

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7399:
URL: https://github.com/apache/iceberg/pull/7399#discussion_r1181973948


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueWriters.java:
##########
@@ -73,6 +80,16 @@ public void nonNullWrite(int rowId, UTF8String data, ColumnVector output) {
     }
   }
 
+  private static class UUIDWriter implements OrcValueWriter<UTF8String> {
+    private static final UUIDWriter INSTANCE = new UUIDWriter();
+
+    @Override
+    public void nonNullWrite(int rowId, UTF8String data, ColumnVector output) {
+      ByteBuffer buffer = UUIDUtil.convertToByteBuffer(UUID.fromString(data.toString()));

Review Comment:
   Probably worth mentioning in a comment?


