You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/13 09:25:47 UTC

[GitHub] [iceberg] xiaotianzhang01 opened a new pull request #3728: Spark: adds the ability to parse iceberg hexadecimal text

xiaotianzhang01 opened a new pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728


   adds the ability to assert binary arrays for "assertEquals"  and  adds  iceberg hexadecimal text parsing
   
   solve cannot convert bytes to SQL literal
   
   cc/ @aokolnychyi  @rdblue 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r779538896



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -118,6 +119,8 @@ protected long waitUntilAfter(long timestampMillis) {
             return row.getList(pos);
           } else if (value instanceof scala.collection.Map) {
             return row.getJavaMap(pos);
+          } else if (value.getClass().isArray() && value.getClass().getComponentType().isPrimitive()) {
+            return IntStream.range(0, Array.getLength(value)).mapToObj(i -> Array.get(value, i)).toArray();

Review comment:
       The `assertEquals` method cannot verify whether binary arrays are equal.  If we do not convert binary arrays to Object[], we may need to create a separate method to compare binary arrays




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: adds the ability to parse iceberg hexadecimal text

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r771576907



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
##########
@@ -49,8 +49,8 @@ public TestSelect(String catalogName, String implementation, Map<String, String>
 
   @Before
   public void createTables() {
-    sql("CREATE TABLE %s (id bigint, data string, float float) USING iceberg", tableName);
-    sql("INSERT INTO %s VALUES (1, 'a', 1.0), (2, 'b', 2.0), (3, 'c', float('NaN'))", tableName);
+    sql("CREATE TABLE %s (id bigint, data string, float float, binary binary) USING iceberg", tableName);
+    sql("INSERT INTO %s VALUES (1, 'a', 1.0, X''), (2, 'b', 2.0, X'1111'), (3, 'c', float('NaN'), X'11')", tableName);

Review comment:
       Please don't edit existing test cases. Instead, add new test cases/methods for what you are changing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r776502117



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
##########
@@ -203,4 +207,12 @@ public void testSpecifySnapshotAndTimestamp() {
             .collectAsList();
         });
   }
+
+  @Test
+  public void testBinaryInFilter() {
+    List<Object[]> expected = ImmutableList.of(row(2L, new Byte[]{0x11, 0x11}));
+
+    assertEquals("Should return all expected rows", expected,
+        sql("SELECT id,binary FROM %s where binary > X'11'", binaryTableName));

Review comment:
       Can you add a space between the columns that are being projected?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r780822348



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -118,6 +119,8 @@ protected long waitUntilAfter(long timestampMillis) {
             return row.getList(pos);
           } else if (value instanceof scala.collection.Map) {
             return row.getJavaMap(pos);
+          } else if (value.getClass().isArray() && value.getClass().getComponentType().isPrimitive()) {
+            return IntStream.range(0, Array.getLength(value)).mapToObj(i -> Array.get(value, i)).toArray();

Review comment:
       This representation should not change for tests. I think that the original type is correct. If you need to fix tests that are failing because of array comparison, then you should update the assertions instead of runtime code.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r781453391



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -598,7 +598,8 @@ private static String sqlString(org.apache.iceberg.expressions.Literal<?> lit) {
       if (lit.value() instanceof String) {
         return "'" + lit.value() + "'";
       } else if (lit.value() instanceof ByteBuffer) {
-        throw new IllegalArgumentException("Cannot convert bytes to SQL literal: " + lit);
+        // produce a literal value that Spark can parse
+        return lit.toString();

Review comment:
       No, we should not rely on `toString` using a representation that works for Spark. Instead, this should convert the literal for Spark here. We may change the toString representation in `Literal` and we don't want this to depend on that method. It's fine for numbers, but there are different representations for binary that are equally valid. This is where we customize the one that Spark uses.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r778282338



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -118,6 +119,8 @@ protected long waitUntilAfter(long timestampMillis) {
             return row.getList(pos);
           } else if (value instanceof scala.collection.Map) {
             return row.getJavaMap(pos);
+          } else if (value.getClass().isArray() && value.getClass().getComponentType().isPrimitive()) {
+            return IntStream.range(0, Array.getLength(value)).mapToObj(i -> Array.get(value, i)).toArray();

Review comment:
       What does this convert the byte array to? Why does it need to change? I thought `byte[]` was fine to return.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r781455397



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -157,7 +157,11 @@ private void assertEquals(String context, Object[] expectedRow, Object[] actualR
       Object actualValue = actualRow[col];
       if (expectedValue != null && expectedValue.getClass().isArray()) {
         String newContext = String.format("%s (nested col %d)", context, col + 1);
-        assertEquals(newContext, (Object[]) expectedValue, (Object[]) actualValue);
+        if (expectedValue instanceof byte[]) {
+          Assert.assertArrayEquals(newContext, (byte[]) expectedValue, (byte[]) actualValue);
+        } else {
+          assertEquals(newContext, (Object[]) expectedValue, (Object[]) actualValue);

Review comment:
       Why does the `assertEquals` with `Object[]` not work for `byte[]`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r781716514



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -598,7 +598,8 @@ private static String sqlString(org.apache.iceberg.expressions.Literal<?> lit) {
       if (lit.value() instanceof String) {
         return "'" + lit.value() + "'";
       } else if (lit.value() instanceof ByteBuffer) {
-        throw new IllegalArgumentException("Cannot convert bytes to SQL literal: " + lit);
+        // produce a literal value that Spark can parse
+        return lit.toString();

Review comment:
       Okay, got it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: adds the ability to parse iceberg hexadecimal text

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r771576492



##########
File path: api/src/main/java/org/apache/iceberg/expressions/Literals.java
##########
@@ -126,7 +127,17 @@ public final ByteBuffer toByteBuffer() {
 
     @Override
     public String toString() {
-      return String.valueOf(value);
+      if (value == null) {
+        return "null";
+      }
+
+      switch (typeId()) {
+        case FIXED:
+        case BINARY:
+          return "0x" + BaseEncoding.base16().encode((toByteBuffer().array()));
+        default:
+          return String.valueOf(value);
+      }

Review comment:
       This method should not change. It is the base implementation. Instead, you should override the implementation in `BinaryLiteral` and `FixedLiteral`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r779538896



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -118,6 +119,8 @@ protected long waitUntilAfter(long timestampMillis) {
             return row.getList(pos);
           } else if (value instanceof scala.collection.Map) {
             return row.getJavaMap(pos);
+          } else if (value.getClass().isArray() && value.getClass().getComponentType().isPrimitive()) {
+            return IntStream.range(0, Array.getLength(value)).mapToObj(i -> Array.get(value, i)).toArray();

Review comment:
       The `assertEquals` method cannot verify whether binary arrays are equal.  If we do not convert binary arrays to Object arrays, we may need to create a separate method to compare binary arrays




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: adds the ability to parse iceberg hexadecimal text

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r771575276



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -596,7 +596,7 @@ private static String sqlString(org.apache.iceberg.expressions.Literal<?> lit) {
       if (lit.value() instanceof String) {
         return "'" + lit.value() + "'";
       } else if (lit.value() instanceof ByteBuffer) {
-        throw new IllegalArgumentException("Cannot convert bytes to SQL literal: " + lit);
+        return lit.toString();

Review comment:
       I think @kbendick is right. This is supposed to produce a literal value that Spark can parse. If you're going to implement it, then it needs to produce the correct format. Looks like that is `x'<hex>'`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r773883737



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
##########
@@ -49,8 +49,8 @@ public TestSelect(String catalogName, String implementation, Map<String, String>
 
   @Before
   public void createTables() {
-    sql("CREATE TABLE %s (id bigint, data string, float float) USING iceberg", tableName);
-    sql("INSERT INTO %s VALUES (1, 'a', 1.0), (2, 'b', 2.0), (3, 'c', float('NaN'))", tableName);
+    sql("CREATE TABLE %s (id bigint, data string, float float, binary binary) USING iceberg", tableName);
+    sql("INSERT INTO %s VALUES (1, 'a', 1.0, X''), (2, 'b', 2.0, X'1111'), (3, 'c', float('NaN'), X'11')", tableName);

Review comment:
       ok, thanks for your comment




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#issuecomment-1011254633


   Thanks, @xiaotianzhang01! Nice work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r776502425



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
##########
@@ -55,6 +56,8 @@ public TestSelect(String catalogName, String implementation, Map<String, String>
   public void createTables() {
     sql("CREATE TABLE %s (id bigint, data string, float float) USING iceberg", tableName);
     sql("INSERT INTO %s VALUES (1, 'a', 1.0), (2, 'b', 2.0), (3, 'c', float('NaN'))", tableName);
+    sql("CREATE TABLE %s (id bigint, binary binary) USING iceberg", binaryTableName);
+    sql("INSERT INTO %s VALUES (1, X''), (2, X'1111'), (3, X'11')", binaryTableName);

Review comment:
       This table is only used in `testBinaryInFilter`. Can you move these to the start of that test instead of creating the table for every test?
   
   You can leave the drop in `removeTables`, though. It doesn't hurt to always attempt to drop.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r780873854



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -118,6 +119,8 @@ protected long waitUntilAfter(long timestampMillis) {
             return row.getList(pos);
           } else if (value instanceof scala.collection.Map) {
             return row.getJavaMap(pos);
+          } else if (value.getClass().isArray() && value.getClass().getComponentType().isPrimitive()) {
+            return IntStream.range(0, Array.getLength(value)).mapToObj(i -> Array.get(value, i)).toArray();

Review comment:
       okey




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] izchen commented on pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

izchen commented on pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#issuecomment-1011826050


   Nice work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #3728: Spark: adds the ability to parse iceberg hexadecimal text

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r769225800



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -118,6 +119,8 @@ protected long waitUntilAfter(long timestampMillis) {
             return row.getList(pos);
           } else if (value instanceof scala.collection.Map) {
             return row.getJavaMap(pos);
+          } else if (value.getClass().isArray() && value.getClass().getComponentType().isPrimitive()) {
+            return IntStream.range(0, Array.getLength(value)).mapToObj(i -> Array.get(value, i)).toArray();

Review comment:
       Question: This change to the test class seems somewhat complicated relative to the other cases. Is there any way to check for `value instanceof scala.collection.Array` (assuming that Scala types are returned) like in the map case and in the basic list case?
   
   I guess my question is, why do we need this block when we have the check for `scala.collection.Seq` already. Can you help me understand?
   
   And possibly, since it is a bit more complicated, a comment addressing what case this is expected to handle in the code might be called for.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: adds the ability to parse iceberg hexadecimal text

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r768287028



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -596,7 +596,7 @@ private static String sqlString(org.apache.iceberg.expressions.Literal<?> lit) {
       if (lit.value() instanceof String) {
         return "'" + lit.value() + "'";
       } else if (lit.value() instanceof ByteBuffer) {
-        throw new IllegalArgumentException("Cannot convert bytes to SQL literal: " + lit);
+        return lit.toString();

Review comment:
       The problem now is that as long as the query filter column is of the binary type, no matter what the filter condition is, it will report an error directly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #3728: Spark: adds the ability to parse iceberg hexadecimal text

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r768192493



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -596,7 +596,7 @@ private static String sqlString(org.apache.iceberg.expressions.Literal<?> lit) {
       if (lit.value() instanceof String) {
         return "'" + lit.value() + "'";
       } else if (lit.value() instanceof ByteBuffer) {
-        throw new IllegalArgumentException("Cannot convert bytes to SQL literal: " + lit);
+        return lit.toString();

Review comment:
       Would these need to be quoted as SQL literals? I believe, at least with Spark, that there is a leading x followed by single quotes around the rest.
   
   Eg. when I open a spark-shell, I use the following to represent bytes.
   
   ```
   spark.sql("select x'0FA67B' as value").show
   +----------+
   |     value|
   +----------+
   |[0F A6 7B]|
   +----------+
   
   scala> spark.sql("select x'0FA67B' as value").printSchema
   root
    |-- value: binary (nullable = false)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#issuecomment-1002850916


   @rdblue Thank you very much for your review!  I have addressed your comments


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r779530457



##########
File path: api/src/main/java/org/apache/iceberg/expressions/Literals.java
##########
@@ -599,6 +600,11 @@ public String toString() {
     Object writeReplace() throws ObjectStreamException {
       return new SerializationProxies.FixedLiteralProxy(value());
     }
+
+    @Override
+    public String toString() {
+      return "X'" + BaseEncoding.base16().encode((toByteBuffer().array())) + "'";

Review comment:
       okey, update




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r780822525



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -598,7 +598,8 @@ private static String sqlString(org.apache.iceberg.expressions.Literal<?> lit) {
       if (lit.value() instanceof String) {
         return "'" + lit.value() + "'";
       } else if (lit.value() instanceof ByteBuffer) {
-        throw new IllegalArgumentException("Cannot convert bytes to SQL literal: " + lit);
+        // produce a literal value that Spark can parse
+        return lit.toString();

Review comment:
       It is fine to update `BinaryLiteral.toString` and `FixedLiteral.toString`, but this should not assume that the `toString` representation of a literal can be parsed by Spark. Instead, this should produce the correct form here, just like it does for `String` literals.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #3728: Spark: adds the ability to parse iceberg hexadecimal text

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r769225800



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -118,6 +119,8 @@ protected long waitUntilAfter(long timestampMillis) {
             return row.getList(pos);
           } else if (value instanceof scala.collection.Map) {
             return row.getJavaMap(pos);
+          } else if (value.getClass().isArray() && value.getClass().getComponentType().isPrimitive()) {
+            return IntStream.range(0, Array.getLength(value)).mapToObj(i -> Array.get(value, i)).toArray();

Review comment:
       Question: This change to the test class seems somewhat complicated relative to the other cases. Is there any way to check for `value instanceof scala.collection.Array` (assuming that Scala types are returned) like in the map case and in the basic list case?
   
   I guess my question is, why do we need this block when we have the check for `scala.collection.Seq` already? Can you help me understand?
   
   And possibly, since it is a bit more complicated, a comment addressing what case this is expected to handle in the code might be called for.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r773880957



##########
File path: api/src/main/java/org/apache/iceberg/expressions/Literals.java
##########
@@ -126,7 +127,17 @@ public final ByteBuffer toByteBuffer() {
 
     @Override
     public String toString() {
-      return String.valueOf(value);
+      if (value == null) {
+        return "null";
+      }
+
+      switch (typeId()) {
+        case FIXED:
+        case BINARY:
+          return "0x" + BaseEncoding.base16().encode((toByteBuffer().array()));
+        default:
+          return String.valueOf(value);
+      }

Review comment:
       ok, thanks for your reply




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r780874288



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -598,7 +598,8 @@ private static String sqlString(org.apache.iceberg.expressions.Literal<?> lit) {
       if (lit.value() instanceof String) {
         return "'" + lit.value() + "'";
       } else if (lit.value() instanceof ByteBuffer) {
-        throw new IllegalArgumentException("Cannot convert bytes to SQL literal: " + lit);
+        // produce a literal value that Spark can parse
+        return lit.toString();

Review comment:
       If we implement `BinaryLiteral.toString` and `FixedLiteral.toString`, then we don't need to do it separately for ByteBuffer




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r778280394



##########
File path: api/src/main/java/org/apache/iceberg/expressions/Literals.java
##########
@@ -599,6 +600,11 @@ public String toString() {
     Object writeReplace() throws ObjectStreamException {
       return new SerializationProxies.FixedLiteralProxy(value());
     }
+
+    @Override
+    public String toString() {
+      return "X'" + BaseEncoding.base16().encode((toByteBuffer().array())) + "'";

Review comment:
       It is not safe to use `ByteBuffer.array` like this. Can you please use `ByteBuffers.toByteArray`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#issuecomment-1000151988


   @rdblue @kbendick Please take a look at it again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r773882823



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -118,6 +119,8 @@ protected long waitUntilAfter(long timestampMillis) {
             return row.getList(pos);
           } else if (value instanceof scala.collection.Map) {
             return row.getJavaMap(pos);
+          } else if (value.getClass().isArray() && value.getClass().getComponentType().isPrimitive()) {
+            return IntStream.range(0, Array.getLength(value)).mapToObj(i -> Array.get(value, i)).toArray();

Review comment:
       yeap, i’ll simplify the code and add instructions




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r776551521



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
##########
@@ -55,6 +56,8 @@ public TestSelect(String catalogName, String implementation, Map<String, String>
   public void createTables() {
     sql("CREATE TABLE %s (id bigint, data string, float float) USING iceberg", tableName);
     sql("INSERT INTO %s VALUES (1, 'a', 1.0), (2, 'b', 2.0), (3, 'c', float('NaN'))", tableName);
+    sql("CREATE TABLE %s (id bigint, binary binary) USING iceberg", binaryTableName);
+    sql("INSERT INTO %s VALUES (1, X''), (2, X'1111'), (3, X'11')", binaryTableName);

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r776549914



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
##########
@@ -203,4 +207,12 @@ public void testSpecifySnapshotAndTimestamp() {
             .collectAsList();
         });
   }
+
+  @Test
+  public void testBinaryInFilter() {
+    List<Object[]> expected = ImmutableList.of(row(2L, new Byte[]{0x11, 0x11}));
+
+    assertEquals("Should return all expected rows", expected,
+        sql("SELECT id,binary FROM %s where binary > X'11'", binaryTableName));

Review comment:
       okey, done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #3728: Spark: adds the ability to parse iceberg hexadecimal text

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r769029170



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -596,7 +596,7 @@ private static String sqlString(org.apache.iceberg.expressions.Literal<?> lit) {
       if (lit.value() instanceof String) {
         return "'" + lit.value() + "'";
       } else if (lit.value() instanceof ByteBuffer) {
-        throw new IllegalArgumentException("Cannot convert bytes to SQL literal: " + lit);
+        return lit.toString();

Review comment:
       Ah ok. I was just wondering about the formatting of the final string. Anytime I see a `toString` call, I get a bit suspicious. It was the leading `0` before the X that threw me off.
   
   But in this area, it is indeed the correct way to go. And it does appear that your unit tests are passing and you're using the expected spark literal syntax there. Thanks for the explanation. 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #3728: Spark: adds the ability to parse iceberg hexadecimal text

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#issuecomment-996906901


   @xiaotianzhang01 can you fix the description and summary for this? This is not adding any parsing to Iceberg, so the title is misleading. Instead, it producing Spark's SQL literal format for binary and updating the toString representation of binary/fixed literals.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

xiaotianzhang01 commented on a change in pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728#discussion_r781712277



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -157,7 +157,11 @@ private void assertEquals(String context, Object[] expectedRow, Object[] actualR
       Object actualValue = actualRow[col];
       if (expectedValue != null && expectedValue.getClass().isArray()) {
         String newContext = String.format("%s (nested col %d)", context, col + 1);
-        assertEquals(newContext, (Object[]) expectedValue, (Object[]) actualValue);
+        if (expectedValue instanceof byte[]) {
+          Assert.assertArrayEquals(newContext, (byte[]) expectedValue, (byte[]) actualValue);
+        } else {
+          assertEquals(newContext, (Object[]) expectedValue, (Object[]) actualValue);

Review comment:
       The SQL query returns `byte[]` instead of `Byte[]`. The encapsulated type `Byte[]` can be converted to `Object[]`, but the conversion of the basic type `byte[]` will report an error, `[B cannot be cast to [Ljava.lang.Object`;




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue merged pull request #3728: Spark: Generate Spark's SQL literal format for binary and update the toString representation of binary/fixed literals.

Posted by GitBox <gi...@apache.org>.

rdblue merged pull request #3728:
URL: https://github.com/apache/iceberg/pull/3728


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org