Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/01/26 07:27:48 UTC

[GitHub] [incubator-iceberg] jun-he opened a new pull request #749: Convert Spark In filter to iceberg IN Expression

jun-he opened a new pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749
 
 
   Issue #748
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371641202
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##########
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   Actually, null is never equal to null.
   ```
   null IN (1, 2, 3) -> null
   null IN (1, 2, 3, null) -> null
   1 IN (1, 2, 3) -> true
   1 IN (1, 2, 3, null) -> true
   -1 IN (1, 2, 3) -> false
   -1 IN (1, 2, 3, null) -> null
   ```
   I wonder if we can simply filter out null values to avoid the exception. 



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371388439
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##########
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   This dropped the call to `convertLiteral` for each value. That converts values that Spark uses into Iceberg values. I think we still need to call it:
   
   ```java
             In inFilter = (In) filter;
             List<Object> nonNullLiterals = Stream.of(inFilter.values())
                 .filter(Objects::nonNull)
                 .map(SparkFilters::convertLiteral)
                 .collect(Collectors.toList());
             return Expressions.in(inFilter.attribute(), nonNullLiterals);
   ```



[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371261479
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##########
 @@ -425,6 +426,21 @@ public void testFilterByNonProjectedColumn() {
     }
   }
 
+  @Test
+  public void testInFilter() {
+    File location = buildPartitionedTable("partitioned_by_data", PARTITION_BY_DATA, "data_ident", "data");
+
+    DataSourceOptions options = new DataSourceOptions(ImmutableMap.of(
+        "path", location.toString())
+    );
+
+    IcebergSource source = new IcebergSource();
+    DataSourceReader reader = source.createReader(options);
+    pushFilters(reader, new In("data", new String[]{"foo", "junction", "brush"}));
 
 Review comment:
   Will it be hard to add tests for a couple of other data types?



[GitHub] [incubator-iceberg] rdblue merged pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749
 
 
   



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371385370
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##########
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   Now that I read this a second time, I think you may be talking about when Spark passes a null value here?
   
   I agree that's a problem. This should detect and filter out null values because Iceberg's `in` expression will throw an exception if it encounters them. If there is a null, then this can return `or(isNull(attr), in(attr, nonNullValues))`



[GitHub] [incubator-iceberg] jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r372840052
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##########
 @@ -425,6 +426,21 @@ public void testFilterByNonProjectedColumn() {
     }
   }
 
+  @Test
+  public void testInFilter() {
+    File location = buildPartitionedTable("partitioned_by_data", PARTITION_BY_DATA, "data_ident", "data");
+
+    DataSourceOptions options = new DataSourceOptions(ImmutableMap.of(
+        "path", location.toString())
+    );
+
+    IcebergSource source = new IcebergSource();
+    DataSourceReader reader = source.createReader(options);
+    pushFilters(reader, new In("data", new String[]{"foo", "junction", "brush"}));
 
 Review comment:
   👌 



[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371261529
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##########
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   What if values contain null? Can we add a test for this?



[GitHub] [incubator-iceberg] jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r373900954
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##########
 @@ -543,11 +579,11 @@ private File buildPartitionedTable(String desc, PartitionSpec spec, String udf,
 
   private List<Record> testRecords(org.apache.avro.Schema avroSchema) {
     return Lists.newArrayList(
-        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294658+00:00"), "junction"),
+        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294+00:00"), "junction"),
 
 Review comment:
   The test fails because the partitions picked in the tests contain only one value each (the lower bound equals the upper bound), so the Timestamp must match exactly.
   
   To avoid changing those values, I will update the test to use the `2017-12-21T15` partition, which contains two records, so any Timestamp between them will match.
   



[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371383547
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##########
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   Yes, but what if Spark passes a list of values that includes null?



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r373880794
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##########
 @@ -543,11 +579,11 @@ private File buildPartitionedTable(String desc, PartitionSpec spec, String udf,
 
   private List<Record> testRecords(org.apache.avro.Schema avroSchema) {
     return Lists.newArrayList(
-        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294658+00:00"), "junction"),
+        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294+00:00"), "junction"),
 
 Review comment:
   These don't use the constructor that only supports milliseconds, so these should be precise to microseconds. But the test only uses Timestamp to create a filter that gets converted to a partition filter, so the timestamps used to create the Spark filter and these timestamps shouldn't need to match. Doesn't the test pass if these are unchanged?



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371388779
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##########
 @@ -425,6 +426,21 @@ public void testFilterByNonProjectedColumn() {
     }
   }
 
+  @Test
+  public void testInFilter() {
+    File location = buildPartitionedTable("partitioned_by_data", PARTITION_BY_DATA, "data_ident", "data");
+
+    DataSourceOptions options = new DataSourceOptions(ImmutableMap.of(
+        "path", location.toString())
+    );
+
+    IcebergSource source = new IcebergSource();
+    DataSourceReader reader = source.createReader(options);
+    pushFilters(reader, new In("data", new String[]{"foo", "junction", "brush"}));
 
 Review comment:
   I agree. Let's add a test for a `DateType` at least to validate that `convertLiteral` is called.



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371382914
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##########
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   The values cannot contain null. It is not allowed when you [create the `in` expression](https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/expressions/Expressions.java#L191). There's a secondary check when [creating a literal](https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/expressions/Literals.java#L59) as well.



[GitHub] [incubator-iceberg] jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371617351
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##########
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   Thanks @aokolnychyi for the comments. I will add additional tests for those cases.
   @rdblue I am wondering whether we can add this transformation (`in -> or(isNull, in)`) to Iceberg's `Expressions`, so each caller does not need to reimplement this logic.
   



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r374225417
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##########
 @@ -543,11 +579,11 @@ private File buildPartitionedTable(String desc, PartitionSpec spec, String udf,
 
   private List<Record> testRecords(org.apache.avro.Schema avroSchema) {
     return Lists.newArrayList(
-        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294658+00:00"), "junction"),
+        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294+00:00"), "junction"),
 
 Review comment:
   Thanks, @jun-he! I think that's a better solution to the problem.



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371957566
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##########
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   @aokolnychyi is right. `null` is never equal to `null`.
   
   @jun-he, that's the reason why we shouldn't automatically apply that transformation in `Expressions`. By not allowing callers to create predicates like `in(1, null, 2)`, we avoid the problem in most of Iceberg.
   
   We still need to fix the case where Spark passes in `null`, though. I think that Anton is right and we can simply filter `null` out of the list. A null in the list will never cause a value to be accepted. It will only cause the filter to return `null` instead of `false`, which is handled like `false` when filtering: if the filter evaluates to `null` then the row is not selected.
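   To make that argument concrete, the three-valued `IN` semantics can be sketched with a toy evaluator (the `sqlIn` helper and class name here are hypothetical, for illustration only — this is not Iceberg or Spark code):
   
   ```java
   import java.util.Arrays;
   import java.util.List;
   
   public class InNullSemantics {
       // Toy three-valued SQL IN: returns TRUE, FALSE, or null (unknown).
       static Boolean sqlIn(Object value, List<?> list) {
           if (value == null) {
               return null; // null IN (...) is always unknown
           }
           boolean sawNull = false;
           for (Object item : list) {
               if (item == null) {
                   sawNull = true;
               } else if (item.equals(value)) {
                   return true;
               }
           }
           // No match: unknown if the list contained null, otherwise false.
           return sawNull ? null : false;
       }
   
       public static void main(String[] args) {
           // -1 IN (1, 2, 3, null) evaluates to unknown, which a filter treats
           // like false, so dropping the null literal selects the same rows.
           System.out.println(sqlIn(-1, Arrays.asList(1, 2, 3, null))); // null
           System.out.println(sqlIn(-1, Arrays.asList(1, 2, 3)));       // false
           System.out.println(sqlIn(1, Arrays.asList(1, 2, 3, null)));  // true
       }
   }
   ```
   
   A null literal can only ever turn a `false` result into `null`, never a `false` into `true`, which is why removing it does not change which rows a filter selects.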



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371388439
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##########
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   This dropped the call to `convertLiteral` for each value. That converts values that Spark uses into Iceberg values. I think we still need to call it:
   
   ```java
             In inFilter = (In) filter;
             List<Object> nonNullLiterals = Stream.of(inFilter.values())
                 .filter(Objects::nonNull)
                 .map(SparkFilters::convertLiteral)
                 .collect(Collectors.toList());
             boolean hasNull = Stream.of(inFilter.values()).anyMatch(Objects::isNull);
             Expression in = Expressions.in(inFilter.attribute(), nonNullLiterals);
             if (hasNull) {
               return or(isNull(inFilter.attribute()), in);
             } else {
               return in;
             }
   ```



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r373746033
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##########
 @@ -543,11 +579,11 @@ private File buildPartitionedTable(String desc, PartitionSpec spec, String udf,
 
   private List<Record> testRecords(org.apache.avro.Schema avroSchema) {
     return Lists.newArrayList(
-        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294658+00:00"), "junction"),
+        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294+00:00"), "junction"),
 
 Review comment:
   Why was it necessary to change these values? This doesn't change the hour partitions the values are stored in.



[GitHub] [incubator-iceberg] jun-he commented on issue #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
jun-he commented on issue #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#issuecomment-578476509
 
 
   @rdblue can you help review it? Thanks!
   
   



[GitHub] [incubator-iceberg] jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r373880462
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##########
 @@ -543,11 +579,11 @@ private File buildPartitionedTable(String desc, PartitionSpec spec, String udf,
 
   private List<Record> testRecords(org.apache.avro.Schema avroSchema) {
     return Lists.newArrayList(
-        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294658+00:00"), "junction"),
+        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294+00:00"), "junction"),
 
 Review comment:
   @rdblue It is because the `java.sql.Timestamp` constructor takes a milliseconds time value.
   There is a deprecated `java.sql.Timestamp` constructor that takes year, month, date, hour, minute, second, and nanos, but we would also need to take care of the timezone issue (a Java timestamp is always UTC).
   
   So, to avoid using a deprecated method and to keep the test straightforward, I just updated the two records to millisecond precision.
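   The precision gap can be illustrated with a small sketch (class name hypothetical; this only demonstrates `java.sql.Timestamp` behavior, it is not the test's code):
   
   ```java
   import java.sql.Timestamp;
   import java.time.Instant;
   
   public class TimestampPrecision {
       public static void main(String[] args) {
           // The original record value, precise to microseconds.
           Instant micro = Instant.parse("2017-12-22T09:20:44.294658Z");
   
           // The long-based constructor only carries epoch milliseconds, so
           // the sub-millisecond digits (658 microseconds) are dropped.
           Timestamp fromMillis = new Timestamp(micro.toEpochMilli());
           System.out.println(fromMillis.getNanos()); // 294000000
   
           // Timestamp.from(Instant) keeps the full fractional seconds.
           Timestamp fromInstant = Timestamp.from(micro);
           System.out.println(fromInstant.getNanos()); // 294658000
       }
   }
   ```
   
   Building the value through `Timestamp.from(Instant)` would preserve the microseconds without deprecated constructors, at the cost of restructuring how the test constructs its timestamps.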
   



[GitHub] [incubator-iceberg] jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r373900954
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##########
 @@ -543,11 +579,11 @@ private File buildPartitionedTable(String desc, PartitionSpec spec, String udf,
 
   private List<Record> testRecords(org.apache.avro.Schema avroSchema) {
     return Lists.newArrayList(
-        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294658+00:00"), "junction"),
+        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294+00:00"), "junction"),
 
 Review comment:
   The test fails because the partitions picked in the tests contain only one value (the lower and upper bounds are equal), so the Timestamp must match exactly.
   
   To avoid changing those values, I will update the test to use the `2017-12-21T15` partition, which contains two records, so any Timestamp between them will match.
   

----------------------------------------------------------------



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371957787
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##########
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   I'm updating the code above with better null handling.
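For context, SQL's `IN` predicate uses three-valued logic, which a plain OR-of-equals rewrite like the removed loop does not capture when the value list contains null. A minimal sketch of those semantics in plain Java (illustrative names, not Iceberg code):

```java
import java.util.Arrays;
import java.util.List;

public class InNullSemantics {
    // SQL three-valued logic for "x IN (values)": TRUE, FALSE, or null (unknown).
    static Boolean sqlIn(Object x, List<?> values) {
        if (x == null) {
            return null;  // null IN (...) is always unknown, never true
        }
        boolean sawNull = false;
        for (Object v : values) {
            if (v == null) {
                sawNull = true;          // remember the null; it may force "unknown"
            } else if (v.equals(x)) {
                return Boolean.TRUE;     // a definite match wins
            }
        }
        // No match: "unknown" if a null was present, otherwise a definite false.
        return sawNull ? null : Boolean.FALSE;
    }

    public static void main(String[] args) {
        System.out.println(sqlIn(null, Arrays.asList(1, 2, 3)));          // null
        System.out.println(sqlIn(1, Arrays.asList(1, 2, (Integer) null))); // true
        System.out.println(sqlIn(4, Arrays.asList(1, 2, (Integer) null))); // null
    }
}
```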

----------------------------------------------------------------