You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/05 20:00:54 UTC

[GitHub] [iceberg] findepi opened a new pull request, #5908: Reduce 'Scanning table' log verbosiy for long IN list

findepi opened a new pull request, #5908:
URL: https://github.com/apache/iceberg/pull/5908

   Follows https://github.com/apache/iceberg/pull/4672
   
   cc @ebyhr @alexjo2144 @electrum @dain 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on a diff in pull request #5908: Reduce 'Scanning table' log verbosity for long IN list

Posted by GitBox <gi...@apache.org>.
findepi commented on code in PR #5908:
URL: https://github.com/apache/iceberg/pull/5908#discussion_r988878343


##########
api/src/test/java/org/apache/iceberg/expressions/TestExpressionUtil.java:
##########
@@ -63,6 +66,23 @@ public void testSanitizeIn() {
         ExpressionUtil.toSanitizedString(Expressions.in("test", 34, 345)));
   }
 
+  @Test
+  public void testSanitizeLongIn() {

Review Comment:
   >I think it would be good to have a small test around the threshold value
   
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5908: Reduce 'Scanning table' log verbosity for long IN list

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5908:
URL: https://github.com/apache/iceberg/pull/5908#discussion_r988823794


##########
api/src/test/java/org/apache/iceberg/expressions/TestExpressionUtil.java:
##########
@@ -63,6 +66,23 @@ public void testSanitizeIn() {
         ExpressionUtil.toSanitizedString(Expressions.in("test", 34, 345)));
   }
 
+  @Test
+  public void testSanitizeLongIn() {

Review Comment:
   I think it would be good to have a small test around the threshold value `LONG_IN_PREDICATE_ABBREVIATION_THRESHOLD` to make sure values are abbreviated/not abbreviated.
   
   EDIT: I accidentally noticed that something like `ExpressionUtil.toSanitizedString(Expressions.in("test", IntStream.range(0, 10).boxed().toArray()))` actually produces `test IN ((-2147483647-digit-int), (1-digit-int), ... (8 values hidden, 10 in total) ...)`. Notice the `-2147483647-digit-int` because of the `0`. FWIW, this isn't something that is being caused by this PR, so I'll take a look into this separately



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5908: Reduce 'Scanning table' log verbosity for long IN list

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5908:
URL: https://github.com/apache/iceberg/pull/5908#issuecomment-1270828529

   Thanks, @findepi!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5908: Reduce 'Scanning table' log verbosity for long IN list

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5908:
URL: https://github.com/apache/iceberg/pull/5908#discussion_r989547192


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -258,6 +270,23 @@ public <T> String predicate(UnboundPredicate<T> pred) {
     }
   }
 
+  private static <T> List<String> abbreviateValues(List<String> sanitizedValues) {
+    if (sanitizedValues.size() >= LONG_IN_PREDICATE_ABBREVIATION_THRESHOLD) {
+      Set<String> distinctValues = ImmutableSet.copyOf(sanitizedValues);
+      if (distinctValues.size()
+          <= sanitizedValues.size() - LONG_IN_PREDICATE_ABBREVIATION_MIN_GAIN) {
+        List<String> abbreviatedList = Lists.newArrayListWithCapacity(distinctValues.size() + 1);
+        abbreviatedList.addAll(distinctValues);
+        abbreviatedList.add(
+            String.format(
+                "... (%d values hidden, %d in total) ...",

Review Comment:
   Can we remove the second `...`? I don't think it is very useful



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #5908: Reduce 'Scanning table' log verbosity for long IN list

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #5908:
URL: https://github.com/apache/iceberg/pull/5908#issuecomment-1270602429

   rebased to resolve conflict after https://github.com/apache/iceberg/pull/5928 was merged


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra closed pull request #5908: Reduce 'Scanning table' log verbosity for long IN list

Posted by GitBox <gi...@apache.org>.
nastra closed pull request #5908: Reduce 'Scanning table' log verbosity for long IN list
URL: https://github.com/apache/iceberg/pull/5908


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #5908: Reduce 'Scanning table' log verbosity for long IN list

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5908:
URL: https://github.com/apache/iceberg/pull/5908


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5908: Reduce 'Scanning table' log verbosity for long IN list

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5908:
URL: https://github.com/apache/iceberg/pull/5908#discussion_r989546518


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -258,6 +270,23 @@ public <T> String predicate(UnboundPredicate<T> pred) {
     }
   }
 
+  private static <T> List<String> abbreviateValues(List<String> sanitizedValues) {
+    if (sanitizedValues.size() >= LONG_IN_PREDICATE_ABBREVIATION_THRESHOLD) {
+      Set<String> distinctValues = ImmutableSet.copyOf(sanitizedValues);
+      if (distinctValues.size()
+          <= sanitizedValues.size() - LONG_IN_PREDICATE_ABBREVIATION_MIN_GAIN) {
+        List<String> abbreviatedList = Lists.newArrayListWithCapacity(distinctValues.size() + 1);
+        abbreviatedList.addAll(distinctValues);
+        abbreviatedList.add(
+            String.format(
+                "... (%d values hidden, %d in total) ...",
+                sanitizedValues.size() - distinctValues.size(), sanitizedValues.size()));
+        return abbreviatedList;
+      }
+    }
+    return sanitizedValues;

Review Comment:
   In Iceberg, we add an empty newline between a control flow block and the following statement.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #5908: Reduce 'Scanning table' log verbosity for long IN list

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #5908:
URL: https://github.com/apache/iceberg/pull/5908#issuecomment-1269794965

   There is some CI problem, which may not be related to my change
   
   ```
   Download action repository 'actions/checkout@v3' (SHA:2541b1294d2704b0964813337f33b291d3f8596b)
   Error: Can't use 'tar -xzf' extract archive file: /home/runner/work/_actions/_temp_7b1e7ac4-a85a-4410-ae9d-ae03a77dd17f/07983924-9158-4490-8b19-5006d31956a1.tar.gz. return code: 2.
   ```
   
   i tried re-running, but got same results.
   Will re-run build later, in hope it's transient.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org