Posted to issues@iceberg.apache.org by "szehon-ho (via GitHub)" <gi...@apache.org> on 2023/05/26 18:13:01 UTC

[GitHub] [iceberg] szehon-ho opened a new pull request, #7713: Spark 3.3,3.4: Add back log making it clearer when we are not pushing down filters

szehon-ho opened a new pull request, #7713:
URL: https://github.com/apache/iceberg/pull/7713

   https://github.com/apache/iceberg/pull/6524 changed the log pattern a little bit.
   
   In previous versions, we logged 'skipping push down for this expression' when failing to push down.
   
   As it's important for users to see when pushdown filters are skipped, it's easier if users can grep for the same expression across versions. To be consistent with older Spark versions (3.2) and with Iceberg releases before #6524, I think we can add this phrase back to the log message to clarify that the filter is indeed being skipped.
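   
   For illustration only, here is a minimal, self-contained sketch of the logging pattern being discussed. It is not the actual SparkScanBuilder code or the exact diff in this PR; the class and helper below are made up for the example. It just shows how the old 'skipping push down for this expression' wording could be kept grep-able in the warning emitted when conversion fails, assuming an SLF4J logger like the one used in the Spark module.
   
       // Hypothetical sketch, not the real Iceberg code; filter conversion is stubbed out.
       import org.slf4j.Logger;
       import org.slf4j.LoggerFactory;
       
       public class PushdownLogSketch {
         private static final Logger LOG = LoggerFactory.getLogger(PushdownLogSketch.class);
       
         static void tryPushDown(Object filter) {
           try {
             convert(filter); // placeholder for the Spark-to-Iceberg filter conversion
           } catch (Exception e) {
             // Keep the historical phrase so users can grep for it across versions.
             LOG.warn(
                 "Failed to check if {} can be pushed down: {}. Skipping push down for this expression.",
                 filter, e.getMessage());
           }
         }
       
         private static Object convert(Object filter) {
           // Stand-in that always fails, to exercise the warning path.
           throw new UnsupportedOperationException("unsupported filter: " + filter);
         }
       }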




[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7713: Spark 3.3,3.4: Add back log making it clearer when we are not pushing down filters

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7713:
URL: https://github.com/apache/iceberg/pull/7713#discussion_r1207215047


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -171,7 +171,10 @@ public Filter[] pushFilters(Filter[] filters) {
         }
 
       } catch (Exception e) {
-        LOG.warn("Failed to check if {} can be pushed down: {}", filter, e.getMessage());

Review Comment:
   Yea, I looked further: in the case I was trying (a UDF column), the filter doesn't even get into Iceberg, so we'd have to add the log on the Spark side.
   
   In Iceberg, the exception case that I add the log for here is just for a simple column we don't support yet, like a metadata column.
   
   I guess we could also add a log when a filter doesn't match the entire partition, although I'm not sure how to phrase it correctly, as IIUC we don't skip it but do evaluate it at lower levels?





[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7713: Spark 3.3,3.4: Add back log making it clearer when we are not pushing down filters

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7713:
URL: https://github.com/apache/iceberg/pull/7713#discussion_r1207294674


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -171,7 +171,10 @@ public Filter[] pushFilters(Filter[] filters) {
         }
 
       } catch (Exception e) {
-        LOG.warn("Failed to check if {} can be pushed down: {}", filter, e.getMessage());

Review Comment:
   There is a check above to see whether the converted filter is null; that tells us whether the filter is supported or not. We could log it there.
   
   I am 50/50 on logging the trace. Let me think a bit.
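   
   As a rough illustration of logging at that null check, here is a hedged, self-contained sketch. The names below (pushFilter, convert) are stand-ins, not the real Iceberg or Spark API, and the real pushFilters method is structured differently; the point is only where an explicit "skipping" log could live.
   
       // Hypothetical sketch of logging where conversion returns null; not the real code.
       import org.slf4j.Logger;
       import org.slf4j.LoggerFactory;
       
       class NullCheckLogSketch {
         private static final Logger LOG = LoggerFactory.getLogger(NullCheckLogSketch.class);
       
         static void pushFilter(String filter) {
           Object converted = convert(filter); // stand-in for the Spark-to-Iceberg conversion
           if (converted == null) {
             // Unsupported filter: an explicit log here makes the skip visible to users.
             LOG.warn("{} cannot be converted, skipping push down for this expression", filter);
             return;
           }
           // ... bind the converted expression and record the pushed filter ...
         }
       
         private static Object convert(String filter) {
           return null; // pretend the filter is unsupported
         }
       }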





[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7713: Spark 3.3,3.4: Add back log making it clearer when we are not pushing down filters

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7713:
URL: https://github.com/apache/iceberg/pull/7713#discussion_r1207175005


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -171,7 +171,10 @@ public Filter[] pushFilters(Filter[] filters) {
         }
 
       } catch (Exception e) {
-        LOG.warn("Failed to check if {} can be pushed down: {}", filter, e.getMessage());

Review Comment:
   I thought you wanted to log a message above, where a filter could not be converted. This block is only for abnormal cases. I also wonder whether we should include the actual exception together with the trace. I kept the old logic when I touched it, but I am not sure why we did not want to include a trace in the first place.
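   
   For reference, a small sketch of the two SLF4J styles being weighed here: message-only logging versus passing the exception as the last argument, which makes SLF4J print the stack trace as well. The class below is illustrative only, not Iceberg code.
   
       // Illustrative only; shows SLF4J behavior, not the actual SparkScanBuilder logging.
       import org.slf4j.Logger;
       import org.slf4j.LoggerFactory;
       
       class TraceLoggingSketch {
         private static final Logger LOG = LoggerFactory.getLogger(TraceLoggingSketch.class);
       
         static void demo(Object filter, Exception e) {
           // Message only, as in the removed line above: no stack trace is printed.
           LOG.warn("Failed to check if {} can be pushed down: {}", filter, e.getMessage());
       
           // Passing the exception as the final argument makes SLF4J log the full stack trace.
           LOG.warn("Failed to check if {} can be pushed down", filter, e);
         }
       }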






[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7713: Spark 3.3,3.4: Add back log making it clearer when we are not pushing down filters

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7713:
URL: https://github.com/apache/iceberg/pull/7713#discussion_r1207194184


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -171,7 +171,10 @@ public Filter[] pushFilters(Filter[] filters) {
         }
 
       } catch (Exception e) {
-        LOG.warn("Failed to check if {} can be pushed down: {}", filter, e.getMessage());

Review Comment:
   Yea, let me check this; I initially thought that in those cases we fail to bind and get an exception.
   
   Hence I thought we should not log the stack trace, as that is a "normal" case.
   
   I think we had decided it was a bit polluting in some discussion here: https://github.com/apache/iceberg/pull/5254#issuecomment-1183804207, but if it is only for the abnormal case, I agree we should have a stack trace.




