Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/23 12:52:16 UTC

[GitHub] [iceberg] ConeyLiu opened a new pull request, #4842: [CORE] Support file filtering based on schema

ConeyLiu opened a new pull request, #4842:
URL: https://github.com/apache/iceberg/pull/4842

   This patch adds support for filtering files based on the schema they were written with. The goal is to reduce file scan time after a schema evolution. Consider the following example:
   ```
   1. create a table with schema <id: long, name: string> and partition on id
   2. add new files
   3. update the schema and add a new column: address. Now the schema is <id: long, name: string, address: string>
   4. add new files
   // Before this patch, we have to read the files added in step 2 and apply the filter after reading them.
   5. scan with filter start_with('address', 'some_value') and id > 10;
   ```
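The scenario above can be sketched as a pre-scan check: if a file was written with a schema that does not contain the filtered column, every row in it reads that column as null, so a predicate such as `start_with` cannot match and the file can be skipped. This is only a minimal sketch, not the code in this PR. `FileEntry`, `columnMissing`, and `filesToScan` are hypothetical names, columns are matched by name for brevity (a real implementation would more likely match on field ids), and it assumes the added column is optional with no default value.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class SchemaFileFilterSketch {
  // Hypothetical stand-in for a data file that records the id of the schema it was written with.
  static final class FileEntry {
    final String path;
    final int schemaId;

    FileEntry(String path, int schemaId) {
      this.path = path;
      this.schemaId = schemaId;
    }
  }

  // True when the file's write schema lacks the column, so every row reads it as null.
  static boolean columnMissing(Map<Integer, Set<String>> schemas, FileEntry file, String column) {
    return !schemas.get(file.schemaId).contains(column);
  }

  // Keeps only files that might hold a non-null value for the column used in the predicate.
  static List<FileEntry> filesToScan(
      Map<Integer, Set<String>> schemas, List<FileEntry> files, String column) {
    return files.stream()
        .filter(file -> !columnMissing(schemas, file, column))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Schema 0 is the original table schema; schema 1 adds the address column.
    Map<Integer, Set<String>> schemas = Map.of(
        0, Set.of("id", "name"),
        1, Set.of("id", "name", "address"));

    List<FileEntry> files = List.of(
        new FileEntry("added-in-step-2.parquet", 0),
        new FileEntry("added-in-step-4.parquet", 1));

    // Only the file written after the schema update needs to be read for a filter on address.
    for (FileEntry file : filesToScan(schemas, files, "address")) {
      System.out.println(file.path);
    }
  }
}
```

The check is only safe for predicates that can never be true for null input. A filter such as `address is null` would match every row of the older files, so those files would still have to be scanned.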
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4842: [CORE] Support file filtering based on schema

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #4842:
URL: https://github.com/apache/iceberg/pull/4842#issuecomment-1139773993

   @ConeyLiu, can you break this down into smaller commits? I think it's reasonable to use the schema information to know whether a filter will succeed or fail, but there are a lot of changes that don't need to be combined into a single PR here. Thanks!




[GitHub] [iceberg] ConeyLiu commented on pull request #4842: [CORE] Support file filtering based on schema

Posted by GitBox <gi...@apache.org>.
ConeyLiu commented on PR #4842:
URL: https://github.com/apache/iceberg/pull/4842#issuecomment-1140745677

   @rdblue, thanks for the suggestion. The first part is here: #4898




[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #4842: [CORE] Support file filtering based on schema

Posted by GitBox <gi...@apache.org>.
ConeyLiu commented on code in PR #4842:
URL: https://github.com/apache/iceberg/pull/4842#discussion_r879442889


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -277,6 +280,9 @@ public void put(int i, Object value) {
         this.sortOrderId = (Integer) value;
         return;
       case 17:
+        this.schemaId = (int) value;

Review Comment:
   I changed the position because `ROW_POSITION` is appended after the schema fields of DataFile here: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseFile.java#L99. I would appreciate advice on this change from someone more familiar with this code.
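
The position shift described in this comment can be illustrated with a small sketch. It is not the Iceberg code: `layout` is a hypothetical helper and the field lists are shortened, illustrative names. It only shows that when a metadata column is appended after the schema fields, adding one more schema field takes over the old trailing position and pushes the metadata column back by one.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionShiftSketch {
  // Builds a positional layout: schema fields first, then the trailing metadata column.
  static List<String> layout(List<String> schemaFields, String trailingMetadataField) {
    List<String> fields = new ArrayList<>(schemaFields);
    fields.add(trailingMetadataField);
    return fields;
  }

  public static void main(String[] args) {
    // Shortened, illustrative field lists; the real DataFile schema has many more fields.
    List<String> before = layout(List.of("file_path", "sort_order_id"), "row_position");
    List<String> after =
        layout(List.of("file_path", "sort_order_id", "schema_id"), "row_position");

    System.out.println(before.indexOf("row_position")); // prints 2
    System.out.println(after.indexOf("schema_id"));     // prints 2
    System.out.println(after.indexOf("row_position"));  // prints 3
  }
}
```

Any positional `switch` that reads or writes these fields has to move the trailing case by the same amount, which is the reordering being asked about here.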


