You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/26 14:43:13 UTC

[GitHub] [iceberg] chenjunjiedada opened a new pull request, #4769: Api: add sequenceNumber method

chenjunjiedada opened a new pull request, #4769:
URL: https://github.com/apache/iceberg/pull/4769

   Currently, there is no straight way to get sequence numbers of `DeleteFile` and `DataFile`.  It is not convenient to use, for example, it uses two arrays and the `Pair` class to build the delete file index. Furthermore, we may have some actions which depend on comparing the sequence numbers between deletes and data files. This PR tries to add `sequenceNumber` method to `ContentFile` and set the sequence number when applying the inherited metadata thus both `DeleteFile` and `DataFile` can get sequence numbers directly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada closed pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
chenjunjiedada closed pull request #4769: Api: add sequenceNumber method
URL: https://github.com/apache/iceberg/pull/4769


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on code in PR #4769:
URL: https://github.com/apache/iceberg/pull/4769#discussion_r873090272


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -205,6 +207,15 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long sequenceNumber() {
+    return sequenceNumber;
+  }
+
+  void setSequenceNumber(Long seq) {

Review Comment:
   I was following the `specId` change mentioned in https://github.com/apache/iceberg/issues/4749.  As I mentioned in the issue, getting the sequence number directly from the `BaseFile` can make things a bit easy,  for example, we may not want to access the manifest again in the executor. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on code in PR #4769:
URL: https://github.com/apache/iceberg/pull/4769#discussion_r875387777


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -205,6 +207,15 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long sequenceNumber() {
+    return sequenceNumber;
+  }
+
+  void setSequenceNumber(Long seq) {

Review Comment:
   This is motivated when writing https://github.com/apache/iceberg/pull/4748 which handles sequence numbers. And we should need to deal with sequence numbers when cleaning the delete files in the future. Right now, we don't clean the delete files after bin-packing. Does that make sense to you?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4769:
URL: https://github.com/apache/iceberg/pull/4769#discussion_r873060170


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -205,6 +207,15 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long sequenceNumber() {
+    return sequenceNumber;
+  }
+
+  void setSequenceNumber(Long seq) {

Review Comment:
   Sequence number is tracked by the manifest entry, not the content file. And I don't think it is a good idea to make this mutable. What is the rationale for adding it here (not just a ref to another issue)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on code in PR #4769:
URL: https://github.com/apache/iceberg/pull/4769#discussion_r873090272


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -205,6 +207,15 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long sequenceNumber() {
+    return sequenceNumber;
+  }
+
+  void setSequenceNumber(Long seq) {

Review Comment:
   I was following the specId change mentioned in https://github.com/apache/iceberg/issues/4749. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on PR #4769:
URL: https://github.com/apache/iceberg/pull/4769#issuecomment-1304452847

   Moved changes to https://github.com/apache/iceberg/pull/5760.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada closed pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
chenjunjiedada closed pull request #4769: Api: add sequenceNumber method
URL: https://github.com/apache/iceberg/pull/4769


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4769:
URL: https://github.com/apache/iceberg/pull/4769#discussion_r875042417


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -205,6 +207,15 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long sequenceNumber() {
+    return sequenceNumber;
+  }
+
+  void setSequenceNumber(Long seq) {

Review Comment:
   The spec ID isn't available in manifest entry, which is why we thought it was reasonable to add it. Since this is available in the manifest entry, I would prefer to use it from there. That is, unless this is blocking something and we _need_ to add it for some reason.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on code in PR #4769:
URL: https://github.com/apache/iceberg/pull/4769#discussion_r930072100


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -205,6 +207,15 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long sequenceNumber() {
+    return sequenceNumber;
+  }
+
+  void setSequenceNumber(Long seq) {

Review Comment:
   I found we actually do clean up delete files when applying the snapshot update by comparing the min-sequence-number of the generated manifests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on PR #4769:
URL: https://github.com/apache/iceberg/pull/4769#issuecomment-1195588160

   Reopen this since people ask for a similar requirement from slack.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on code in PR #4769:
URL: https://github.com/apache/iceberg/pull/4769#discussion_r875387777


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -205,6 +207,15 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long sequenceNumber() {
+    return sequenceNumber;
+  }
+
+  void setSequenceNumber(Long seq) {

Review Comment:
   This is motivated when writing https://github.com/apache/iceberg/pull/4748 which handles sequence numbers. And I think we should need this when cleaning the delete files in the future.  Right now, we don't clean the delete files after bin-packing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #4769: Api: add sequenceNumber method

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on PR #4769:
URL: https://github.com/apache/iceberg/pull/4769#issuecomment-1197755232

   Just for the record, Apache Impala would also need this sequence number to be exposed through the Iceberg API. Would be nice to know the feasibility, and if the community is open to such a change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org