You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/09 19:40:15 UTC

[GitHub] [iceberg] jackye1995 opened a new issue #3700: Flink cannot restart from checkpoint due to shifted index order in DataFile

jackye1995 opened a new issue #3700:
URL: https://github.com/apache/iceberg/issues/3700


   track the issue found in https://github.com/apache/iceberg/pull/3015/files#r753631713 as 0.13 blocker
   
   @stevenzwu @rdblue @RussellSpitzer @szehon-ho 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on issue #3700: Flink cannot restart from checkpoint due to shifted index order in DataFile

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #3700:
URL: https://github.com/apache/iceberg/issues/3700#issuecomment-991357906


   Had some discussion with Steven and Ryan in Slack, the Flink checkpoint serialization is handled by `FlinkManifestUtil`, which calls the core library for serdes:
   
   ```java
     static ManifestFile writeDataFiles(OutputFile outputFile, PartitionSpec spec, List<DataFile> dataFiles)
         throws IOException {
       ManifestWriter<DataFile> writer = ManifestFiles.write(FORMAT_V2, spec, outputFile, DUMMY_SNAPSHOT_ID);
   
       try (ManifestWriter<DataFile> closeableWriter = writer) {
         closeableWriter.addAll(dataFiles);
       }
   
       return writer.toManifestFile();
     }
   
     static List<DataFile> readDataFiles(ManifestFile manifestFile, FileIO io) throws IOException {
       try (CloseableIterable<DataFile> dataFiles = ManifestFiles.read(manifestFile, io)) {
         return Lists.newArrayList(dataFiles);
       }
     }
   ```
   
   The schema at write time is fixed through `MetadataV1/2` even when we add new fields in `DataFile`, so there should not be any compatibility issue.
   
   I tried the following using 0.12 and 0.13:
   
   In 0.12:
   
   ```
   OutputFile outputFile = icebergTable.io().newOutputFile("file:/Users/yzhaoqin/0.12-out.avro");
   FlinkManifestUtil.writeDataFiles(outputFile, icebergTable.spec(), Lists.newArrayList(icebergTable.currentSnapshot().addedFiles()));
   ```
   
   In 0.13:
   
   ```
   List<DataFile> dataFiles = FlinkManifestUtil.readDataFiles(
       new GenericManifestFile(icebergTable.io().newInputFile("file:/Users/yzhaoqin/0.12-out.avro"), icebergTable.spec().specId()),
       icebergTable.io());
   ```
   
   and it worked.
   
   So so far I am not able to reproduce the issue, @stevenzwu @openinx please let me know if you are able to have a repro.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on issue #3700: Flink cannot restart from checkpoint due to shifted index order in DataFile

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #3700:
URL: https://github.com/apache/iceberg/issues/3700#issuecomment-992749857


   Thanks Steven for the repro!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on issue #3700: Flink cannot restart from checkpoint due to shifted index order in DataFile

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #3700:
URL: https://github.com/apache/iceberg/issues/3700#issuecomment-990176433


   I am not totally familiar with the PR, after reading it just now, it seems like the quickest fix is to move the `spec_id` column back as the last column, given this is not released yet in past versions. Would that be enough? @rdblue 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on issue #3700: Flink cannot restart from checkpoint due to shifted index order in DataFile

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #3700:
URL: https://github.com/apache/iceberg/issues/3700#issuecomment-991252584


   @jackye1995, a way to "fix" the problem is to move `spec_id` so that the original column ordinals match. The problem with doing that is that we leave a known bug in the codebase. And it's very unlikely that we'll get lucky and have someone catch it before release next time. There was no test that caught this.
   
   Another reason not to is that we state that Iceberg metadata files follow Iceberg schema evolution rules. If we violate that here and start avoiding certain valid changes because of fear something might break, then we're back to Hadoop usability: you have to know what not to do and you're artificially limited. We don't want Iceberg developers to be afraid of using tools that are perfectly valid according to the spec.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on issue #3700: Flink cannot restart from checkpoint due to shifted index order in DataFile

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #3700:
URL: https://github.com/apache/iceberg/issues/3700#issuecomment-991357906


   Had some discussion with Steven and Ryan in Slack, the Flink checkpoint serialization is handled by `FlinkManifestUtil`, which calls the core library for serdes:
   
   ```java
     static ManifestFile writeDataFiles(OutputFile outputFile, PartitionSpec spec, List<DataFile> dataFiles)
         throws IOException {
       ManifestWriter<DataFile> writer = ManifestFiles.write(FORMAT_V2, spec, outputFile, DUMMY_SNAPSHOT_ID);
   
       try (ManifestWriter<DataFile> closeableWriter = writer) {
         closeableWriter.addAll(dataFiles);
       }
   
       return writer.toManifestFile();
     }
   
     static List<DataFile> readDataFiles(ManifestFile manifestFile, FileIO io) throws IOException {
       try (CloseableIterable<DataFile> dataFiles = ManifestFiles.read(manifestFile, io)) {
         return Lists.newArrayList(dataFiles);
       }
     }
   ```
   
   The schema at write time is fixed through `MetadataV1/2` even when we add new fields in `DataFile`, so there should not be any compatibility issue.
   
   I tried the following using 0.12 and 0.13:
   
   In 0.12:
   
   ```
   OutputFile outputFile = icebergTable.io().newOutputFile("file:/Users/yzhaoqin/0.12-out.avro");
   FlinkManifestUtil.writeDataFiles(outputFile, icebergTable.spec(), Lists.newArrayList(icebergTable.currentSnapshot().addedFiles()));
   ```
   
   In 0.13:
   
   ```
   List<DataFile> dataFiles = FlinkManifestUtil.readDataFiles(
       new GenericManifestFile(icebergTable.io().newInputFile("file:/Users/yzhaoqin/0.12-out.avro"), icebergTable.spec().specId()),
       icebergTable.io());
   ```
   
   and it worked.
   
   So so far I am not able to reproduce the issue, @stevenzwu @openinx please let me know if you are able to have a repro.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 closed issue #3700: Flink cannot restart from checkpoint due to shifted index order in DataFile

Posted by GitBox <gi...@apache.org>.
jackye1995 closed issue #3700:
URL: https://github.com/apache/iceberg/issues/3700


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on issue #3700: Flink cannot restart from checkpoint due to shifted index order in DataFile

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on issue #3700:
URL: https://github.com/apache/iceberg/issues/3700#issuecomment-991993169


   I also tried to reproduce the issue and wasn't able to reproduce it.
   https://gist.github.com/stevenzwu/62198e32b1174f6ea6d093fdc45b161c
   
   Now I realized that our users also reported some other issue with both 0.12
   and 0.13 Iceberg jars in the classpath. Maybe that was also the cause of
   the exception I reported in
   https://github.com/apache/iceberg/pull/3015/files#r753631713.
   
   In summary, this is a non-issue and we can remove it as a blocker for 0.13.
   
   On Fri, Dec 10, 2021 at 2:55 PM Jack Ye ***@***.***> wrote:
   
   > Had some discussion with Steven and Ryan in Slack, the Flink checkpoint
   > serialization is handled by FlinkManifestUtil, which calls the core
   > library for serdes:
   >
   >   static ManifestFile writeDataFiles(OutputFile outputFile, PartitionSpec spec, List<DataFile> dataFiles)
   >       throws IOException {
   >     ManifestWriter<DataFile> writer = ManifestFiles.write(FORMAT_V2, spec, outputFile, DUMMY_SNAPSHOT_ID);
   >
   >     try (ManifestWriter<DataFile> closeableWriter = writer) {
   >       closeableWriter.addAll(dataFiles);
   >     }
   >
   >     return writer.toManifestFile();
   >   }
   >
   >   static List<DataFile> readDataFiles(ManifestFile manifestFile, FileIO io) throws IOException {
   >     try (CloseableIterable<DataFile> dataFiles = ManifestFiles.read(manifestFile, io)) {
   >       return Lists.newArrayList(dataFiles);
   >     }
   >   }
   >
   > The schema at write time is fixed through MetadataV1/2 even when we add
   > new fields in DataFile, so there should not be any compatibility issue.
   >
   > I tried the following using 0.12 and 0.13:
   >
   > In 0.12:
   >
   > OutputFile outputFile = icebergTable.io().newOutputFile("file:/Users/yzhaoqin/0.12-out.avro");
   > FlinkManifestUtil.writeDataFiles(outputFile, icebergTable.spec(), Lists.newArrayList(icebergTable.currentSnapshot().addedFiles()));
   >
   > In 0.13:
   >
   > List<DataFile> dataFiles = FlinkManifestUtil.readDataFiles(
   >     new GenericManifestFile(icebergTable.io().newInputFile("file:/Users/yzhaoqin/0.12-out.avro"), icebergTable.spec().specId()),
   >     icebergTable.io());
   >
   > and it worked.
   >
   > So so far I am not able to reproduce the issue, @stevenzwu
   > <https://github.com/stevenzwu> @openinx <https://github.com/openinx>
   > please let me know if you are able to have a repro.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/iceberg/issues/3700#issuecomment-991357906>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AALZLP23JRVNLWCIJ373KYTUQKAO5ANCNFSM5JXJW3PA>
   > .
   > Triage notifications on the go with GitHub Mobile for iOS
   > <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
   > or Android
   > <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
   >
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on issue #3700: Flink cannot restart from checkpoint due to shifted index order in DataFile

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #3700:
URL: https://github.com/apache/iceberg/issues/3700#issuecomment-991252584


   @jackye1995, a way to "fix" the problem is to move `spec_id` so that the original column ordinals match. The problem with doing that is that we leave a known bug in the codebase. And it's very unlikely that we'll get lucky and have someone catch it before release next time. There was no test that caught this.
   
   Another reason not to is that we state that Iceberg metadata files follow Iceberg schema evolution rules. If we violate that here and start avoiding certain valid changes because of fear something might break, then we're back to Hadoop usability: you have to know what not to do and you're artificially limited. We don't want Iceberg developers to be afraid of using tools that are perfectly valid according to the spec.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org