You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/12 23:32:31 UTC

[GitHub] [iceberg] stevenzwu commented on issue #3700: Flink cannot restart from checkpoint due to shifted index order in DataFile

stevenzwu commented on issue #3700:
URL: https://github.com/apache/iceberg/issues/3700#issuecomment-991993169


   I also tried to reproduce the issue and wasn't able to reproduce it.
   https://gist.github.com/stevenzwu/62198e32b1174f6ea6d093fdc45b161c
   
   Now I realized that our users also reported some other issue with both 0.12
   and 0.13 Iceberg jars in the classpath. Maybe that was also the cause of
   the exception I reported in
   https://github.com/apache/iceberg/pull/3015/files#r753631713.
   
   In summary, this is a non-issue and we can remove it as a blocker for 0.13.
   
   On Fri, Dec 10, 2021 at 2:55 PM Jack Ye ***@***.***> wrote:
   
   > Had some discussion with Steven and Ryan in Slack, the Flink checkpoint
   > serialization is handled by FlinkManifestUtil, which calls the core
   > library for serdes:
   >
   >   static ManifestFile writeDataFiles(OutputFile outputFile, PartitionSpec spec, List<DataFile> dataFiles)
   >       throws IOException {
   >     ManifestWriter<DataFile> writer = ManifestFiles.write(FORMAT_V2, spec, outputFile, DUMMY_SNAPSHOT_ID);
   >
   >     try (ManifestWriter<DataFile> closeableWriter = writer) {
   >       closeableWriter.addAll(dataFiles);
   >     }
   >
   >     return writer.toManifestFile();
   >   }
   >
   >   static List<DataFile> readDataFiles(ManifestFile manifestFile, FileIO io) throws IOException {
   >     try (CloseableIterable<DataFile> dataFiles = ManifestFiles.read(manifestFile, io)) {
   >       return Lists.newArrayList(dataFiles);
   >     }
   >   }
   >
   > The schema at write time is fixed through MetadataV1/2 even when we add
   > new fields in DataFile, so there should not be any compatibility issue.
   >
   > I tried the following using 0.12 and 0.13:
   >
   > In 0.12:
   >
   > OutputFile outputFile = icebergTable.io().newOutputFile("file:/Users/yzhaoqin/0.12-out.avro");
   > FlinkManifestUtil.writeDataFiles(outputFile, icebergTable.spec(), Lists.newArrayList(icebergTable.currentSnapshot().addedFiles()));
   >
   > In 0.13:
   >
   > List<DataFile> dataFiles = FlinkManifestUtil.readDataFiles(
   >     new GenericManifestFile(icebergTable.io().newInputFile("file:/Users/yzhaoqin/0.12-out.avro"), icebergTable.spec().specId()),
   >     icebergTable.io());
   >
   > and it worked.
   >
   > So so far I am not able to reproduce the issue, @stevenzwu
   > <https://github.com/stevenzwu> @openinx <https://github.com/openinx>
   > please let me know if you are able to have a repro.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/iceberg/issues/3700#issuecomment-991357906>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AALZLP23JRVNLWCIJ373KYTUQKAO5ANCNFSM5JXJW3PA>
   > .
   > Triage notifications on the go with GitHub Mobile for iOS
   > <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
   > or Android
   > <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
   >
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org