You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/07 11:21:20 UTC

[GitHub] [iceberg] singhpk234 commented on a diff in pull request #5437: AWS: Fix kryo serialization failure for S3 FileIO

singhpk234 commented on code in PR #5437:
URL: https://github.com/apache/iceberg/pull/5437#discussion_r939650527


##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -210,25 +211,24 @@ void initialize(
   protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
     if (catalogProperties != null) {
       Map<String, String> tableSpecificCatalogProperties =
-          ImmutableMap.<String, String>builder()
-              .putAll(catalogProperties)
-              .put(
-                  AwsProperties.LAKE_FORMATION_DB_NAME,
-                  IcebergToGlueConverter.getDatabaseName(
-                      tableIdentifier, awsProperties.glueCatalogSkipNameValidation()))
-              .put(
-                  AwsProperties.LAKE_FORMATION_TABLE_NAME,
-                  IcebergToGlueConverter.getTableName(
-                      tableIdentifier, awsProperties.glueCatalogSkipNameValidation()))
-              .build();
+          Maps.newHashMapWithExpectedSize(catalogProperties.size() + 2);
+      tableSpecificCatalogProperties.putAll(catalogProperties);
+      tableSpecificCatalogProperties.put(
+          AwsProperties.LAKE_FORMATION_DB_NAME,
+          IcebergToGlueConverter.getDatabaseName(
+              tableIdentifier, awsProperties.glueCatalogSkipNameValidation()));
+      tableSpecificCatalogProperties.put(
+          AwsProperties.LAKE_FORMATION_TABLE_NAME,
+          IcebergToGlueConverter.getTableName(
+              tableIdentifier, awsProperties.glueCatalogSkipNameValidation()));
       // FileIO initialization depends on tableSpecificCatalogProperties, so a new FileIO is
       // initialized each time
       return new GlueTableOperations(
           glue,
           lockManager,
           catalogName,
           awsProperties,
-          initializeFileIO(tableSpecificCatalogProperties),
+          initializeFileIO(Collections.unmodifiableMap(tableSpecificCatalogProperties)),

Review Comment:
   +1, having a UT will certainly help us catching this, if this happens again. 
   
   > To test this, you could do newTableOps on a GlueCatalog and then ensure that the resulting FileIO can be round trip serialized through Kryo (which there's a helper method for I believe).
   
   +1 mostly did the same just wrote a new helper for it (not happy with it though, still thinking other alternatives), as at present Kryo helper depends on spark, in background includes Twitter#chill serializers. newTableOps presently have a protected access, so was facing issues in writing spark UT. 
   
   The new helper takes a dependency in twitter-chill which spark uses and i think flink as well (may be wrong here, ref [link](https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html#kryo)).
   
   All suggestions are welcomed here :) !
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org