Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/23 17:23:30 UTC

[GitHub] [iceberg] yegangy0718 opened a new issue #4209: delete field in a referred schema brings redefine schema issue after applying schema projection

yegangy0718 opened a new issue #4209:
URL: https://github.com/apache/iceberg/issues/4209


   We have a schema called RequestContextEvent. Inside it, a record named RecognitionMetrics is defined inline (under EndedRecord) and is then referenced by name from another record, FailedRecord.
   {
     "type": "record",
     "name": "RequestContextEvent",
     "namespace": "avro.com.schemas",
     "fields": [
       {
         "name": "payload",
         "type": [
           "null",
           {
             "type": "record",
             "name": "RequestContext",
             "fields": [
               {
                 "name": "ended",
                 "type": [
                   "null",
                   {
                     "type": "record",
                     "name": "EndedRecord",
                     "fields": [
                       {
                         "name": "metrics",
                         "type": [
                           "null",
                           {
                             "type": "record",
                             "name": "RecognitionMetrics",
                             "fields": [
                               {
                                 "name": "optionalStringField",
                                 "type": [
                                   "null",
                                   "string"
                                 ],
                                 "default": null
                               },
                               {
                                 "name": "optionalLongField",
                                 "type": [
                                   "null",
                                   "long"
                                 ],
                                 "default": null
                               }
                             ]
                           }
                         ],
                         "default": null
                       }
                     ]
                   }
                 ],
                 "default": null
               },
               {
                 "name": "failed",
                 "type": [
                   "null",
                   {
                     "type": "record",
                     "name": "FailedRecord",
                     "fields": [
                       {
                         "name": "metrics",
                         "type": [
                           "null",
                           "RecognitionMetrics"
                         ],
                         "default": null
                       }
                     ]
                   }
                 ],
                 "default": null
               }
             ]
           }
         ],
         "default": null
       }
     ]
   }
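
   The define-once, reference-by-name layout of this schema can be checked mechanically. The following is a minimal Python sketch, not part of Iceberg or Avro: find_named_records, opt, record and field are hypothetical helpers introduced only for illustration. It walks the parsed schema JSON and reports where each record is defined inline and where it is only referenced by name. It handles only records and unions, which is all this schema uses, and it ignores namespaces.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

def find_named_records(schema, path="", defs=None, refs=None):
    """Collect inline record definitions and by-name references, keyed by record name."""
    defs = {} if defs is None else defs
    refs = {} if refs is None else refs
    if isinstance(schema, list):  # a union: visit every branch
        for branch in schema:
            find_named_records(branch, path, defs, refs)
    elif isinstance(schema, dict) and schema.get("type") == "record":
        defs.setdefault(schema["name"], []).append(path or "<root>")
        for f in schema["fields"]:
            child = f"{path}.{f['name']}" if path else f["name"]
            find_named_records(f["type"], child, defs, refs)
    elif isinstance(schema, str) and schema not in PRIMITIVES:
        # a bare non-primitive string is a reference to a previously defined name
        refs.setdefault(schema, []).append(path)
    return defs, refs

# Hypothetical helpers to keep the schema literal short.
def opt(avro_type):
    return ["null", avro_type]

def record(name, *fields):
    return {"type": "record", "name": name, "fields": list(fields)}

def field(name, avro_type):
    return {"name": name, "type": opt(avro_type), "default": None}

# Same shape as the original schema above (namespace omitted).
schema = record(
    "RequestContextEvent",
    field("payload", record(
        "RequestContext",
        field("ended", record(
            "EndedRecord",
            field("metrics", record(
                "RecognitionMetrics",
                field("optionalStringField", "string"),
                field("optionalLongField", "long"))))),
        field("failed", record(
            "FailedRecord",
            field("metrics", "RecognitionMetrics"))))))

defs, refs = find_named_records(schema)
print(defs["RecognitionMetrics"])  # ['payload.ended.metrics']
print(refs)                        # {'RecognitionMetrics': ['payload.failed.metrics']}
```

   So RecognitionMetrics has exactly one full definition (under ended) and one by-name reference (under failed).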
   
   We used this schema to create the Iceberg table. Later, we wanted to delete optionalLongField from RecognitionMetrics, so the new schema becomes:
   
   {
     "type": "record",
     "name": "RequestContextEvent",
     "namespace": "avro.com.schemas",
     "fields": [
       {
         "name": "payload",
         "type": [
           "null",
           {
             "type": "record",
             "name": "RequestContext",
             "fields": [
               {
                 "name": "ended",
                 "type": [
                   "null",
                   {
                     "type": "record",
                     "name": "EndedRecord",
                     "fields": [
                       {
                         "name": "metrics",
                         "type": [
                           "null",
                           {
                             "type": "record",
                             "name": "RecognitionMetrics",
                             "fields": [
                               {
                                 "name": "optionalStringField",
                                 "type": [
                                   "null",
                                   "string"
                                 ],
                                 "default": null
                               }
                             ]
                           }
                         ],
                         "default": null
                       }
                     ]
                   }
                 ],
                 "default": null
               },
               {
                 "name": "failed",
                 "type": [
                   "null",
                   {
                     "type": "record",
                     "name": "FailedRecord",
                     "fields": [
                       {
                         "name": "metrics",
                         "type": [
                           "null",
                           "RecognitionMetrics"
                         ],
                         "default": null
                       }
                     ]
                   }
                 ],
                 "default": null
               }
             ]
           }
         ],
         "default": null
       }
     ]
   }
   
   Our Iceberg schema evolution code never drops a column or field, because the table contains historical data. To make sure data ingestion writes fields in the right order, we always call AvroSchemaUtil.buildAvroProjection on the new schema, projecting it onto the Iceberg table schema.
   
   In this case, however, converting the resulting projection schema to a string throws an error: Can't redefine: avro.com.schemas.RecognitionMetrics
   The Iceberg table still has optionalLongField, so building the projection adds the deleted field back. This creates RecognitionMetrics twice with different field names inside: one has optionalLongField_r2, the other has optionalLongField_r5.
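
   The failure can be illustrated with a minimal Python sketch. This is not Avro's or Iceberg's code. It only mimics the rule, as we understand it, that a fully qualified record name may be bound to a single definition when a schema is written out. check_names, CantRedefine and the helper functions are hypothetical names, and the projected schema is a trimmed stand-in (the payload level is left out) that keeps the two conflicting RecognitionMetrics definitions.

```python
class CantRedefine(Exception):
    """Raised when one record name is bound to two different definitions."""

def check_names(schema, seen=None, namespace=""):
    """Walk a schema (parsed JSON) and remember the first definition of each record name."""
    seen = {} if seen is None else seen
    if isinstance(schema, list):  # a union: visit every branch
        for branch in schema:
            check_names(branch, seen, namespace)
    elif isinstance(schema, dict) and schema.get("type") == "record":
        namespace = schema.get("namespace", namespace)  # nested records inherit it
        full_name = f"{namespace}.{schema['name']}" if namespace else schema["name"]
        if full_name not in seen:
            seen[full_name] = schema
            for f in schema["fields"]:
                check_names(f["type"], seen, namespace)
        elif seen[full_name] != schema:
            # same name, different fields: the situation described above
            raise CantRedefine(f"Can't redefine: {full_name}")
    return seen

def optional(name, avro_type):
    return {"name": name, "type": ["null", avro_type], "default": None}

def metrics(renamed_long_field):
    # RecognitionMetrics with the deleted field added back under a renamed field
    return {"type": "record", "name": "RecognitionMetrics", "fields": [
        optional("optionalStringField", "string"),
        optional(renamed_long_field, "long"),
    ]}

def projected(ended_name, failed_name):
    return {
        "type": "record", "name": "RequestContextEvent",
        "namespace": "avro.com.schemas",
        "fields": [
            optional("ended", {"type": "record", "name": "EndedRecord",
                               "fields": [optional("metrics", metrics(ended_name))]}),
            optional("failed", {"type": "record", "name": "FailedRecord",
                                "fields": [optional("metrics", metrics(failed_name))]}),
        ],
    }

try:
    check_names(projected("optionalLongField_r2", "optionalLongField_r5"))
except CantRedefine as err:
    print(err)  # Can't redefine: avro.com.schemas.RecognitionMetrics
```

   If both copies carried the same field names, the check would pass, which suggests the conflict comes only from the two different renamed fields.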
   




