You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "haizhou-zhao (via GitHub)" <gi...@apache.org> on 2023/02/10 00:56:43 UTC

[GitHub] [iceberg] haizhou-zhao opened a new issue, #6798: [Rest Catalog Open API] Usage of "oneof" in the definition

haizhou-zhao opened a new issue, #6798:
URL: https://github.com/apache/iceberg/issues/6798

   ### Query engine
   
   _No response_
   
   ### Question
   
   ## **Summary**
   
   I attempted to use open api generator (jaxrs-spec) to generate java api&service code using the provided Iceberg Rest Catalog OpenAPI spec provided [here](https://github.com/apache/iceberg/blob/master/open-api/rest-catalog-open-api.yaml). I spotted some potential misusage of "**oneof**" key word in the API yaml definition that produced undesired result in the generated java code, therefore I want to raise a question to see if there's anything I did incorrectly.
   
   ## **Details**
   
   ### Background:
   
   1. Tools I'm using:
   
   OpenAPI generator cli version: 6.3.0
   
   2. Section concerned in the API definition:
   
   [#/components/schemas/Type](https://github.com/apache/iceberg/blob/master/open-api/rest-catalog-open-api.yaml#L1068)
   
   3. Java Model generated by OpenAPI generator:
   
   https://github.com/haizhou-zhao/test-iceberg-rest-jaxrs/blob/master/src/gen/java/org/openapitools/model/Type.java
   
   ### What went wrong:
   
   Java code generated successfully. Yet the generated code is very different from the original Type API in Iceberg: [ref](https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Type.java).
   
   My guess is that the original intention is to generate code using inheritance (polymorphism) like `MapType extends Type` , but OpenAPI generated code is closer to a Uber class (composition) like 
   ```
   class Type {
     MapType mapType;
     StructType structType;
     ListType listType;
     ...
   }
   ```
   
   ### Question:
   
   Am I correct assuming that that the original intention was to define a polymorphic (instead of composite) relationship between `Type` and `{MapType, StructType, ListType}`?
   
   ### Further thoughts
   
   If we want polymorphism instead of composition, then instead of this definition (the current version)
   ```
       Type:
         oneOf:
           - $ref: '#/components/schemas/PrimitiveType'
           - $ref: '#/components/schemas/StructType'
           - $ref: '#/components/schemas/ListType'
           - $ref: '#/components/schemas/MapType'
   ```
   the following definition might be better
   ```
      Type:
         discriminator:
           propertyName: typeId
         type: object
         properties:
           typeId:
             type: string
             example:
               - "string"
               - "boolean"
               - "integer"
               - ...
   
       NestedType:
         allOf:
           - $ref: '#/components/schemas/Type'
   
       MapType:
         allOf:
           - $ref: '#/components/schemas/NestedType'
           - type: object
              properties:
                 keyField:
                    type: $ref: '#/components/schemas/NestedField'
                 valueField:
                    type: $ref: '#/components/schemas/NestedField'
   
       ...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] haizhou-zhao commented on issue #6798: [Rest Catalog Open API] Usage of "oneof" in the definition

Posted by "haizhou-zhao (via GitHub)" <gi...@apache.org>.
haizhou-zhao commented on issue #6798:
URL: https://github.com/apache/iceberg/issues/6798#issuecomment-1426168486

   @Fokko Thanks for your response. If I understand you correctly, both of us found that the generated models from the provided OpenAPI spec is different from the existing models provided by iceberg java library, and therefore might be unusable to parse the existing schema and other Iceberg Metadata. So in summary, it sounds like the current OpenAPI spec is "incorrect". Let me know this is the right claim.
   
   First, thanks for bringing to my attention this PR https://github.com/apache/iceberg/pull/6672, I think this is a worthy effort and will improve the quality of Iceberg's Open API spec in the long run.
   
   Secondly, yes, I'm considering using iceberg-core library to parse Iceberg Metadata (including schema information). Just like what you experienced, I don't think the current OpenAPI generated model could be used to parse an Iceberg Metadata File, then using the original java lib sounds like a good temporary work around.
   
   Finally, since we talked about potentially correct the OpenAPI spec, I'd like to learn more about the "endeavor" we need to take to update it. Regarding backward compatibility, I'm trying to make a bold guess: if the current OpenAPI spec defined model is not usable for parsing Iceberg Metadata, then perhaps not very much people are actually using it? I'm trying to get an idea whether the current OpenAPI spec is closer to alpha testing, or it's fully production launched. If it is just in alpha testing stage, then perhaps backward compatibility is not very important?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] haizhou-zhao commented on issue #6798: [Rest Catalog Open API] Usage of "oneof" in the definition

Posted by "haizhou-zhao (via GitHub)" <gi...@apache.org>.
haizhou-zhao commented on issue #6798:
URL: https://github.com/apache/iceberg/issues/6798#issuecomment-1472907739

   @Fokko Hello Fokko, sorry it's been a while. I just wanna follow up to learn more about what is the future roadmap/vision for the rest spec look like? Specifically, is it gonna 
   (1) achieve feature parity with iceberg-core models, like @snazy suggested in https://github.com/apache/iceberg/pull/6672 where openapi generated models and iceberg-core models can mutually serialize/deserialize into each other; 
   (2) or even one step further, having openapi generated models replace iceberg-core models entirely at some point in the future;
   (3) or keep them different because there is good reason for them to be different
   
   Would appreciate your opinion or anyone else's in the community who's familiar with the matter. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on issue #6798: [Rest Catalog Open API] Usage of "oneof" in the definition

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on issue #6798:
URL: https://github.com/apache/iceberg/issues/6798#issuecomment-1425519691

   Hey @haizhou-zhao thanks for the elaborate explanation, much appreciated.
   
   I ran into the same issue with generating the classes for PyIceberg. I tried to do this using the generator but ran into the same issue that a type can be both a string and an object (or a dictionary in Python world). I ended up generating the code and implementing a custom parser to distinguish between the primitive and complex types. Agreed that having this as an object is nicer because this would also allow us not to have to parse the `fixed[22]` and just have `{'typeId': 'fixed', 'length': 22}`. Changing this is possible, but would be quite a bit endeavor since we would have to keep backward compatibility.
   
   Did you consider relying on the Iceberg library itself for parsing the schema?
   
   Related to https://github.com/apache/iceberg/pull/6672


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on issue #6798: [Rest Catalog Open API] Usage of "oneof" in the definition

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on issue #6798:
URL: https://github.com/apache/iceberg/issues/6798#issuecomment-1430363718

   @haizhou-zhao Thanks again for the elaborate response.
   
   I don't think the open-API spec is incorrect, but the generated code isn't smart enough to handle the complexity of the spec. We validate the spec in the CI when it is changed in a PR, so it is technically correct, but the issue here is that the code was written first, and then it was encapsulated in the spec.
   
   > Finally, since we talked about potentially correct the OpenAPI spec, I'd like to learn more about the "endeavor" we need to take to update it.
   
   Let me give an example. The open-API spec also describes the Iceberg schema, and here we have the issue of the primitive (`"string"`), and complex types (`{"type": "list", "element-id": 3, "element-required": true, "element": "string"}`). To make use of the generated code, would need to change the type to always be a dictionary instead of a plain string (at least this was one of the issues I bumped into with PyIceberg). These schemas live everywhere, for example, they are encoded in the metadata and the parquet metadata. If we would change this in an incompatible way then older readers won't be able to read newer files (because they don't understand `{"type": "string"}`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on issue #6798: [Rest Catalog Open API] Usage of "oneof" in the definition

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on issue #6798:
URL: https://github.com/apache/iceberg/issues/6798#issuecomment-1431247665

   Feel free to ask any question, even the rudimental ones :)
   
   > Let me know if my assumption is correct so far, cause my later statements/questions are based on them.
   
   This assumption is correct. The Java code is the implementation of the [Iceberg table specification](https://iceberg.apache.org/spec/). And, coincidentally they are the same. I think this is the main reason that changing is difficult we don't want them to diverge.
   
   The `RESTSerializers.java` use the TableMetadata (de)serializer from core: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/rest/RESTSerializers.java#L108-L123
   
   > Q1: But shouldn't the rest service return an object named [LoadTableResponse](https://github.com/apache/iceberg/blob/master/open-api/rest-catalog-open-api.yaml#L2310) as defined by OpenAPI?
   
   You mean `LoadTableResult`? Looking at the spec, I think the author wasn't super consistent with `Response` and `Result`. I think renaming `LoadTableResult` to `LoadTableResponse` in the Open API Spec makes sense to avoid confusion.
   
   > Q2: On Q2: I understand your point that there needs to be two sets of model so that we could maintain backward compatibility for the existing Tables (metadata, snapshots, etc.)? Conceptually, if we technically have a way to merge these two sets of models (which are conceptually for the same purpose), then would it make things easier by eliminating the need of an "converter/adapter" between the 2?
   
   The models overlap, but we should probably make this more clear in the docs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] haizhou-zhao commented on issue #6798: [Rest Catalog Open API] Usage of "oneof" in the definition

Posted by "haizhou-zhao (via GitHub)" <gi...@apache.org>.
haizhou-zhao commented on issue #6798:
URL: https://github.com/apache/iceberg/issues/6798#issuecomment-1430593707

   @Fokko thanks for the explanation and your patience. I might be asking many rudimental questions as I'm new to this section of the code base. I see what you meant by the existing OpenAPI spec is just not complex enough to handle all the scenarios. I apologize if I talk too much from the perspective of java, as I haven't played around too much with Iceberg's python or other language APIs.
   
   Perhaps taking a step back, my initial question was out of the assumption that currently there're **two sets of models co-existing**:
   1. the set of models that were written in pure Java inside the iceberg-core library - e.g. [Schema](https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Schema.java)
   2. the set of models that could be generated from OpenAPI definition - e.g. [Schema](https://github.com/apache/iceberg/blob/master/open-api/rest-catalog-open-api.yaml#L1075)
   
   I believe the first set of models are used by [Iceberg Catalog](https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/catalog/Catalog.java), and the second set of models are used by the "Rest Catalog Services" (who implements the open API spec).
   
   Let me know if my assumption is correct so far, cause my later statements/questions are based on them.
   
   Now, I summarize my questions on rest service models and spec into these **2 questions**:
   
   1. Is there any existing (model) converter/adapter to convert the Rest Service's model, for example, "`LoadTableResponse`" (of generated model, set 2 ^) to "Table" (of the existing pure java code based model, set 1 ^), in a ["getTable"](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/rest/RESTCatalog.java#L93) call initiated by RestCatalog?
   2. Is there any future plan to converge the two sets of models?
   --------------------------
   Some investigation and thoughts I had on the questions above:
   On Q1: I found this part of [RESTSessionCatalog](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L269) might be related to "converting/adapting" RestService's LoadTableResponse. To my reading, it looks like the code here is assuming that the rest service will return an object of [`LoadTableResponse.class`](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/rest/responses/LoadTableResponse.java)? But shouldn't the rest service return an object named [`LoadTableResponse`](https://github.com/apache/iceberg/blob/master/open-api/rest-catalog-open-api.yaml#L2310) as defined by OpenAPI? To me, it doesn't look like the two different `LoadTableResponse`s have the same layout, and can be directly de/serialized into one another.
   
   
   On Q2: I understand your point that there needs to be two sets of model so that we could maintain backward compatibility for the existing Tables (metadata, snapshots, etc.)? Conceptually, if we technically have a way to merge these two sets of models (which are conceptually for the same purpose), then would it make things easier by eliminating the need of an "converter/adapter" between the 2?
   
   Again, thx for your patience reading through my insanely long post.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] haizhou-zhao commented on issue #6798: [Rest Catalog Open API] Usage of "oneof" in the definition

Posted by "haizhou-zhao (via GitHub)" <gi...@apache.org>.
haizhou-zhao commented on issue #6798:
URL: https://github.com/apache/iceberg/issues/6798#issuecomment-1442164391

   @Fokko I think that these minor corrections would be helpful to make sure the OpenAPI def stay close to the java models: https://github.com/apache/iceberg/pull/6921 . Can you take a look and let me know if that make sense to you? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [I] [Rest Catalog Open API] Usage of "oneof" in the definition [iceberg]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6798:
URL: https://github.com/apache/iceberg/issues/6798#issuecomment-1882020405

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org