You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/04/20 07:14:20 UTC

[GitHub] [kafka] Naros opened a new pull request #10566: KAFKA-12694 Avoid schema mismatch DataException when validating default values

Naros opened a new pull request #10566:
URL: https://github.com/apache/kafka/pull/10566


   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hjwalt commented on pull request #10566: KAFKA-12694 Avoid schema mismatch DataException when validating default values

Posted by GitBox <gi...@apache.org>.
hjwalt commented on pull request #10566:
URL: https://github.com/apache/kafka/pull/10566#issuecomment-894255662


   I encountered this same problem with debezium postgresql connector, where when the connector is set up with precise decimal handling, will use a struct for its numeric value. When a default value is present, the connector fails.
   
   [Here](https://issues.redhat.com/browse/DBZ-3816) are the details I provided to Debezium team.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] C0urante edited a comment on pull request #10566: KAFKA-12694 Avoid schema mismatch DataException when validating default values

Posted by GitBox <gi...@apache.org>.
C0urante edited a comment on pull request #10566:
URL: https://github.com/apache/kafka/pull/10566#issuecomment-914640232


   Hey Gunnar! Sorry for the delay.
   
   > Initializing a Struct with a builder rather than a Schema seems to be an odd thing to do, no?
   
   Indeed it is rather odd, but it's still possible at the moment and I've definitely seen it in use (intentionally or otherwise...) in at least one fairly large and popular code base. If we can avoid breaking projects that use this pattern (strange as it is) I think that'd be best.
   
   I haven't spent too much time thinking about a clean solution but it's at least possible to satisfy both cases with the following:
   - The change proposed in this PR
   - The change I proposed above where the order of schemas in the comparison in `ConnectSchema::validateValue` is reversed
   - The `ConnectSchema::equals` method is altered to accept any `Schema` instance and only operate on the public methods exposed by the `Schema` interface instead of internal variables for comparison of the schemas' fields, key schema, and value schema
   
   I've verified this locally with the following two test cases:
   
   ```java
       @Test
       public void testDefaultValueStructSchema() {
           SchemaBuilder builder = SchemaBuilder.struct()
               .field("f1", Schema.BOOLEAN_SCHEMA);
   
           Struct defaultValue = new Struct(builder.build()); // the Struct receives a schema, not a builder
           defaultValue.put("f1", true);
   
           builder.defaultValue(defaultValue)
               .build();
       }
   
       @Test
       public void testDefaultValueStructSchemaBuilder() {
           SchemaBuilder builder = SchemaBuilder.struct()
               .field("f1", Schema.BOOLEAN_SCHEMA);
   
           Struct defaultValue = new Struct(builder);
           defaultValue.put("f1", true);
   
           builder.defaultValue(defaultValue).build();
       }
   ```
   
   This would be fairly involved for such a small bug so hopefully there's something less invasive out there, but at least it's possible to favor one use case without compromising the other.
   
   As an alternative, could the connector use the `SchemaBuilder` when instantiating the `Struct` object, instead of calling `build` and using a `ConnectSchema`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] gunnarmorling commented on pull request #10566: KAFKA-12694 Avoid schema mismatch DataException when validating default values

Posted by GitBox <gi...@apache.org>.
gunnarmorling commented on pull request #10566:
URL: https://github.com/apache/kafka/pull/10566#issuecomment-895027031


   Hey @C0urante,
   
   ```
   Struct defaultValue = new Struct(builder);
   ```
   
   Initializing a `Struct` with a builder rather than a `Schema` seems to be an odd thing to do, no? Here's an example for triggering the exception we ran into, without the proposed change:
   
   ```
   SchemaBuilder builder = SchemaBuilder.struct()
       .field("f1", Schema.BOOLEAN_SCHEMA);
   
   Struct defaultValue = new Struct(builder.build()); // the Struct receives a schema, not a builder
   defaultValue.put("f1", true);
   
   builder.defaultValue(defaultValue)
        .build();
   ```
   
   I think ultimately this exposes sort of a gap in the KC schema API, as there's a recursive relationship between struct and schema in the default value case. The suggested change should improve the situation though for the IMO more common case where a `Struct` is initialized with a `Schema` rather than a `SchemaBuilder`.
   
   @C0urante, @rhauch, any thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] C0urante edited a comment on pull request #10566: KAFKA-12694 Avoid schema mismatch DataException when validating default values

Posted by GitBox <gi...@apache.org>.
C0urante edited a comment on pull request #10566:
URL: https://github.com/apache/kafka/pull/10566#issuecomment-824292010


   Hmm... I'm wondering if this might break existing setups. Since the `SchemaBuilder` class does implement the `Schema` interface, it's currently possible to do something like this:
   
   ```java
   import org.apache.kafka.connect.data.Schema;
   import org.apache.kafka.connect.data.SchemaBuilder;
   import org.apache.kafka.connect.data.Struct;
   
   SchemaBuilder builder = SchemaBuilder.struct()
       .field("f1", Schema.BOOLEAN_SCHEMA);
   
   Struct defaultValue = new Struct(builder);
   defaultValue.put("f1", true);
   
   Schema schema = builder.defaultValue(defaultValue).build();
   ```
   
   Validation currently uses the equality method of the `Struct`'s schema: https://github.com/apache/kafka/blob/87b24025cedf08c550a180819b2a5a2dbb75f020/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java#L247
   which in this case is a `SchemaBuilder` instance that doesn't have an overridden `equals` method, so I think the change proposed here might cause the above example to start to fail.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] gunnarmorling edited a comment on pull request #10566: KAFKA-12694 Avoid schema mismatch DataException when validating default values

Posted by GitBox <gi...@apache.org>.
gunnarmorling edited a comment on pull request #10566:
URL: https://github.com/apache/kafka/pull/10566#issuecomment-895027031


   Hey @C0urante,
   
   ```
   Struct defaultValue = new Struct(builder);
   ```
   
   Initializing a `Struct` with a builder rather than a `Schema` seems to be an odd thing to do, no? Here's an example for triggering the exception we ran into, without the proposed change:
   
   ```java
   SchemaBuilder builder = SchemaBuilder.struct()
       .field("f1", Schema.BOOLEAN_SCHEMA);
   
   Struct defaultValue = new Struct(builder.build()); // the Struct receives a schema, not a builder
   defaultValue.put("f1", true);
   
   builder.defaultValue(defaultValue)
        .build();
   ```
   
   I think ultimately this exposes sort of a gap in the KC schema API, as there's a recursive relationship between struct and schema in the default value case. The suggested change should improve the situation though for the IMO more common case where a `Struct` is initialized with a `Schema` rather than a `SchemaBuilder`.
   
   @C0urante, @rhauch, any thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] C0urante edited a comment on pull request #10566: KAFKA-12694 Avoid schema mismatch DataException when validating default values

Posted by GitBox <gi...@apache.org>.
C0urante edited a comment on pull request #10566:
URL: https://github.com/apache/kafka/pull/10566#issuecomment-824292010


   Hmm... I'm wondering if this might break existing setups. Since the `SchemaBuilder` class does implement the `Schema` interface, it's currently possible to do something like this:
   
   ```java
   import org.apache.kafka.connect.data.Schema;
   import org.apache.kafka.connect.data.SchemaBuilder;
   import org.apache.kafka.connect.data.Struct;
   
   SchemaBuilder builder = SchemaBuilder.struct()
       .field("f1", Schema.BOOLEAN_SCHEMA);
   
   Struct defaultValue = new Struct(builder);
   defaultValue.put("f1", true);
   
   Schema schema = builder.defaultValue(defaultValue).build();
   ```
   
   Validation currently uses the equality method of the `Struct`'s schema: https://github.com/apache/kafka/blob/87b24025cedf08c550a180819b2a5a2dbb75f020/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java#L247
   which in this case is a `SchemaBuilder` instance that doesn't have an overridden `equals` method, so I think the change proposed here might cause the above example to start to fail.
   
   A naive solution could be to modify the validation to be something like `if (!schema.equals(struct.schema()))` but that may introduce other edge cases. Can you provide a brief example of what sort of connector code ran into this bug in the first place? That might help guide the path forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] Naros commented on pull request #10566: KAFKA-12694 Avoid schema mismatch DataException when validating default values

Posted by GitBox <gi...@apache.org>.
Naros commented on pull request #10566:
URL: https://github.com/apache/kafka/pull/10566#issuecomment-824277958


   @C0urante @rhauch , could either of you review this to see if its acceptable?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] C0urante commented on pull request #10566: KAFKA-12694 Avoid schema mismatch DataException when validating default values

Posted by GitBox <gi...@apache.org>.
C0urante commented on pull request #10566:
URL: https://github.com/apache/kafka/pull/10566#issuecomment-824292010


   Hmm... I'm wondering if this might break existing setups. Since the `SchemaBuilder` class does implement the `Schema` interface, it's currently possible to do something like this:
   
   ```java
   import org.apache.kafka.connect.data.Schema;
   import org.apache.kafka.connect.data.SchemaBuilder;
   import org.apache.kafka.connect.data.Struct;
   
   SchemaBuilder builder = SchemaBuilder.struct()
       .field("f1", Schema.BOOLEAN_SCHEMA);
   
   Struct defaultValue = new Struct(builder);
   defaultValue.put("f1", true);
   
   Schema schema = builder.defaultValue(defaultValue).build();
   ```
   
   Since validation currently uses the equality method of the `Struct`'s schema (https://github.com/apache/kafka/blob/87b24025cedf08c550a180819b2a5a2dbb75f020/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java#L247), which in this case is a `SchemaBuilder` instance that doesn't have an overridden `equals` method, I think the change proposed here might cause the above example to start to fail.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] C0urante commented on pull request #10566: KAFKA-12694 Avoid schema mismatch DataException when validating default values

Posted by GitBox <gi...@apache.org>.
C0urante commented on pull request #10566:
URL: https://github.com/apache/kafka/pull/10566#issuecomment-914640232


   Hey Gunnar! Sorry for the delay.
   
   > Initializing a Struct with a builder rather than a Schema seems to be an odd thing to do, no?
   
   Indeed it is rather odd, but it's still possible at the moment and I've definitely seen it in use (intentionally or otherwise...) in at least one fairly large and popular code base. If we can avoid breaking projects that use this pattern (strange as it is) I think that'd be best.
   
   I haven't spent too much time thinking about a clean solution but it's at least possible to satisfy both cases with the following:
   - The change proposed in this PR
   - The change I proposed above where the order of schemas in the comparison in `ConnectSchema::validateValue` is reversed
   - The `ConnectSchema::equals` method is altered to accept any `Schema` instance and only operate on the public methods exposed by the `Schema` interface instead of internal variables for comparison of the schemas' fields, key schema, and value schema
   
   This would be fairly involved for such a small bug so hopefully there's something less invasive out there, but at least it's possible to favor one use case without compromising the other.
   
   As an alternative, could the connector use the `SchemaBuilder` when instantiating the `Struct` object, instead of calling `build` and using a `ConnectSchema`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org