Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2017/02/16 17:56:41 UTC

[jira] [Commented] (AVRO-2002) Canonical form strip the default value : Schema resolution may provide 2 different answers with same schema's fingerprint

    [ https://issues.apache.org/jira/browse/AVRO-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870374#comment-15870374 ] 

Doug Cutting commented on AVRO-2002:
------------------------------------

I believe you have misunderstood the semantics of fingerprints. Identical fingerprints mean that one schema can read the output of the other without schema resolution, not that both can read the output of a third schema using schema resolution.
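
To make this concrete, here is a minimal pure-Java sketch of the 64-bit Rabin fingerprint (CRC-64-AVRO) that the Avro specification defines over a schema's Parsing Canonical Form. The canonical-form strings are written out by hand and are an assumption of the sketch (they are what the STRIP and ORDER rules should produce for the schemas in the issue), and the class name `CanonicalFingerprint` is hypothetical. Because the fingerprint is computed only from the canonical string, both reader variants, which reduce to the same string, must share a fingerprint, while the writer schema, which lacks `newField`, does not.

```java
import java.nio.charset.StandardCharsets;

public class CanonicalFingerprint {
    // CRC-64-AVRO constant from the Avro specification; as in the spec's
    // reference code it is used both as the initial value and in the table.
    static final long EMPTY = 0xc15d213aa4d7a795L;
    static final long[] FP_TABLE = new long[256];

    static {
        // Precompute one table entry per possible byte value.
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            FP_TABLE[i] = fp;
        }
    }

    /** 64-bit Rabin fingerprint (CRC-64-AVRO) of a byte sequence. */
    static long fingerprint64(byte[] buf) {
        long fp = EMPTY;
        for (byte b : buf) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    /** Fingerprint of a canonical-form string, hashed as UTF-8. */
    static long fingerprint64(String canonicalForm) {
        return fingerprint64(canonicalForm.getBytes(StandardCharsets.UTF_8));
    }

    // Assumed Parsing Canonical Form of the writer schema (hand-written).
    static final String WRITER =
        "{\"name\":\"ExampleAvro\",\"type\":\"record\",\"fields\":["
        + "{\"name\":\"field\",\"type\":\"long\"}]}";

    // Assumed canonical form shared by BOTH reader variants: STRIP removes
    // "default":0, so the two readers become the same string.
    static final String READER =
        "{\"name\":\"ExampleAvro\",\"type\":\"record\",\"fields\":["
        + "{\"name\":\"field\",\"type\":\"long\"},"
        + "{\"name\":\"newField\",\"type\":\"int\"}]}";

    public static void main(String[] args) {
        System.out.println("writer fingerprint: " + fingerprint64(WRITER));
        System.out.println("reader fingerprint: " + fingerprint64(READER));
    }
}
```

This sketch has not been checked against `SchemaNormalization.parsingFingerprint64`; if the hand-written strings match what the library's canonicalization emits, the printed values should correspond to those in the issue. The design point is that the hash input contains nothing about defaults, so no fingerprint can distinguish the two readers.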

Schema resolution permits interoperability (in some cases) between a pair of schemas whose fingerprints do not match. The SchemaCompatibility class can determine whether a pair of schemas can interoperate through resolution.
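
The record-field rule that separates the two readers can be modelled without the Avro library. The following is a deliberately simplified, hypothetical model, not the `SchemaCompatibility` implementation: a record is a list of fields, each with a name, a primitive type name, and a flag saying whether it declares a default. A reader can consume writer data only if every reader field either exists in the writer with the same type or has a default. Type promotion, unions, aliases and nested records are ignored, and the names `FieldResolution`, `Field` and `unresolvedFields` are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldResolution {
    /** Hypothetical simplified field: name, primitive type name, default declared or not. */
    static final class Field {
        final String name;
        final String type;
        final boolean hasDefault;

        Field(String name, String type, boolean hasDefault) {
            this.name = name;
            this.type = type;
            this.hasDefault = hasDefault;
        }
    }

    /** Lists reader fields that cannot be populated from data written with the writer schema. */
    static List<String> unresolvedFields(List<Field> reader, List<Field> writer) {
        Map<String, Field> writerByName = new HashMap<>();
        for (Field w : writer) {
            writerByName.put(w.name, w);
        }
        List<String> problems = new ArrayList<>();
        for (Field r : reader) {
            Field w = writerByName.get(r.name);
            if (w == null) {
                // Spec rule: a reader field absent from the writer needs a default.
                if (!r.hasDefault) {
                    problems.add(r.name + ": not in writer and no default");
                }
            } else if (!w.type.equals(r.type)) {
                // Simplification: type promotion (e.g. int to long) is not modelled.
                problems.add(r.name + ": writer type " + w.type + " vs reader type " + r.type);
            }
        }
        // Writer fields missing from the reader are skipped on read, so they are not checked.
        return problems;
    }

    static boolean canRead(List<Field> reader, List<Field> writer) {
        return unresolvedFields(reader, writer).isEmpty();
    }

    public static void main(String[] args) {
        List<Field> writer = List.of(new Field("field", "long", false));
        List<Field> reader = List.of(
            new Field("field", "long", false), new Field("newField", "int", true));
        List<Field> readerNoDefault = List.of(
            new Field("field", "long", false), new Field("newField", "int", false));

        System.out.println("reader:          " + unresolvedFields(reader, writer));
        System.out.println("readerNoDefault: " + unresolvedFields(readerNoDefault, writer));
        // prints:
        // reader:          []
        // readerNoDefault: [newField: not in writer and no default]
    }
}
```

The two reader lists differ only in `hasDefault`, which is exactly the information STRIP discards, so a fingerprint cannot answer this question; a pairwise check such as `SchemaCompatibility.checkReaderWriterCompatibility(reader, writer)` has to look at both full schemas.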

> Canonical form strip the default value : Schema resolution may provide 2 different answers with same schema's fingerprint
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-2002
>                 URL: https://issues.apache.org/jira/browse/AVRO-2002
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.1
>            Reporter: Deslandes Hugues
>
> I understood that a schema's fingerprint uniquely identifies the Avro schema. The following example shows two different schemas with the same fingerprint but different behaviors: one can read data written with the writer schema, the other cannot. I guess it is a bug, but maybe it's only a misinterpretation…
> Here are the details.
> First, the Parsing Canonical Form of an Avro schema is derived using this rule (see http://avro.apache.org/docs/1.8.1/spec.html#Transforming+into+Parsing+Canonical+Form):
> {quote}
> [STRIP] Keep only attributes that are relevant to parsing data, which are: type, name, fields, symbols, items, values, size. Strip all others (e.g., doc and aliases). {quote}  
> So any default attribute is removed.
> On the other hand, schema resolution applies this rule (see http://avro.apache.org/docs/1.8.1/spec.html#Schema+Resolution):
> {quote}if the reader's record schema has a field with no default value, and writer's schema does not have a field with the same name, an error is signalled.{quote}
> To illustrate the situation with a simple (writer) schema, I created a new version by adding a field, in two variants: one declares a default value, the other does not. The first can read data written with the old writer schema; the second cannot.
> In other words, the canonical form ignores the default attribute of record fields, but the resolution algorithm uses it to decide compatibility. As a result, two schemas that differ only in a default attribute have the same fingerprint, yet one is compatible with the writer schema and the other is not.
> I understand why the behaviors differ, but not why the two schemas share a fingerprint.
> I would suggest that the canonical form not strip the presence of a default attribute (while still stripping the default value itself, which should not affect compatibility).
> The immediate workaround I will use is to always provide a default value for any added field.
> {code:linenumbers=true|language=java}
> package Main;
> import java.util.Collections;
> import org.apache.avro.Schema;
> import org.apache.avro.SchemaCompatibility;
> import org.apache.avro.SchemaNormalization;
> import org.apache.avro.SchemaValidationException;
> import org.apache.avro.SchemaValidator;
> import org.apache.avro.SchemaValidatorBuilder;
> public class Main {
> 	public static void main(String[] args) {
> 		Schema schemaWriter = new org.apache.avro.Schema.Parser().parse(
> 				"{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"}]}");
> 		Schema schemaReader = new org.apache.avro.Schema.Parser().parse(
> 				"{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"},{\"name\":\"newField\",\"type\":\"int\",\"default\":0}]}");
> 		Schema schemaReaderNoDefault = new org.apache.avro.Schema.Parser().parse(
> 				"{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"},{\"name\":\"newField\",\"type\":\"int\"}]}");
> 		long fpWriter = SchemaNormalization.parsingFingerprint64(schemaWriter);
> 		long fpReader = SchemaNormalization.parsingFingerprint64(schemaReader);
> 		long fpReaderNoDefault = SchemaNormalization.parsingFingerprint64(schemaReaderNoDefault);
> 		
> 		System.out.println("Schema writer          " + fpWriter + " "+ schemaWriter);
> 		System.out.println("Schema reader          " + fpReader + " "+ schemaReader);
> 		System.out.println("Schema readerNoDefault " + fpReaderNoDefault + " "+ schemaReaderNoDefault);
> 		// check compatibility : method 1
> 		String res = SchemaCompatibility.checkReaderWriterCompatibility(schemaReader, schemaWriter).getType().toString() ;
> 		String resNoDefault = SchemaCompatibility.checkReaderWriterCompatibility(schemaReaderNoDefault, schemaWriter).getType().toString() ;
> 		
> 		System.out.println(fpReader + " is " + res +  " with " +fpWriter);
> 		System.out.println(fpReaderNoDefault + " is " + resNoDefault +  " with " +fpWriter);
> 		// check compatibility : method 2 
> 		SchemaValidator validator = new SchemaValidatorBuilder().canReadStrategy().validateAll();
> 		String isCompatible="";
> 		try {
> 			validator.validate(schemaReaderNoDefault,  Collections.singletonList(schemaWriter));
> 		} catch (SchemaValidationException e) {
> 			isCompatible="not ";
> 		}	
> 		System.out.println(fpReaderNoDefault + " is "+ isCompatible +"compatible with " +fpWriter);
> 		isCompatible="";
> 		try {
> 			validator.validate(schemaReader,  Collections.singletonList(schemaWriter));
> 		} catch (SchemaValidationException e) {
> 			isCompatible="not ";
> 		}	
> 		System.out.println(fpReader + " is "+ isCompatible +"compatible with " +fpWriter);
> 		System.out.println("------------");
> 	}
> 	//The output is :
> 	//Schema writer          8957007963871099370 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"}]}
> 	//Schema reader          489516346825099350 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"},{"name":"newField","type":"int","default":0}]}
> 	//Schema readerNoDefault 489516346825099350 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"},{"name":"newField","type":"int"}]}
> 	//489516346825099350 is COMPATIBLE with 8957007963871099370
> 	//489516346825099350 is INCOMPATIBLE with 8957007963871099370
> 	//489516346825099350 is not compatible with 8957007963871099370
> 	//489516346825099350 is compatible with 8957007963871099370
> 	
> }
> {code}
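
The STRIP and ORDER rules quoted in the issue can be sketched over a simple in-memory representation to show why the `default` attribute has no effect on the canonical string. In this hypothetical sketch (the class `StripSketch` and its helpers are illustrative, not Avro API) a schema is a nest of `Map`, `List`, `String` and `Number` values, as a JSON parser might produce. Only STRIP, ORDER and whitespace-free output are modelled; the PRIMITIVES, FULLNAMES, STRINGS and INTEGERS rules of the specification are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StripSketch {
    // Attributes kept by STRIP, listed in the order required by ORDER.
    static final List<String> KEPT =
        List.of("name", "type", "fields", "symbols", "items", "values", "size");

    /** Serializes a schema node in a simplified Parsing Canonical Form. */
    static String canonical(Object node) {
        if (node instanceof String) {
            // Sketch assumption: names need no JSON escaping.
            return "\"" + node + "\"";
        }
        if (node instanceof Number) {
            return node.toString(); // e.g. the size of a fixed type
        }
        if (node instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(canonical(items.get(i)));
            }
            return sb.append(']').toString();
        }
        // Remaining case: a JSON object, assumed to be a Map with String keys.
        Map<?, ?> map = (Map<?, ?>) node;
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (String key : KEPT) {
            if (!map.containsKey(key)) continue; // "default", "doc", "aliases" are never emitted
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(key).append("\":").append(canonical(map.get(key)));
        }
        return sb.append('}').toString();
    }

    /** Builds an insertion-ordered JSON-like object from key/value pairs. */
    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        // Attributes deliberately written out of canonical order ("type" before "name").
        Map<String, Object> withDefault = obj("type", "record", "name", "ExampleAvro",
            "fields", List.of(
                obj("name", "field", "type", "long"),
                obj("name", "newField", "type", "int", "default", 0)));
        Map<String, Object> noDefault = obj("type", "record", "name", "ExampleAvro",
            "fields", List.of(
                obj("name", "field", "type", "long"),
                obj("name", "newField", "type", "int")));

        System.out.println(canonical(withDefault));
        System.out.println(canonical(withDefault).equals(canonical(noDefault)));
        // prints:
        // {"name":"ExampleAvro","type":"record","fields":[{"name":"field","type":"long"},{"name":"newField","type":"int"}]}
        // true
    }
}
```

Since both inputs serialize to the same string, any fingerprint computed over that string is necessarily identical for the two readers, which is the behavior reported in the issue.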



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)