You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by Cody Jones <cj...@greshamtech.com> on 2022/10/06 11:06:55 UTC

Don't throw can't redefine error when the schema is identical

Hey,

We are using Avro via the Clojure wrapper Abracad (https://github.com/ovotech/abracad). When adding schemas, it's sometimes quite difficult to get the order of type definitions correct and we often encounter "can't redefine ..." errors. To make this easier we had an idea that instead of checking whether a type has been defined already using only its name, we could then check the structure of the existing type with the new one and if they are identical not throw an error at all. This should allow the same type to be parsed in multiple places if it's not been changed.

We forked off of the main branch of your library and started playing with that idea (see our code here: https://github.com/apache/avro/compare/master...cjones1122:avro:allow-redefine-if-exactly-the-same), only adding an additional predicate to the put() methods in Schema and SchemaBuilder where the exception is currently thrown. Now if a type already exists it will compare it using the existing equals() function and only throw an error if they differ structurally. Since we don't know all the caveats we'd like to ask if this is a possible/sensible approach and a change you'd be interested in getting into the library?

We also created some tests in TestSchema and TestSchemaBuilder which mostly seem to work as we had hoped; however, the test "TestSchema#testSameRecordSameNameSucceedBeingParsedTwice" parses the same record twice and throws the can't redefine exception. We are not sure why this happens with our added changes and would appreciate any feedback/help with this.

Many thanks,
Cody Jones
Privileged or confidential information may be contained in this message. If you are not the addressee of this message please notify the sender by return and thereafter delete the message, and you may not use, copy, disclose or rely on the information contained in it. Internet e-mail may be susceptible to data corruption, interception and unauthorised amendment for which Gresham does not accept liability. Whilst we have taken reasonable precautions to ensure that this e-mail and any attachments have been swept for viruses, Gresham does not accept liability for any damage sustained as a result of viruses. Statements in this message that do not relate to the business of Gresham are neither given nor endorsed by the company or its directors. Gresham Technologies plc Registered in England and Wales. Company No. 01072032 Registered Office: Aldermary House, 10-15 Queen Street, London, EC4N 1TX. Further information about Gresham Technologies can be found on our website: www.greshamtech.com

Re: Don't throw can't redefine error when the schema is identical

Posted by Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>.
Hi,

Performance issues are unlikely: the equality check only fires when the
code would otherwise throw (which is much slower).

Worse however, is that the two adjusted code snippets are not the only
place for this functionality. There are several more: when writing (!) with
Schema.toString(boolean), and in the IDL parsers (of which there are now 2).

I'm currently working on a change to align this, while refactoring the
schema parsers (both JSON and IDL). A PR will follow in the near future.

Kind regards,
Oscar


On Fri, 7 Oct 2022 at 09:54, Martin Grigorov <mg...@apache.org> wrote:

> Hi Cody,
>
> I think these extra checks for equality will affect the performance for all
> users.
>
> Woud it be possible to wrap the "put" calls with "if !containsKey(name)"
> check in your code ?
>
>
> On Thu, Oct 6, 2022 at 2:23 PM Cody Jones <cj...@greshamtech.com> wrote:
>
> > Hey,
> >
> > We are using Avro via the Clojure wrapper Abracad (
> > https://github.com/ovotech/abracad). When adding schemas, it's sometimes
> > quite difficult to get the order of type definitions correct and we often
> > encounter "can't redefine ..." errors. To make this easier we had an idea
> > that instead of checking whether a type has been defined already using
> only
> > its name, we could then check the structure of the existing type with the
> > new one and if they are identical not throw an error at all. This should
> > allow the same type to be parsed in multiple places if it's not been
> > changed.
> >
> > We forked off of the main branch of your library and started playing with
> > that idea (see our code here:
> >
> https://github.com/apache/avro/compare/master...cjones1122:avro:allow-redefine-if-exactly-the-same
> ),
> > only adding an additional predicate to the put() methods in Schema and
> > SchemaBuilder where the exception is currently thrown. Now if a type
> > already exists it will compare it using the existing equals() function
> and
> > only throw an error if they differ structurally. Since we don't know all
> > the caveats we'd like to ask if this is a possible/sensible approach and
> a
> > change you'd be interested in getting into the library?
> >
> > We also created some tests in TestSchema and TestSchemaBuilder which
> > mostly seem to work as we had hoped; however, the test
> > "TestSchema#testSameRecordSameNameSucceedBeingParsedTwice" parses the
> same
> > record twice and throws the can't redefine exception. We are not sure why
> > this happens with our added changes and would appreciate any
> feedback/help
> > with this.
> >
> > Many thanks,
> > Cody Jones
> > Privileged or confidential information may be contained in this message.
> > If you are not the addressee of this message please notify the sender by
> > return and thereafter delete the message, and you may not use, copy,
> > disclose or rely on the information contained in it. Internet e-mail may
> be
> > susceptible to data corruption, interception and unauthorised amendment
> for
> > which Gresham does not accept liability. Whilst we have taken reasonable
> > precautions to ensure that this e-mail and any attachments have been
> swept
> > for viruses, Gresham does not accept liability for any damage sustained
> as
> > a result of viruses. Statements in this message that do not relate to the
> > business of Gresham are neither given nor endorsed by the company or its
> > directors. Gresham Technologies plc Registered in England and Wales.
> > Company No. 01072032 Registered Office: Aldermary House, 10-15 Queen
> > Street, London, EC4N 1TX. Further information about Gresham Technologies
> > can be found on our website: www.greshamtech.com
> >
>


-- 

✉️ Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>

Re: Don't throw can't redefine error when the schema is identical

Posted by Martin Grigorov <mg...@apache.org>.
Hi Cody,

I think these extra checks for equality will affect the performance for all
users.

Woud it be possible to wrap the "put" calls with "if !containsKey(name)"
check in your code ?


On Thu, Oct 6, 2022 at 2:23 PM Cody Jones <cj...@greshamtech.com> wrote:

> Hey,
>
> We are using Avro via the Clojure wrapper Abracad (
> https://github.com/ovotech/abracad). When adding schemas, it's sometimes
> quite difficult to get the order of type definitions correct and we often
> encounter "can't redefine ..." errors. To make this easier we had an idea
> that instead of checking whether a type has been defined already using only
> its name, we could then check the structure of the existing type with the
> new one and if they are identical not throw an error at all. This should
> allow the same type to be parsed in multiple places if it's not been
> changed.
>
> We forked off of the main branch of your library and started playing with
> that idea (see our code here:
> https://github.com/apache/avro/compare/master...cjones1122:avro:allow-redefine-if-exactly-the-same),
> only adding an additional predicate to the put() methods in Schema and
> SchemaBuilder where the exception is currently thrown. Now if a type
> already exists it will compare it using the existing equals() function and
> only throw an error if they differ structurally. Since we don't know all
> the caveats we'd like to ask if this is a possible/sensible approach and a
> change you'd be interested in getting into the library?
>
> We also created some tests in TestSchema and TestSchemaBuilder which
> mostly seem to work as we had hoped; however, the test
> "TestSchema#testSameRecordSameNameSucceedBeingParsedTwice" parses the same
> record twice and throws the can't redefine exception. We are not sure why
> this happens with our added changes and would appreciate any feedback/help
> with this.
>
> Many thanks,
> Cody Jones
> Privileged or confidential information may be contained in this message.
> If you are not the addressee of this message please notify the sender by
> return and thereafter delete the message, and you may not use, copy,
> disclose or rely on the information contained in it. Internet e-mail may be
> susceptible to data corruption, interception and unauthorised amendment for
> which Gresham does not accept liability. Whilst we have taken reasonable
> precautions to ensure that this e-mail and any attachments have been swept
> for viruses, Gresham does not accept liability for any damage sustained as
> a result of viruses. Statements in this message that do not relate to the
> business of Gresham are neither given nor endorsed by the company or its
> directors. Gresham Technologies plc Registered in England and Wales.
> Company No. 01072032 Registered Office: Aldermary House, 10-15 Queen
> Street, London, EC4N 1TX. Further information about Gresham Technologies
> can be found on our website: www.greshamtech.com
>