You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2021/09/13 18:31:25 UTC

[GitHub] [avro] unchuckable opened a new pull request #1333: AVRO-3048: Backport to avro 1.10 branch

unchuckable opened a new pull request #1333:
URL: https://github.com/apache/avro/pull/1333


   Backport of PR https://github.com/apache/avro/pull/1206 to 1.10 branch. For details, also see there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] xkrogen commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-1009091763


   Hi @martin-g, thanks for checking in. Not yet, we are still working through the process of bringing our ecosystem onto Avro 1.10 and have worked around this issue in the meantime with [a change in the wrapper we use for generating SpecificRecord classes](https://github.com/linkedin/avro-util/pull/242). I do hope/anticipate it will be easy for us to move users to Avro 1.11 once this is complete, but we're taking it one step at a time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] RyanSkraba commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-930355417


   Oh, as a quick note -- the Avro convention (for historical reasons) is that `1.11.0` is a major release, and `1.10.3` is a minor release!  It's unexpected AND undocumented (at least you have to search for it).
   
   All that being said, pragmatically for your immediate problem, I think you'll find `1.10.2` -> `1.11.0` to be a pretty easy migration, so I'm going to focus on that.  Depending on the outcome of the discussion on the mailing list, we can revisit supporting a `1.10.3` and _especially_ policy and process improvements for the future!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] xkrogen commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-1010371532


   Thanks @RyanSkraba ! The `avro-util/helper` module is mostly the work of @radai-rosenblatt , so I'll let him chime in for your last questions about interaction between the two projects. I work on Spark at LinkedIn, so I am merely a user of Avro :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] xkrogen commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-925988839


   @RyanSkraba or @Fokko -- I see you helped review the original PR #1206, would you be able to help take a look at this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] xkrogen commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-929349477


   Thanks for looking @RyanSkraba !
   
   Speaking as an enterprise user of Avro, there is a fair amount of FUD around Avro minor version upgrades. I won't comment on whether or not it's warranted, but it definitely exists. My organization would benefit greatly from a patch release of Avro 1.10 that has this performance regression addressed, allowing for an easy upgrade without switching minor versions, and I'm sure many other organizations would benefit similarly.
   
   IMO, the case for a backport is particularly strong here as this is a fix for a performance _regression_, as opposed to a net-new enhancement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
martin-g commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-929879103


   @xkrogen Please do not involve corporate/enterprise thinking/talking into OSS project unless your enterprise wants to offer its help!
   Most of us do this in our spare time!
   The best you could do is to test Avro master (1.11.0-SNAPSHOT) in your application and give feedback! If you find more regressions then you will have more ground to ask for 1.10.x!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] RyanSkraba commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-1010238781


   I brought up the issue of supporting more than one version at a time on the mailing list again!  If it would make your work easier, please go ahead and weigh in :D 
   
   It looks like your avro-util repo is one of the only places that deals with interoperating between Java SpecificRecords that were generated with different versions of Avro.  I remain impressed by the task!
   
   Is this something that the Avro project can or should be more involved in?  Either making it easier for your utilities or doing some specific tests when validating a new release candidate?  Is this something to bring up on the mailing list?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
martin-g commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-1008810079


   @xkrogen Have you tried 1.11.0 ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] xkrogen commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-929349477


   Thanks for looking @RyanSkraba !
   
   Speaking as an enterprise user of Avro, there is a fair amount of FUD around Avro minor version upgrades. I won't comment on whether or not it's warranted, but it definitely exists. My organization would benefit greatly from a patch release of Avro 1.10 that has this performance regression addressed, allowing for an easy upgrade without switching minor versions, and I'm sure many other organizations would benefit similarly.
   
   IMO, the case for a backport is particularly strong here as this is a fix for a performance _regression_, as opposed to a net-new enhancement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] RyanSkraba commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-929998696






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] radai-rosenblatt commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
radai-rosenblatt commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-1011640589


   Hey folks.
   
   I'd like to elaborate a little here about why the avro-util project exists. we can then see if involving upstream avro makes sense in anything.
   
   TL;DR - when using avro in a large, diverse ecosystem, supporting multiple versions of avro simultaneously across the ecosystem is required. there a few things that would make this work easier listed near the bottom.
   
   the main reasons are code/schema reuse and portable libraries. Linkein uses avro for encoding across a lot of its "data plane" (kafka being a good example). a lot of these payloads are "shared models" - data produced by service X that is consumed by services Y and Z over a kafka topic, or any other storage medium or rpc. as such all parties involved must have "the same schema". this is especially true for avro - which requires the exact writer schema on-hand for decoding - vs other serialization formats. the best way to guarantee all parties have the same schema is to have the schema(s) in question in their own library, imported by all parties. 
   
   schemas are not the only thing shared here, code is as well: its much more convenient for developers to operate on generated POJO classes than it is to operate on generic records, and there are libraries who's top level APIs accept/return IndexedRecords, meaning they expose avro.
   
   given the sheer number of codebases involved, and that some constraints on avro are beyond the ability of an organization to control (dictated by requirements of 3rd party external libraries), its not feasible to align on a single version of avro across the organization, and individual codebases change their version of avro on their own schedules.
   
   and so, even maintaining a library of generated record classes, not to mention writing a "portable" library that accepts/returns IndexedRecords requires navigating a very wide range of runtime avro versions. we do so by combination of adapters (different implementations of "the same thing" for different avro versions) and tweaking the generated specific classes.
   
   its obviously unreasonable to expect the avro API to never change. however, there have been previous breaking changes that could have been done "nicer". some examples are below if youre curious.
   
   things the avro project could change to make maintaining avro-util easier:
   
   - document breaking changes to APIs and/or behaviors. currently we try "guessing" those by looking at commit logs but more often than not our users find them 1st 
   - consider adding options to better support cross-version compatibility. this is especially painful in generated code where something as simple as not adding an @Overrides annotation to new methods immediately saves a surprising amount of compilation issues. having default implementations for new methods in the parent abstract classes would also help a great deal.
   - "phase in" breaking changes with a period of error logs?
   
   examples of past breaking changes that could have been less breaking:
   
   - the json format for union encoding was changed between 1.4 and 1.5 to use a fullname as discriminator instead of simple name. the parser could have been made to accept both forms for cases where the simple names are distinct to make this a less breaking change (we eventually wrote our own version of JsonDecoder that does exactly that)
   - in more recent memory (https://issues.apache.org/jira/browse/AVRO-2035) validation for default values was tightened - which is great. but some things like a default value of "true" (a json string) for a binary property could have been kept around initially (with a very nagging error printed at parse time)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
martin-g commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-929879103


   @xkrogen Please do not involve corporate/enterprise thinking/talking into OSS project unless your enterprise wants to offer its help!
   Most of us do this in our spare time!
   The best you could do is to test Avro master (1.11.0-SNAPSHOT) in your application and give feedback! If you find more regressions then you will have more ground to ask for 1.10.x!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] xkrogen commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-930290879


   Sorry folks, I feel I may have come off in a different way than I intended.
   
   > Please do not involve corporate/enterprise thinking/talking into OSS project unless your enterprise wants to offer its help!
   
   My point was simply that there are large bases of Avro users who could benefit from such a fix. My apologies if bringing an enterprise perspective into the discussion is contrary to the Avro project's normal thought processes. Noting that large-scale users (particularly if there are several) are interested in something is fairly common in the other Apache projects I've worked on, but I recognize that different communities can be very different. I completely understand that work is volunteer here and I am not expecting anything, only trying to provide perspective that may not have been considered.
   
   > This is a fair criticism!
   
   Actually I did not mean this as a criticism of the Avro project, I think it can just as easily be a criticism of enterprise mindsets and upgrade paces :) But regardless, thank you for starting a discussion!
   
   Your mail has also made me see why my backport request might be a bit larger than I thought. It sounds like the Avro project has historically not really done patch releases on non-current minor versions, meaning the scope of the request has expanded beyond "backport this PR" to "backport this PR and cut a patch release specifically for this issue." Completely understood if that's too big of an ask, and I see now why you brought the 1.11.0 release into the picture. I will follow the dev list discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] xkrogen commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-930290879


   Sorry folks, I feel I may have come off in a different way than I intended.
   
   > Please do not involve corporate/enterprise thinking/talking into OSS project unless your enterprise wants to offer its help!
   
   My point was simply that there are large bases of Avro users who could benefit from such a fix. My apologies if bringing an enterprise perspective into the discussion is contrary to the Avro project's normal thought processes. Noting that large-scale users (particularly if there are several) are interested in something is fairly common in the other Apache projects I've worked on, but I recognize that different communities can be very different. I completely understand that work is volunteer here and I am not expecting anything, only trying to provide perspective that may not have been considered.
   
   > This is a fair criticism!
   
   Actually I did not mean this as a criticism of the Avro project, I think it can just as easily be a criticism of enterprise mindsets and upgrade paces :) But regardless, thank you for starting a discussion!
   
   Your mail has also made me see why my backport request might be a bit larger than I thought. It sounds like the Avro project has historically not really done patch releases on non-current minor versions, meaning the scope of the request has expanded beyond "backport this PR" to "backport this PR and cut a patch release specifically for this issue." Completely understood if that's too big of an ask, and I see now why you brought the 1.11.0 release into the picture. I will follow the dev list discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] RyanSkraba commented on pull request #1333: AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-929998696


   > Speaking as an enterprise user of Avro, there is a fair amount of FUD around Avro minor version upgrades. I won't comment on whether or not it's warranted, but it definitely exists.
   
   This is a fair criticism!  I brought it up on the [mailing list](https://lists.apache.org/thread.html/r3d42328e2e76ea238af09f7ea04a8765e5e9acf40f9f88913c4b1785%40%3Cdev.avro.apache.org%3E), and your comments are welcome.  I feel like the community would be open to making changes!  There's also a very quiet #avro channel on the ASF slack if you want to chat openly or privately (but as always, decisions are made on the mailing list).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org