You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Alexey Romanenko <ar...@gmail.com> on 2022/05/12 17:29:01 UTC

[DISCUSS] Next steps for update of Avro dependency in Beam

Hi everyone,

Sorry in advance for a long email. 
TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.

I’d like to come back to this old and quite sensitive topic here which is Apache Avro version update in Beam. Along the time, we already had several discussions on this (for example [1]) but without any concrete resolutions in the end, iirc.

As we all know, Beam still depends on quite old Avro version 1.8.2 and there were some attempts to bump it to more recent ones. One of the main reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings several CVEs [2], but the latest Avro 1.11.0 brings only one [3]

In the same time, this update with introduce some incompatible changes that Avro has between versions and this may affect Beam users and potentially it may affect transitive dependencies while using Beam with other project that use Avro as well:
- Avro completely moved to java.time.* instead of org.joda.time.*. So, we need to adjust date/time conversions from/to Beam schema accordingly since Beam schema still uses joda.time. It will require users to regenerate already generated Java code with avro-compiler (if any) otherwise it won’t compile; 
- Some minor changes in Avro dependencies and user API;
- Something else?

I know that here, on the list, we have people from Avro community that are much more experienced in this than me - so, please correct me if I say something wrong or not 100% correct. 


In Beam, we also performed several attempts to update Avro - for example, [4], [5], [6] and others.

To make such update easier in the future, we also discussed to move Avro dependency out of core Beam [7] and there were an attempt to do that [8] by finally this PR was closed with a resolution that it’s not actually needed and we may just want to test Beam with different Avro versions [9] 

The latest work on this was a PR to support several versions of Avro in Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, iirc.

So, seems that we are a bit stuck on this topic, though, imho, we need to decide how move forward mostly because of CVEs in old Avro versions and future Avro updates in Beam.

The potential options (as I can see them):

1) Bump Avro dependency to the latest one (1.11.0) or the possible more recent one
	- Pros: 
		- latest/recent Avro dependency; 
		- potentially easy to update in the future;
	- Cons: 
		- breaking change for users; 
		- potentially issues with other projects that use Avro (like Apache Spark e.g.).

2) Support different Avro versions in Beam, make Avro dependency provided 
	- Pros: 
		- user decides which versions to use;
		- easy to update in the future;
	- Cons: 
		- breaking change for users; 
		- not fact that it’s possible to implement in reality; 
		- more tests to test Beam with different Avro versions

3) Extract Avro as an extension, like we do for other formats, and update to latest Avro version, but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)

4) Anything else?


Please, share your thoughts on this and correct me if I stated something wrong. The goal of this discussion is finally to move forward with Avro update topic.

—
Alexey 


[1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r <https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r>
[2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2 <https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2>
[3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0 <https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0>
[4] https://github.com/apache/beam/pull/9779 <https://github.com/apache/beam/pull/9779>
[5] https://github.com/apache/beam/pull/17372 <https://github.com/apache/beam/pull/17372>
[6] https://github.com/apache/beam/pull/17246 <https://github.com/apache/beam/pull/17246>
[7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on <https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on>
[8] https://github.com/apache/beam/pull/12748 <https://github.com/apache/beam/pull/12748>
[9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg <https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg>
[10] https://github.com/apache/beam/pull/16271 <https://github.com/apache/beam/pull/16271>







Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Moritz Mack <mm...@talend.com>.
Sorry, please ignore the previous empty reply …

On 17.05.22, 09:31, "Moritz Mack" <mm...@talend.com> wrote:

On 16.05.22, 18:09, "Robert Bradshaw" <ro...@google.com> wrote:

On Mon, May 16, 2022 at 8:53 AM Alexey Romanenko
<ar...@gmail.com> wrote:
>
>> On 13 May 2022, at 18:38, Robert Bradshaw <ro...@google.com> wrote:
>>
>> We should probably remove the experimental annotations from
> SchemaCoder at this point.
>
> Is there anything that stops us from that?

It's a commitment to not change things, but I think we're at that point by now.

> Having that in mind, here’s my 2 cents on the options mentioned:
>
> Bump Avro in core:
>
> This will seamlessly work for users that just need a coder for complex types, but don’t use Avro themselves.
> Anyone using Avro APIs from user code might get into trouble.
>
> Support different Avro versions & making the dependency provided
>
> In terms of user impact, making Avro provided will be very troublesome. Unless Avro remains available on the classpath through different ways, this would break things in a really bad way. Worst of all, it might not even break at compile time but at runtime only.
> I don’t think this option requires Avro being a provided dependency. If the default is kept as is, anyone can (safely) bump Avro to a higher (supported) version on their classpath. In that case it would be good to warn users about CVEs in logs if they remain on an old version of Avro. But I’m not sure how feasible (reliable) it is to detect the Avro minor version given that things might be repackaged into fat jars.
>
> This seems both difficult and potentially dangerous.
>
> What do you mean by “potentially dangerous”, Robert? Any specific point mentioned above or just having another version of Avro jars in the classpath?

The most dangerous is if a change in avro version had slightly
[1] different semantics when it was swapped out. I think the chances of
that are not too high, but it'll be hard to detect. Mostly I'm
concerned about [2] breaking at runtime in the case of a missing or
incompatible avro.

[1] Don’t you think these can be mitigated by testing relevant parts of Beam against a number of supported Avro versions the same way parts of Beam are tested against various different Hadoop versions.

[2] I agree, this is a huge concern and for that reason I believe Avro cannot be turned into a provided dependency. It would be a dangerous trap for users. But if the default Avro dependency remains as is and Beam is tested to work with a well-defined set of Avro versions (swapping out the default using standard dependency resolution rules), I’d consider the risk to be rather manageable.


>> Extract Avro as an extension
> I think a change that breaks loudly at compile time (if the features
> in question are being depended on), with a clear fix (e.g. "depend on
> this instead") is the "cleanest" way to do a breaking change.
>
> On this note, would it be possible to do explicit "without avro" and
> "with avro" targets/variants in the core module, with plain old "core"
> being an alias first to the latter, and then to the former, according
> to some deprecation timeline? (I'm not familiar enough with
> gradle/maven to know if this is possible.) This could unblock users
> needing a new avro in the short run while providing a longer-term
> migration path. (Ideally the (relatively limited) avro parts of core
> would also get moved to their own extension under their own namespace,
> and the core one would remain solely for backwards compatibility
> reasons.)
>
>
> Good point. Are you talking about keeping the default “core" artifacts with old Avro version (as for now) as a main dependency and provide additionally other “core” artifacts with a recent Avro version or even as provided for users that need other Avro versions?

I'm thinking about keeping the existing "core" artifact, and also
shipping an [3] "avro-less" core target (whether that be via shading or
total removal, but ripping it out of the public API). We would then
ship a [4] separate "avro" target (TBD if providing the dependency or not
makes the most sense here) that the existing core would depend on and
would be a target for avro users to migrate to.

[3] It wouldn’t be too hard modelling this using feature variants. Though, I’m worried about the complexity this creates. In the end, every single Beam module depends on core, which would then depend on the new Avro target. Effectively, we would need to build two different versions of each module depending on either core or the avro-less core. If there would be a well-planned, temporary transition period ahead, it should be fine. But I fear with Beam 3 not even being on the horizon, we’d have to carry this forward for far too long.

[4] Also, what are your thoughts about the new Avro target? It could be done just by moving respective code into a new module (maintaining all packages / fully qualified names). Or “clean” by moving classes adequately but breaking fully qualified names / imports of users. Using the former approach obviously avoids users having to figure out the correct new imports. On the other hand, it might become a blocker supporting Java 11 as it conflicts with the new module system.


> To finally make progress on this topic, I’d suggest starting off with supporting multiple different Avro versions in core (2) but to keep the default Avro version as is.

As stated above, I think it's safer to make the avroless version a new
target initially than change core.

> At the same time, extracting Avro as extension in a backwards compatible way seems to be the right path forward long term together with a well communicated deprecation plan.
>
>
>
> Best,
>
> Moritz
>
>
>
>
>
> From: Brian Hulette <bh...@google.com>
> Date: Thursday, 12. May 2022 at 23:21
> To: dev <de...@beam.apache.org>
> Subject: Re: [DISCUSS] Next steps for update of Avro dependency in Beam
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)" Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and
>
> ZjQcmQRYFpfptBannerStart
>
> This Message Is From an External Sender
>
> This message came from outside your organization.
>
> Exercise caution when opening attachments or clicking any links.
>
> ZjQcmQRYFpfptBannerEnd
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)"
>
>
>
> Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and IOs, which I think are all reasonable to separate out into an extension (this would be comparable to the protobuf extension). To confirm I just grepped for files in core that import avro:
>
> ❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java
>
> Brian
>
>
>
> On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <ro...@google.com> wrote:
>
> Keeping avro in our public (core) API and as an internal dependency
> seems to be a recurring pain point, I would be all for pulling it out
> (basically option 3) and subsequently updating our internal version
> (hopefully no backwards compatibility issues here) and letting the
> extension live with a variety of versions (insofar as this is
> feasible).
>
> On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
> <ar...@gmail.com> wrote:
>
>
> Hi everyone,
>
> Sorry in advance for a long email.
> TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
>
> I’d like to come back to this old and quite sensitive topic here which is Apache Avro version update in Beam. Along the time, we already had several discussions on this (for example [1]) but without any concrete resolutions in the end, iirc.
>
> As we all know, Beam still depends on quite old Avro version 1.8.2 and there were some attempts to bump it to more recent ones. One of the main reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
>
> In the same time, this update with introduce some incompatible changes that Avro has between versions and this may affect Beam users and potentially it may affect transitive dependencies while using Beam with other project that use Avro as well:
> - Avro completely moved to java.time.* instead of org.joda.time.*. So, we need to adjust date/time conversions from/to Beam schema accordingly since Beam schema still uses joda.time. It will require users to regenerate already generated Java code with avro-compiler (if any) otherwise it won’t compile;
> - Some minor changes in Avro dependencies and user API;
> - Something else?
>
> I know that here, on the list, we have people from Avro community that are much more experienced in this than me - so, please correct me if I say something wrong or not 100% correct.
>
>
> In Beam, we also performed several attempts to update Avro - for example, [4], [5], [6] and others.
>
> To make such update easier in the future, we also discussed to move Avro dependency out of core Beam [7] and there were an attempt to do that [8] by finally this PR was closed with a resolution that it’s not actually needed and we may just want to test Beam with different Avro versions [9]
>
> The latest work on this was a PR to support several versions of Avro in Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, iirc.
>
> So, seems that we are a bit stuck on this topic, though, imho, we need to decide how move forward mostly because of CVEs in old Avro versions and future Avro updates in Beam.
>
> The potential options (as I can see them):
>
> 1) Bump Avro dependency to the latest one (1.11.0) or the possible more recent one
> - Pros:
> - latest/recent Avro dependency;
> - potentially easy to update in the future;
> - Cons:
> - breaking change for users;
> - potentially issues with other projects that use Avro (like Apache Spark e.g.).
>
> 2) Support different Avro versions in Beam, make Avro dependency provided
> - Pros:
> - user decides which versions to use;
> - easy to update in the future;
> - Cons:
> - breaking change for users;
> - not fact that it’s possible to implement in reality;
> - more tests to test Beam with different Avro versions
>
> 3) Extract Avro as an extension, like we do for other formats, and update to latest Avro version, but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)
>
> 4) Anything else?
>
>
> Please, share your thoughts on this and correct me if I stated something wrong. The goal of this discussion is finally to move forward with Avro update topic.
>
> —
> Alexey
>
>
> [1] https://urldefense.com/v3/__https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FafuMdp7Q$<https://urldefense.com/v3/__https:/lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FafuMdp7Q$>
> [2] https://urldefense.com/v3/__https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYIRwfz1A$<https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYIRwfz1A$>
> [3] https://urldefense.com/v3/__https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6Fa7FyrH-w$<https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6Fa7FyrH-w$>
> [4] https://urldefense.com/v3/__https://github.com/apache/beam/pull/9779__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYPzULSoA$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/9779__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYPzULSoA$>
> [5] https://urldefense.com/v3/__https://github.com/apache/beam/pull/17372__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZ1EJn6zw$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17372__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZ1EJn6zw$>
> [6] https://urldefense.com/v3/__https://github.com/apache/beam/pull/17246__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FbBlkHi_A$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17246__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FbBlkHi_A$>
> [7] https://urldefense.com/v3/__https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FaisZP_Dg$<https://urldefense.com/v3/__https:/lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FaisZP_Dg$>
> [8] https://urldefense.com/v3/__https://github.com/apache/beam/pull/12748__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZvPO_Cfw$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/12748__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZvPO_Cfw$>
> [9] https://urldefense.com/v3/__https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZHw_JT_g$<https://urldefense.com/v3/__https:/lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZHw_JT_g$>
> [10] https://urldefense.com/v3/__https://github.com/apache/beam/pull/16271__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZBoQ0uSQ$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/16271__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZBoQ0uSQ$>
>
>
>
>
>
>
>
> As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice.
>
>

As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice. <https://www.talend.com/privacy/>


As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice. <https://www.talend.com/privacy/>



Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Moritz Mack <mm...@talend.com>.

On 17.05.22, 09:31, "Moritz Mack" <mm...@talend.com> wrote:
On 16.05.22, 18:09, "Robert Bradshaw" <ro...@google.com> wrote: On Mon, May 16, 2022 at 8:53 AM Alexey Romanenko <ar...@gmail.com> wrote: > >> On 13 May 2022, at 18:38, Robert Bradshaw <ro...@google.com>
ZjQcmQRYFpfptBannerStar

On 16.05.22, 18:09, "Robert Bradshaw" <ro...@google.com> wrote:

On Mon, May 16, 2022 at 8:53 AM Alexey Romanenko
<ar...@gmail.com> wrote:
>
>> On 13 May 2022, at 18:38, Robert Bradshaw <ro...@google.com> wrote:
>>
>> We should probably remove the experimental annotations from
> SchemaCoder at this point.
>
> Is there anything that stops us from that?

It's a commitment to not change things, but I think we're at that point by now.

> Having that in mind, here’s my 2 cents on the options mentioned:
>
> Bump Avro in core:
>
> This will seamlessly work for users that just need a coder for complex types, but don’t use Avro themselves.
> Anyone using Avro APIs from user code might get into trouble.
>
> Support different Avro versions & making the dependency provided
>
> In terms of user impact, making Avro provided will be very troublesome. Unless Avro remains available on the classpath through different ways, this would break things in a really bad way. Worst of all, it might not even break at compile time but at runtime only.
> I don’t think this option requires Avro being a provided dependency. If the default is kept as is, anyone can (safely) bump Avro to a higher (supported) version on their classpath. In that case it would be good to warn users about CVEs in logs if they remain on an old version of Avro. But I’m not sure how feasible (reliable) it is to detect the Avro minor version given that things might be repackaged into fat jars.
>
> This seems both difficult and potentially dangerous.
>
> What do you mean by “potentially dangerous”, Robert? Any specific point mentioned above or just having another version of Avro jars in the classpath?

The most dangerous is if a change in avro version had slightly
[1] different semantics when it was swapped out. I think the chances of
that are not too high, but it'll be hard to detect. Mostly I'm
concerned about [2] breaking at runtime in the case of a missing or
incompatible avro.

[1] Don’t you think these can be mitigated by testing relevant parts of Beam against a number of supported Avro versions the same way parts of Beam are tested against various different Hadoop versions.

[2] I agree, this is a huge concern and for that reason I believe Avro cannot be turned into a provided dependency. It would be a dangerous trap for users. But if the default Avro dependency remains as is and Beam is tested to work with a well-defined set of Avro versions (swapping out the default using standard dependency resolution rules), I’d consider the risk to be rather manageable.


>> Extract Avro as an extension
> I think a change that breaks loudly at compile time (if the features
> in question are being depended on), with a clear fix (e.g. "depend on
> this instead") is the "cleanest" way to do a breaking change.
>
> On this note, would it be possible to do explicit "without avro" and
> "with avro" targets/variants in the core module, with plain old "core"
> being an alias first to the latter, and then to the former, according
> to some deprecation timeline? (I'm not familiar enough with
> gradle/maven to know if this is possible.) This could unblock users
> needing a new avro in the short run while providing a longer-term
> migration path. (Ideally the (relatively limited) avro parts of core
> would also get moved to their own extension under their own namespace,
> and the core one would remain solely for backwards compatibility
> reasons.)
>
>
> Good point. Are you talking about keeping the default “core" artifacts with old Avro version (as for now) as a main dependency and provide additionally other “core” artifacts with a recent Avro version or even as provided for users that need other Avro versions?

I'm thinking about keeping the existing "core" artifact, and also
shipping an [3] "avro-less" core target (whether that be via shading or
total removal, but ripping it out of the public API). We would then
ship a separate "avro" target (TBD if providing the dependency or not
makes the most sense here) that the existing core would depend on and
would be a target for avro users to migrate to.

[3] It wouldn’t be too hard modelling this using feature variants. Though, I’m worried about the complexity this creates. In the end, every single Beam module depends on core, which would then depend on the new Avro target. Effectively, we would need to build two different versions of each module depending on either core or the avro-less core. If there would be a well-planned, temporary transition period ahead, it should be fine. But I fear with Beam 3 not even being on the horizon, we’d have to carry this forward for far too long.


> To finally make progress on this topic, I’d suggest starting off with supporting multiple different Avro versions in core (2) but to keep the default Avro version as is.

As stated above, I think it's safer to make the avroless version a new
target initially than change core.

> At the same time, extracting Avro as extension in a backwards compatible way seems to be the right path forward long term together with a well communicated deprecation plan.
>
>
>
> Best,
>
> Moritz
>
>
>
>
>
> From: Brian Hulette <bh...@google.com>
> Date: Thursday, 12. May 2022 at 23:21
> To: dev <de...@beam.apache.org>
> Subject: Re: [DISCUSS] Next steps for update of Avro dependency in Beam
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)" Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and
>
> ZjQcmQRYFpfptBannerStart
>
> This Message Is From an External Sender
>
> This message came from outside your organization.
>
> Exercise caution when opening attachments or clicking any links.
>
> ZjQcmQRYFpfptBannerEnd
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)"
>
>
>
> Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and IOs, which I think are all reasonable to separate out into an extension (this would be comparable to the protobuf extension). To confirm I just grepped for files in core that import avro:
>
> ❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java
>
> Brian
>
>
>
> On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <ro...@google.com> wrote:
>
> Keeping avro in our public (core) API and as an internal dependency
> seems to be a recurring pain point, I would be all for pulling it out
> (basically option 3) and subsequently updating our internal version
> (hopefully no backwards compatibility issues here) and letting the
> extension live with a variety of versions (insofar as this is
> feasible).
>
> On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
> <ar...@gmail.com> wrote:
>
>
> Hi everyone,
>
> Sorry in advance for a long email.
> TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
>
> I’d like to come back to this old and quite sensitive topic here which is Apache Avro version update in Beam. Along the time, we already had several discussions on this (for example [1]) but without any concrete resolutions in the end, iirc.
>
> As we all know, Beam still depends on quite old Avro version 1.8.2 and there were some attempts to bump it to more recent ones. One of the main reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
>
> In the same time, this update with introduce some incompatible changes that Avro has between versions and this may affect Beam users and potentially it may affect transitive dependencies while using Beam with other project that use Avro as well:
> - Avro completely moved to java.time.* instead of org.joda.time.*. So, we need to adjust date/time conversions from/to Beam schema accordingly since Beam schema still uses joda.time. It will require users to regenerate already generated Java code with avro-compiler (if any) otherwise it won’t compile;
> - Some minor changes in Avro dependencies and user API;
> - Something else?
>
> I know that here, on the list, we have people from Avro community that are much more experienced in this than me - so, please correct me if I say something wrong or not 100% correct.
>
>
> In Beam, we also performed several attempts to update Avro - for example, [4], [5], [6] and others.
>
> To make such update easier in the future, we also discussed to move Avro dependency out of core Beam [7] and there were an attempt to do that [8] by finally this PR was closed with a resolution that it’s not actually needed and we may just want to test Beam with different Avro versions [9]
>
> The latest work on this was a PR to support several versions of Avro in Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, iirc.
>
> So, seems that we are a bit stuck on this topic, though, imho, we need to decide how move forward mostly because of CVEs in old Avro versions and future Avro updates in Beam.
>
> The potential options (as I can see them):
>
> 1) Bump Avro dependency to the latest one (1.11.0) or the possible more recent one
> - Pros:
> - latest/recent Avro dependency;
> - potentially easy to update in the future;
> - Cons:
> - breaking change for users;
> - potentially issues with other projects that use Avro (like Apache Spark e.g.).
>
> 2) Support different Avro versions in Beam, make Avro dependency provided
> - Pros:
> - user decides which versions to use;
> - easy to update in the future;
> - Cons:
> - breaking change for users;
> - not fact that it’s possible to implement in reality;
> - more tests to test Beam with different Avro versions
>
> 3) Extract Avro as an extension, like we do for other formats, and update to latest Avro version, but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)
>
> 4) Anything else?
>
>
> Please, share your thoughts on this and correct me if I stated something wrong. The goal of this discussion is finally to move forward with Avro update topic.
>
> —
> Alexey
>
>
> [1] https://urldefense.com/v3/__https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FafuMdp7Q$<https://urldefense.com/v3/__https:/lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FafuMdp7Q$>
> [2] https://urldefense.com/v3/__https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYIRwfz1A$<https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYIRwfz1A$>
> [3] https://urldefense.com/v3/__https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6Fa7FyrH-w$<https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6Fa7FyrH-w$>
> [4] https://urldefense.com/v3/__https://github.com/apache/beam/pull/9779__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYPzULSoA$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/9779__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYPzULSoA$>
> [5] https://urldefense.com/v3/__https://github.com/apache/beam/pull/17372__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZ1EJn6zw$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17372__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZ1EJn6zw$>
> [6] https://urldefense.com/v3/__https://github.com/apache/beam/pull/17246__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FbBlkHi_A$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17246__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FbBlkHi_A$>
> [7] https://urldefense.com/v3/__https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FaisZP_Dg$<https://urldefense.com/v3/__https:/lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FaisZP_Dg$>
> [8] https://urldefense.com/v3/__https://github.com/apache/beam/pull/12748__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZvPO_Cfw$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/12748__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZvPO_Cfw$>
> [9] https://urldefense.com/v3/__https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZHw_JT_g$<https://urldefense.com/v3/__https:/lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZHw_JT_g$>
> [10] https://urldefense.com/v3/__https://github.com/apache/beam/pull/16271__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZBoQ0uSQ$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/16271__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZBoQ0uSQ$>
>
>
>
>
>
>
>
> As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice.
>
>

As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice. <https://www.talend.com/privacy/>


As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice. <https://www.talend.com/privacy/>



Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Moritz Mack <mm...@talend.com>.

On 16.05.22, 18:09, "Robert Bradshaw" <ro...@google.com> wrote:

On Mon, May 16, 2022 at 8:53 AM Alexey Romanenko
<ar...@gmail.com> wrote:
>
>> On 13 May 2022, at 18:38, Robert Bradshaw <ro...@google.com> wrote:
>>
>> We should probably remove the experimental annotations from
> SchemaCoder at this point.
>
> Is there anything that stops us from that?

It's a commitment to not change things, but I think we're at that point by now.

> Having that in mind, here’s my 2 cents on the options mentioned:
>
> Bump Avro in core:
>
> This will seamlessly work for users that just need a coder for complex types, but don’t use Avro themselves.
> Anyone using Avro APIs from user code might get into trouble.
>
> Support different Avro versions & making the dependency provided
>
> In terms of user impact, making Avro provided will be very troublesome. Unless Avro remains available on the classpath through different ways, this would break things in a really bad way. Worst of all, it might not even break at compile time but at runtime only.
> I don’t think this option requires Avro being a provided dependency. If the default is kept as is, anyone can (safely) bump Avro to a higher (supported) version on their classpath. In that case it would be good to warn users about CVEs in logs if they remain on an old version of Avro. But I’m not sure how feasible (reliable) it is to detect the Avro minor version given that things might be repackaged into fat jars.
>
> This seems both difficult and potentially dangerous.
>
> What do you mean by “potentially dangerous”, Robert? Any specific point mentioned above or just having another version of Avro jars in the classpath?

The most dangerous is if a change in avro version had slightly
[1] different semantics when it was swapped out. I think the chances of
that are not too high, but it'll be hard to detect. Mostly I'm
concerned about [2] breaking at runtime in the case of a missing or
incompatible avro.

[1] Don’t you think these can be mitigated by testing relevant parts of Beam against a number of supported Avro versions the same way parts of Beam are tested against various different Hadoop versions.

[2] I agree, this is a huge concern and for that reason I believe Avro cannot be turned into a provided dependency. It would be a dangerous trap for users. But if the default Avro dependency remains as is and Beam is tested to work with a well-defined set of Avro versions (swapping out the default using standard dependency resolution rules), I’d consider the risk to be rather manageable.


>> Extract Avro as an extension
> I think a change that breaks loudly at compile time (if the features
> in question are being depended on), with a clear fix (e.g. "depend on
> this instead") is the "cleanest" way to do a breaking change.
>
> On this note, would it be possible to do explicit "without avro" and
> "with avro" targets/variants in the core module, with plain old "core"
> being an alias first to the latter, and then to the former, according
> to some deprecation timeline? (I'm not familiar enough with
> gradle/maven to know if this is possible.) This could unblock users
> needing a new avro in the short run while providing a longer-term
> migration path. (Ideally the (relatively limited) avro parts of core
> would also get moved to their own extension under their own namespace,
> and the core one would remain solely for backwards compatibility
> reasons.)
>
>
> Good point. Are you talking about keeping the default “core" artifacts with old Avro version (as for now) as a main dependency and provide additionally other “core” artifacts with a recent Avro version or even as provided for users that need other Avro versions?

I'm thinking about keeping the existing "core" artifact, and also
shipping an [3] "avro-less" core target (whether that be via shading or
total removal, but ripping it out of the public API). We would then
ship a separate "avro" target (TBD if providing the dependency or not
makes the most sense here) that the existing core would depend on and
would be a target for avro users to migrate to.

[3] It wouldn’t be too hard modelling this using feature variants. Though, I’m worried about the complexity this creates. In the end, every single Beam module depends on core, which would then depend on the new Avro target. Effectively, we would need to build two different versions of each module depending on either core or the avro-less core. If there would be a well-planned, temporary transition period ahead, it should be fine. But I fear with Beam 3 not even being on the horizon, we’d have to carry this forward for far too long.


> To finally make progress on this topic, I’d suggest starting off with supporting multiple different Avro versions in core (2) but to keep the default Avro version as is.

As stated above, I think it's safer to make the avroless version a new
target initially than change core.

> At the same time, extracting Avro as extension in a backwards compatible way seems to be the right path forward long term together with a well communicated deprecation plan.
>
>
>
> Best,
>
> Moritz
>
>
>
>
>
> From: Brian Hulette <bh...@google.com>
> Date: Thursday, 12. May 2022 at 23:21
> To: dev <de...@beam.apache.org>
> Subject: Re: [DISCUSS] Next steps for update of Avro dependency in Beam
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)" Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and
>
> ZjQcmQRYFpfptBannerStart
>
> This Message Is From an External Sender
>
> This message came from outside your organization.
>
> Exercise caution when opening attachments or clicking any links.
>
> ZjQcmQRYFpfptBannerEnd
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)"
>
>
>
> Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and IOs, which I think are all reasonable to separate out into an extension (this would be comparable to the protobuf extension). To confirm I just grepped for files in core that import avro:
>
> ❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java
>
> Brian
>
>
>
> On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <ro...@google.com> wrote:
>
> Keeping avro in our public (core) API and as an internal dependency
> seems to be a recurring pain point, I would be all for pulling it out
> (basically option 3) and subsequently updating our internal version
> (hopefully no backwards compatibility issues here) and letting the
> extension live with a variety of versions (insofar as this is
> feasible).
>
> On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
> <ar...@gmail.com> wrote:
>
>
> Hi everyone,
>
> Sorry in advance for a long email.
> TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
>
> I’d like to come back to this old and quite sensitive topic here which is Apache Avro version update in Beam. Along the time, we already had several discussions on this (for example [1]) but without any concrete resolutions in the end, iirc.
>
> As we all know, Beam still depends on quite old Avro version 1.8.2 and there were some attempts to bump it to more recent ones. One of the main reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
>
> In the same time, this update with introduce some incompatible changes that Avro has between versions and this may affect Beam users and potentially it may affect transitive dependencies while using Beam with other project that use Avro as well:
> - Avro completely moved to java.time.* instead of org.joda.time.*. So, we need to adjust date/time conversions from/to Beam schema accordingly since Beam schema still uses joda.time. It will require users to regenerate already generated Java code with avro-compiler (if any) otherwise it won’t compile;
> - Some minor changes in Avro dependencies and user API;
> - Something else?
>
> I know that here, on the list, we have people from Avro community that are much more experienced in this than me - so, please correct me if I say something wrong or not 100% correct.
>
>
> In Beam, we also performed several attempts to update Avro - for example, [4], [5], [6] and others.
>
> To make such update easier in the future, we also discussed to move Avro dependency out of core Beam [7] and there were an attempt to do that [8] by finally this PR was closed with a resolution that it’s not actually needed and we may just want to test Beam with different Avro versions [9]
>
> The latest work on this was a PR to support several versions of Avro in Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, iirc.
>
> So, seems that we are a bit stuck on this topic, though, imho, we need to decide how move forward mostly because of CVEs in old Avro versions and future Avro updates in Beam.
>
> The potential options (as I can see them):
>
> 1) Bump Avro dependency to the latest one (1.11.0) or the possible more recent one
> - Pros:
> - latest/recent Avro dependency;
> - potentially easy to update in the future;
> - Cons:
> - breaking change for users;
> - potentially issues with other projects that use Avro (like Apache Spark e.g.).
>
> 2) Support different Avro versions in Beam, make Avro dependency provided
> - Pros:
> - user decides which versions to use;
> - easy to update in the future;
> - Cons:
> - breaking change for users;
> - not fact that it’s possible to implement in reality;
> - more tests to test Beam with different Avro versions
>
> 3) Extract Avro as an extension, like we do for other formats, and update to latest Avro version, but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)
>
> 4) Anything else?
>
>
> Please, share your thoughts on this and correct me if I stated something wrong. The goal of this discussion is finally to move forward with Avro update topic.
>
> —
> Alexey
>
>
> [1] https://urldefense.com/v3/__https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FafuMdp7Q$<https://urldefense.com/v3/__https:/lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FafuMdp7Q$>
> [2] https://urldefense.com/v3/__https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYIRwfz1A$<https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYIRwfz1A$>
> [3] https://urldefense.com/v3/__https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6Fa7FyrH-w$<https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6Fa7FyrH-w$>
> [4] https://urldefense.com/v3/__https://github.com/apache/beam/pull/9779__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYPzULSoA$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/9779__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FYPzULSoA$>
> [5] https://urldefense.com/v3/__https://github.com/apache/beam/pull/17372__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZ1EJn6zw$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17372__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZ1EJn6zw$>
> [6] https://urldefense.com/v3/__https://github.com/apache/beam/pull/17246__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FbBlkHi_A$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17246__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FbBlkHi_A$>
> [7] https://urldefense.com/v3/__https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FaisZP_Dg$<https://urldefense.com/v3/__https:/lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FaisZP_Dg$>
> [8] https://urldefense.com/v3/__https://github.com/apache/beam/pull/12748__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZvPO_Cfw$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/12748__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZvPO_Cfw$>
> [9] https://urldefense.com/v3/__https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZHw_JT_g$<https://urldefense.com/v3/__https:/lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZHw_JT_g$>
> [10] https://urldefense.com/v3/__https://github.com/apache/beam/pull/16271__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZBoQ0uSQ$<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/16271__;!!CiXD_PY!Uf7prSniW96ajGEcqZrwabMQ7X85axEPG2_VFpwk98-WAGoZ6NlRTLVx_1jeYk7PKEay6FZBoQ0uSQ$>
>
>
>
>
>
>
>
> As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice.
>
>

As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice. <https://www.talend.com/privacy/>



Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Robert Bradshaw <ro...@google.com>.
On Mon, May 16, 2022 at 8:53 AM Alexey Romanenko
<ar...@gmail.com> wrote:
>
>> On 13 May 2022, at 18:38, Robert Bradshaw <ro...@google.com> wrote:
>>
>> We should probably remove the experimental annotations from
> SchemaCoder at this point.
>
> Is there anything that stops us from that?

It's a commitment to not change things, but I think we're at that point by now.

> Having that in mind, here’s my 2 cents on the options mentioned:
>
> Bump Avro in core:
>
> This will seamlessly work for users that just need a coder for complex types, but don’t use Avro themselves.
> Anyone using Avro APIs from user code might get into trouble.
>
> Support different Avro versions & making the dependency provided
>
> In terms of user impact, making Avro provided will be very troublesome. Unless Avro remains available on the classpath through different ways, this would break things in a really bad way. Worst of all, it might not even break at compile time but at runtime only.
> I don’t think this option requires Avro being a provided dependency. If the default is kept as is, anyone can (safely) bump Avro to a higher (supported) version on their classpath. In that case it would be good to warn users about CVEs in logs if they remain on an old version of Avro. But I’m not sure how feasible (reliable) it is to detect the Avro minor version given that things might be repackaged into fat jars.
>
> This seems both difficult and potentially dangerous.
>
> What do you mean by “potentially dangerous”, Robert? Any specific point mentioned above or just having another version of Avro jars in the classpath?

The most dangerous is if a change in avro version had slightly
different semantics when it was swapped out. I think the chances of
that are not too high, but it'll be hard to detect. Mostly I'm
concerned about breaking at runtime in the case of a missing or
incompatible avro.

>> Extract Avro as an extension
> I think a change that breaks loudly at compile time (if the features
> in question are being depended on), with a clear fix (e.g. "depend on
> this instead") is the "cleanest" way to do a breaking change.
>
> On this note, would it be possible to do explicit "without avro" and
> "with avro" targets/variants in the core module, with plain old "core"
> being an alias first to the latter, and then to the former, according
> to some deprecation timeline? (I'm not familiar enough with
> gradle/maven to know if this is possible.) This could unblock users
> needing a new avro in the short run while providing a longer-term
> migration path. (Ideally the (relatively limited) avro parts of core
> would also get moved to their own extension under their own namespace,
> and the core one would remain solely for backwards compatibility
> reasons.)
>
>
> Good point. Are you talking about keeping the default “core" artifacts with old Avro version (as for now) as a main dependency and provide additionally other “core” artifacts with a recent Avro version or even as provided for users that need other Avro versions?

I'm thinking about keeping the existing "core" artifact, and also
shipping an "avro-less" core target (whether that be via shading or
total removal, but ripping it out of the public API). We would then
ship a separate "avro" target (TBD if providing the dependency or not
makes the most sense here) that the existing core would depend on and
would be a target for avro users to migrate to.

> To finally make progress on this topic, I’d suggest starting off with supporting multiple different Avro versions in core (2) but to keep the default Avro version as is.

As stated above, I think it's safer to make the avroless version a new
target initially than change core.

> At the same time, extracting Avro as extension in a backwards compatible way seems to be the right path forward long term together with a well communicated deprecation plan.
>
>
>
> Best,
>
> Moritz
>
>
>
>
>
> From: Brian Hulette <bh...@google.com>
> Date: Thursday, 12. May 2022 at 23:21
> To: dev <de...@beam.apache.org>
> Subject: Re: [DISCUSS] Next steps for update of Avro dependency in Beam
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)" Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and
>
> ZjQcmQRYFpfptBannerStart
>
> This Message Is From an External Sender
>
> This message came from outside your organization.
>
> Exercise caution when opening attachments or clicking any links.
>
> ZjQcmQRYFpfptBannerEnd
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)"
>
>
>
> Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and IOs, which I think are all reasonable to separate out into an extension (this would be comparable to the protobuf extension). To confirm I just grepped for files in core that import avro:
>
> ❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java
>
> Brian
>
>
>
> On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <ro...@google.com> wrote:
>
> Keeping avro in our public (core) API and as an internal dependency
> seems to be a recurring pain point, I would be all for pulling it out
> (basically option 3) and subsequently updating our internal version
> (hopefully no backwards compatibility issues here) and letting the
> extension live with a variety of versions (insofar as this is
> feasible).
>
> On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
> <ar...@gmail.com> wrote:
>
>
> Hi everyone,
>
> Sorry in advance for a long email.
> TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
>
> I’d like to come back to this old and quite sensitive topic here which is Apache Avro version update in Beam. Along the time, we already had several discussions on this (for example [1]) but without any concrete resolutions in the end, iirc.
>
> As we all know, Beam still depends on quite old Avro version 1.8.2 and there were some attempts to bump it to more recent ones. One of the main reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
>
> In the same time, this update with introduce some incompatible changes that Avro has between versions and this may affect Beam users and potentially it may affect transitive dependencies while using Beam with other project that use Avro as well:
> - Avro completely moved to java.time.* instead of org.joda.time.*. So, we need to adjust date/time conversions from/to Beam schema accordingly since Beam schema still uses joda.time. It will require users to regenerate already generated Java code with avro-compiler (if any) otherwise it won’t compile;
> - Some minor changes in Avro dependencies and user API;
> - Something else?
>
> I know that here, on the list, we have people from Avro community that are much more experienced in this than me - so, please correct me if I say something wrong or not 100% correct.
>
>
> In Beam, we also performed several attempts to update Avro - for example, [4], [5], [6] and others.
>
> To make such update easier in the future, we also discussed to move Avro dependency out of core Beam [7] and there were an attempt to do that [8] by finally this PR was closed with a resolution that it’s not actually needed and we may just want to test Beam with different Avro versions [9]
>
> The latest work on this was a PR to support several versions of Avro in Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, iirc.
>
> So, seems that we are a bit stuck on this topic, though, imho, we need to decide how move forward mostly because of CVEs in old Avro versions and future Avro updates in Beam.
>
> The potential options (as I can see them):
>
> 1) Bump Avro dependency to the latest one (1.11.0) or the possible more recent one
> - Pros:
> - latest/recent Avro dependency;
> - potentially easy to update in the future;
> - Cons:
> - breaking change for users;
> - potentially issues with other projects that use Avro (like Apache Spark e.g.).
>
> 2) Support different Avro versions in Beam, make Avro dependency provided
> - Pros:
> - user decides which versions to use;
> - easy to update in the future;
> - Cons:
> - breaking change for users;
> - not fact that it’s possible to implement in reality;
> - more tests to test Beam with different Avro versions
>
> 3) Extract Avro as an extension, like we do for other formats, and update to latest Avro version, but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)
>
> 4) Anything else?
>
>
> Please, share your thoughts on this and correct me if I stated something wrong. The goal of this discussion is finally to move forward with Avro update topic.
>
> —
> Alexey
>
>
> [1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r
> [2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2
> [3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0
> [4] https://github.com/apache/beam/pull/9779
> [5] https://github.com/apache/beam/pull/17372
> [6] https://github.com/apache/beam/pull/17246
> [7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on
> [8] https://github.com/apache/beam/pull/12748
> [9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg
> [10] https://github.com/apache/beam/pull/16271
>
>
>
>
>
>
>
> As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice.
>
>

Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Alexey Romanenko <ar...@gmail.com>.

> On 13 May 2022, at 18:38, Robert Bradshaw <ro...@google.com> wrote:
> 
> On Fri, May 13, 2022 at 5:56 AM Moritz Mack <mmack@talend.com <ma...@talend.com>> wrote:
>> 
>> Thanks so much for all these pointers, Alexey. Having that context really helps!
>> 
>> Skimming through the past conversations, this one key consideration hasn’t changed and seems still critical:
>> 
>> AvroCoder is the de facto standard for encoding complex user types (with SchemaCoder still being experimental).
> 
> We should probably remove the experimental annotations from
> SchemaCoder at this point.

Is there anything that stops us from that?

> 
>> Consequently, any backwards incompatible change in that respect would massively impact users.
> 
> True. Some changes would be more painful than others though.

I think that this is a main point of the difficulties with some progress on this topic - make this as smooth as possible for users. In the same time, keep the dependencies with several CVEs for a long time is not good thing as well. 

> 
>> Having that in mind, here’s my 2 cents on the options mentioned:
>> 
>> Bump Avro in core:
>> 
>> This will seamlessly work for users that just need a coder for complex types, but don’t use Avro themselves.
>> Anyone using Avro APIs from user code might get into trouble.
>> 
>> Support different Avro versions & making the dependency provided
>> 
>> In terms of user impact, making Avro provided will be very troublesome. Unless Avro remains available on the classpath through different ways, this would break things in a really bad way. Worst of all, it might not even break at compile time but at runtime only.
>> I don’t think this option requires Avro being a provided dependency. If the default is kept as is, anyone can (safely) bump Avro to a higher (supported) version on their classpath. In that case it would be good to warn users about CVEs in logs if they remain on an old version of Avro. But I’m not sure how feasible (reliable) it is to detect the Avro minor version given that things might be repackaged into fat jars.
> 
> This seems both difficult and potentially dangerous.

What do you mean by “potentially dangerous”, Robert? Any specific point mentioned above or just having another version of Avro jars in the classpath?

> 
>> Extract Avro as an extension
>> 
>> I agree with Brian, I don’t think there’s a need to keep a shaded/vendored Avro in core… but
> 
> If we can leave it out, that'd be great.
> 
>> It’s pretty hard to get this done without causing trouble for users. Making this a seamless migration would require core to depend on the new extension module and to maintain all the old public user facing APIs.
> 
> I think a change that breaks loudly at compile time (if the features
> in question are being depended on), with a clear fix (e.g. "depend on
> this instead") is the "cleanest" way to do a breaking change.
> 
> On this note, would it be possible to do explicit "without avro" and
> "with avro" targets/variants in the core module, with plain old "core"
> being an alias first to the latter, and then to the former, according
> to some deprecation timeline? (I'm not familiar enough with
> gradle/maven to know if this is possible.) This could unblock users
> needing a new avro in the short run while providing a longer-term
> migration path. (Ideally the (relatively limited) avro parts of core
> would also get moved to their own extension under their own namespace,
> and the core one would remain solely for backwards compatibility
> reasons.)

Good point. Are you talking about keeping the default “core" artifacts with old Avro version (as for now) as a main dependency and provide additionally other “core” artifacts with a recent Avro version or even as provided for users that need other Avro versions? 

> 
>> Duplicating packages between core and the new extension module is going to cause issues with Java 11. That pretty much means existing public user facing APIs have to remain as a (deprecated) façade in core using the code in the new extension.
>> The above makes the effort almost look worthless in the short term, but it would certainly help to have a well-defined scope that can be tested for compatibility with multiple different Avro versions.
>> 
>> 
>> 
>> To finally make progress on this topic, I’d suggest starting off with supporting multiple different Avro versions in core (2) but to keep the default Avro version as is.
>> 
>> At the same time, extracting Avro as extension in a backwards compatible way seems to be the right path forward long term together with a well communicated deprecation plan.
>> 
>> 
>> 
>> Best,
>> 
>> Moritz
>> 
>> 
>> 
>> 
>> 
>> From: Brian Hulette <bh...@google.com>
>> Date: Thursday, 12. May 2022 at 23:21
>> To: dev <de...@beam.apache.org>
>> Subject: Re: [DISCUSS] Next steps for update of Avro dependency in Beam
>> 
>> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)" Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and
>> 
>> ZjQcmQRYFpfptBannerStart
>> 
>> This Message Is From an External Sender
>> 
>> This message came from outside your organization.
>> 
>> Exercise caution when opening attachments or clicking any links.
>> 
>> ZjQcmQRYFpfptBannerEnd
>> 
>> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)"
>> 
>> 
>> 
>> Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and IOs, which I think are all reasonable to separate out into an extension (this would be comparable to the protobuf extension). To confirm I just grepped for files in core that import avro:
>> 
>> ❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
>> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
>> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java
>> 
>> Brian
>> 
>> 
>> 
>> On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <ro...@google.com> wrote:
>> 
>> Keeping avro in our public (core) API and as an internal dependency
>> seems to be a recurring pain point, I would be all for pulling it out
>> (basically option 3) and subsequently updating our internal version
>> (hopefully no backwards compatibility issues here) and letting the
>> extension live with a variety of versions (insofar as this is
>> feasible).
>> 
>> On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
>> <ar...@gmail.com> wrote:
>>> 
>>> Hi everyone,
>>> 
>>> Sorry in advance for a long email.
>>> TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
>>> 
>>> I’d like to come back to this old and quite sensitive topic here which is Apache Avro version update in Beam. Along the time, we already had several discussions on this (for example [1]) but without any concrete resolutions in the end, iirc.
>>> 
>>> As we all know, Beam still depends on quite old Avro version 1.8.2 and there were some attempts to bump it to more recent ones. One of the main reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
>>> 
>>> In the same time, this update with introduce some incompatible changes that Avro has between versions and this may affect Beam users and potentially it may affect transitive dependencies while using Beam with other project that use Avro as well:
>>> - Avro completely moved to java.time.* instead of org.joda.time.*. So, we need to adjust date/time conversions from/to Beam schema accordingly since Beam schema still uses joda.time. It will require users to regenerate already generated Java code with avro-compiler (if any) otherwise it won’t compile;
>>> - Some minor changes in Avro dependencies and user API;
>>> - Something else?
>>> 
>>> I know that here, on the list, we have people from Avro community that are much more experienced in this than me - so, please correct me if I say something wrong or not 100% correct.
>>> 
>>> 
>>> In Beam, we also performed several attempts to update Avro - for example, [4], [5], [6] and others.
>>> 
>>> To make such update easier in the future, we also discussed to move Avro dependency out of core Beam [7] and there were an attempt to do that [8] by finally this PR was closed with a resolution that it’s not actually needed and we may just want to test Beam with different Avro versions [9]
>>> 
>>> The latest work on this was a PR to support several versions of Avro in Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, iirc.
>>> 
>>> So, seems that we are a bit stuck on this topic, though, imho, we need to decide how move forward mostly because of CVEs in old Avro versions and future Avro updates in Beam.
>>> 
>>> The potential options (as I can see them):
>>> 
>>> 1) Bump Avro dependency to the latest one (1.11.0) or the possible more recent one
>>> - Pros:
>>> - latest/recent Avro dependency;
>>> - potentially easy to update in the future;
>>> - Cons:
>>> - breaking change for users;
>>> - potentially issues with other projects that use Avro (like Apache Spark e.g.).
>>> 
>>> 2) Support different Avro versions in Beam, make Avro dependency provided
>>> - Pros:
>>> - user decides which versions to use;
>>> - easy to update in the future;
>>> - Cons:
>>> - breaking change for users;
>>> - not fact that it’s possible to implement in reality;
>>> - more tests to test Beam with different Avro versions
>>> 
>>> 3) Extract Avro as an extension, like we do for other formats, and update to latest Avro version, but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)
>>> 
>>> 4) Anything else?
>>> 
>>> 
>>> Please, share your thoughts on this and correct me if I stated something wrong. The goal of this discussion is finally to move forward with Avro update topic.
>>> 
>>> —
>>> Alexey
>>> 
>>> 
>>> [1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r
>>> [2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2
>>> [3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0
>>> [4] https://github.com/apache/beam/pull/9779
>>> [5] https://github.com/apache/beam/pull/17372
>>> [6] https://github.com/apache/beam/pull/17246
>>> [7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on
>>> [8] https://github.com/apache/beam/pull/12748
>>> [9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg
>>> [10] https://github.com/apache/beam/pull/16271
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice.


Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Robert Bradshaw <ro...@google.com>.
On Fri, May 13, 2022 at 5:56 AM Moritz Mack <mm...@talend.com> wrote:
>
> Thanks so much for all these pointers, Alexey. Having that context really helps!
>
> Skimming through the past conversations, this one key consideration hasn’t changed and seems still critical:
>
> AvroCoder is the de facto standard for encoding complex user types (with SchemaCoder still being experimental).

We should probably remove the experimental annotations from
SchemaCoder at this point.

> Consequently, any backwards incompatible change in that respect would massively impact users.

True. Some changes would be more painful than others though.

> Having that in mind, here’s my 2 cents on the options mentioned:
>
> Bump Avro in core:
>
> This will seamlessly work for users that just need a coder for complex types, but don’t use Avro themselves.
> Anyone using Avro APIs from user code might get into trouble.
>
> Support different Avro versions & making the dependency provided
>
> In terms of user impact, making Avro provided will be very troublesome. Unless Avro remains available on the classpath through different ways, this would break things in a really bad way. Worst of all, it might not even break at compile time but at runtime only.
> I don’t think this option requires Avro being a provided dependency. If the default is kept as is, anyone can (safely) bump Avro to a higher (supported) version on their classpath. In that case it would be good to warn users about CVEs in logs if they remain on an old version of Avro. But I’m not sure how feasible (reliable) it is to detect the Avro minor version given that things might be repackaged into fat jars.

This seems both difficult and potentially dangerous.

> Extract Avro as an extension
>
> I agree with Brian, I don’t think there’s a need to keep a shaded/vendored Avro in core… but

If we can leave it out, that'd be great.

> It’s pretty hard to get this done without causing trouble for users. Making this a seamless migration would require core to depend on the new extension module and to maintain all the old public user facing APIs.

I think a change that breaks loudly at compile time (if the features
in question are being depended on), with a clear fix (e.g. "depend on
this instead") is the "cleanest" way to do a breaking change.

On this note, would it be possible to do explicit "without avro" and
"with avro" targets/variants in the core module, with plain old "core"
being an alias first to the latter, and then to the former, according
to some deprecation timeline? (I'm not familiar enough with
gradle/maven to know if this is possible.) This could unblock users
needing a new avro in the short run while providing a longer-term
migration path. (Ideally the (relatively limited) avro parts of core
would also get moved to their own extension under their own namespace,
and the core one would remain solely for backwards compatibility
reasons.)

> Duplicating packages between core and the new extension module is going to cause issues with Java 11. That pretty much means existing public user facing APIs have to remain as a (deprecated) façade in core using the code in the new extension.
> The above makes the effort almost look worthless in the short term, but it would certainly help to have a well-defined scope that can be tested for compatibility with multiple different Avro versions.
>
>
>
> To finally make progress on this topic, I’d suggest starting off with supporting multiple different Avro versions in core (2) but to keep the default Avro version as is.
>
> At the same time, extracting Avro as extension in a backwards compatible way seems to be the right path forward long term together with a well communicated deprecation plan.
>
>
>
> Best,
>
> Moritz
>
>
>
>
>
> From: Brian Hulette <bh...@google.com>
> Date: Thursday, 12. May 2022 at 23:21
> To: dev <de...@beam.apache.org>
> Subject: Re: [DISCUSS] Next steps for update of Avro dependency in Beam
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)" Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and
>
> ZjQcmQRYFpfptBannerStart
>
> This Message Is From an External Sender
>
> This message came from outside your organization.
>
> Exercise caution when opening attachments or clicking any links.
>
> ZjQcmQRYFpfptBannerEnd
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)"
>
>
>
> Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and IOs, which I think are all reasonable to separate out into an extension (this would be comparable to the protobuf extension). To confirm I just grepped for files in core that import avro:
>
> ❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java
>
> Brian
>
>
>
> On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <ro...@google.com> wrote:
>
> Keeping avro in our public (core) API and as an internal dependency
> seems to be a recurring pain point, I would be all for pulling it out
> (basically option 3) and subsequently updating our internal version
> (hopefully no backwards compatibility issues here) and letting the
> extension live with a variety of versions (insofar as this is
> feasible).
>
> On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
> <ar...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > Sorry in advance for a long email.
> > TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
> >
> > I’d like to come back to this old and quite sensitive topic here which is Apache Avro version update in Beam. Along the time, we already had several discussions on this (for example [1]) but without any concrete resolutions in the end, iirc.
> >
> > As we all know, Beam still depends on quite old Avro version 1.8.2 and there were some attempts to bump it to more recent ones. One of the main reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
> >
> > In the same time, this update with introduce some incompatible changes that Avro has between versions and this may affect Beam users and potentially it may affect transitive dependencies while using Beam with other project that use Avro as well:
> > - Avro completely moved to java.time.* instead of org.joda.time.*. So, we need to adjust date/time conversions from/to Beam schema accordingly since Beam schema still uses joda.time. It will require users to regenerate already generated Java code with avro-compiler (if any) otherwise it won’t compile;
> > - Some minor changes in Avro dependencies and user API;
> > - Something else?
> >
> > I know that here, on the list, we have people from Avro community that are much more experienced in this than me - so, please correct me if I say something wrong or not 100% correct.
> >
> >
> > In Beam, we also performed several attempts to update Avro - for example, [4], [5], [6] and others.
> >
> > To make such update easier in the future, we also discussed to move Avro dependency out of core Beam [7] and there were an attempt to do that [8] by finally this PR was closed with a resolution that it’s not actually needed and we may just want to test Beam with different Avro versions [9]
> >
> > The latest work on this was a PR to support several versions of Avro in Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, iirc.
> >
> > So, seems that we are a bit stuck on this topic, though, imho, we need to decide how move forward mostly because of CVEs in old Avro versions and future Avro updates in Beam.
> >
> > The potential options (as I can see them):
> >
> > 1) Bump Avro dependency to the latest one (1.11.0) or the possible more recent one
> > - Pros:
> > - latest/recent Avro dependency;
> > - potentially easy to update in the future;
> > - Cons:
> > - breaking change for users;
> > - potentially issues with other projects that use Avro (like Apache Spark e.g.).
> >
> > 2) Support different Avro versions in Beam, make Avro dependency provided
> > - Pros:
> > - user decides which versions to use;
> > - easy to update in the future;
> > - Cons:
> > - breaking change for users;
> > - not fact that it’s possible to implement in reality;
> > - more tests to test Beam with different Avro versions
> >
> > 3) Extract Avro as an extension, like we do for other formats, and update to latest Avro version, but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)
> >
> > 4) Anything else?
> >
> >
> > Please, share your thoughts on this and correct me if I stated something wrong. The goal of this discussion is finally to move forward with Avro update topic.
> >
> > —
> > Alexey
> >
> >
> > [1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r
> > [2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2
> > [3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0
> > [4] https://github.com/apache/beam/pull/9779
> > [5] https://github.com/apache/beam/pull/17372
> > [6] https://github.com/apache/beam/pull/17246
> > [7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on
> > [8] https://github.com/apache/beam/pull/12748
> > [9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg
> > [10] https://github.com/apache/beam/pull/16271
> >
> >
> >
> >
> >
> >
>
> As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice.
>
>
>

Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Cristian Constantinescu <ze...@gmail.com>.
Hey everyone,

I appreciate that the Beam community is having another look at this.

I don't have any strong opinions on these options, but think
that extracting Avro in its own extension is the better choice.

I've already given a stab at extracting Avro in its own module (PR: 12748
[1]). The way I see it, this is less of a technical challenge and more of a
Beam community consensus challenge because of backwards compatibility and
historical reasons. If the beam community does reach consensus, I'd be more
than happy to do the work.

Cheers,
Cristian

[1] https://github.com/apache/beam/pull/12748

On Fri, May 13, 2022 at 8:56 AM Moritz Mack <mm...@talend.com> wrote:

> Thanks so much for all these pointers, Alexey. Having that context really
> helps!
>
>
>
> Skimming through the past conversations, this one key consideration hasn’t
> changed and seems still critical:
>
> AvroCoder is the de facto standard for encoding complex user types (with
> SchemaCoder still being experimental).
>
> Consequently, any backwards incompatible change in that respect would
> massively impact users.
>
>
>
> Having that in mind, here’s my 2 cents on the options mentioned:
>
>
>
>    1. Bump Avro in core:
>       1. This will seamlessly work for users that just need a coder for
>       complex types, but don’t use Avro themselves.
>       2. Anyone using Avro APIs from user code might get into trouble.
>    2. Support different Avro versions & making the dependency provided
>       1. In terms of user impact, making Avro provided will be very
>       troublesome. Unless Avro remains available on the classpath through
>       different ways, this would break things in a really bad way. Worst of all,
>       it might not even break at compile time but at runtime only.
>       2. I don’t think this option requires Avro being a provided
>       dependency. If the default is kept as is, anyone can (safely) bump Avro to
>       a higher (supported) version on their classpath. In that case it would be
>       good to warn users about CVEs in logs if they remain on an old version of
>       Avro. But I’m not sure how feasible (reliable) it is to detect the Avro
>       minor version given that things might be repackaged into fat jars.
>    3. Extract Avro as an extension
>       1. I agree with Brian, I don’t think there’s a need to keep a
>       shaded/vendored Avro in core… but
>       2. It’s pretty hard to get this done without causing trouble for
>       users. Making this a seamless migration would require core to depend on the
>       new extension module and to maintain all the old public user facing APIs.
>       3. Duplicating packages between core and the new extension module
>       is going to cause issues with Java 11. That pretty much means existing
>       public user facing APIs have to remain as a (deprecated) façade in core
>       using the code in the new extension.
>       4. The above makes the effort almost look worthless in the short
>       term, but it would certainly help to have a well-defined scope that can be
>       tested for compatibility with multiple different Avro versions.
>
>
>
> To finally make progress on this topic, I’d suggest starting off with
> supporting multiple different Avro versions in core (2) but to keep the
> default Avro version as is.
>
> At the same time, extracting Avro as extension in a* backwards compatible
> *way seems to be the right path forward long term together with a well
> communicated deprecation plan.
>
>
>
> Best,
>
> Moritz
>
>
>
>
>
> *From: *Brian Hulette <bh...@google.com>
> *Date: *Thursday, 12. May 2022 at 23:21
> *To: *dev <de...@beam.apache.org>
> *Subject: *Re: [DISCUSS] Next steps for update of Avro dependency in Beam
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2
> (still have an issue with CVEs)" Do we actually need to keep avro in core
> for any reason? I thought we only had it in core for AvroCoder, schema
> support, and
>
> ZjQcmQRYFpfptBannerStart
>
> *This Message Is From an External Sender *
>
> This message came from outside your organization.
>
> Exercise caution when opening attachments or clicking any links.
>
> ZjQcmQRYFpfptBannerEnd
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2
> (still have an issue with CVEs)"
>
>
>
> Do we actually need to keep avro in core for any reason? I thought we only
> had it in core for AvroCoder, schema support, and IOs, which I think are
> all reasonable to separate out into an extension (this would be comparable
> to the protobuf extension). To confirm I just grepped for files in core
> that import avro:
>
> ❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java
>
> Brian
>
>
>
> On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <ro...@google.com>
> wrote:
>
> Keeping avro in our public (core) API and as an internal dependency
> seems to be a recurring pain point, I would be all for pulling it out
> (basically option 3) and subsequently updating our internal version
> (hopefully no backwards compatibility issues here) and letting the
> extension live with a variety of versions (insofar as this is
> feasible).
>
> On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
> <ar...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > Sorry in advance for a long email.
> > TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
> >
> > I’d like to come back to this old and quite sensitive topic here which
> is Apache Avro version update in Beam. Along the time, we already had
> several discussions on this (for example [1]) but without any concrete
> resolutions in the end, iirc.
> >
> > As we all know, Beam still depends on quite old Avro version 1.8.2 and
> there were some attempts to bump it to more recent ones. One of the main
> reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings
> several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
> >
> > In the same time, this update with introduce some incompatible changes
> that Avro has between versions and this may affect Beam users and
> potentially it may affect transitive dependencies while using Beam with
> other project that use Avro as well:
> > - Avro completely moved to java.time.* instead of org.joda.time.*. So,
> we need to adjust date/time conversions from/to Beam schema accordingly
> since Beam schema still uses joda.time. It will require users to regenerate
> already generated Java code with avro-compiler (if any) otherwise it won’t
> compile;
> > - Some minor changes in Avro dependencies and user API;
> > - Something else?
> >
> > I know that here, on the list, we have people from Avro community that
> are much more experienced in this than me - so, please correct me if I say
> something wrong or not 100% correct.
> >
> >
> > In Beam, we also performed several attempts to update Avro - for
> example, [4], [5], [6] and others.
> >
> > To make such update easier in the future, we also discussed to move Avro
> dependency out of core Beam [7] and there were an attempt to do that [8] by
> finally this PR was closed with a resolution that it’s not actually needed
> and we may just want to test Beam with different Avro versions [9]
> >
> > The latest work on this was a PR to support several versions of Avro in
> Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes
> for users, iirc.
> >
> > So, seems that we are a bit stuck on this topic, though, imho, we need
> to decide how move forward mostly because of CVEs in old Avro versions and
> future Avro updates in Beam.
> >
> > The potential options (as I can see them):
> >
> > 1) Bump Avro dependency to the latest one (1.11.0) or the possible more
> recent one
> > - Pros:
> > - latest/recent Avro dependency;
> > - potentially easy to update in the future;
> > - Cons:
> > - breaking change for users;
> > - potentially issues with other projects that use Avro (like Apache
> Spark e.g.).
> >
> > 2) Support different Avro versions in Beam, make Avro dependency provided
> > - Pros:
> > - user decides which versions to use;
> > - easy to update in the future;
> > - Cons:
> > - breaking change for users;
> > - not fact that it’s possible to implement in reality;
> > - more tests to test Beam with different Avro versions
> >
> > 3) Extract Avro as an extension, like we do for other formats, and
> update to latest Avro version, but keep and shade Avro for “core” needs as
> v.1.8.2 (still have an issue with CVEs)
> >
> > 4) Anything else?
> >
> >
> > Please, share your thoughts on this and correct me if I stated something
> wrong. The goal of this discussion is finally to move forward with Avro
> update topic.
> >
> > —
> > Alexey
> >
> >
> > [1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r
> <https://urldefense.com/v3/__https:/lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9kMOM1iA$>
> > [2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2
> <https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-JQRaI8A$>
> > [3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0
> <https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-P2VwtLA$>
> > [4] https://github.com/apache/beam/pull/9779
> <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/9779__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-C6txcQA$>
> > [5] https://github.com/apache/beam/pull/17372
> <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17372__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw8mE0kFWg$>
> > [6] https://github.com/apache/beam/pull/17246
> <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17246__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-6l8e4lQ$>
> > [7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on
> <https://urldefense.com/v3/__https:/lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9iPVZxRg$>
> > [8] https://github.com/apache/beam/pull/12748
> <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/12748__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9YWZV1hQ$>
> > [9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg
> <https://urldefense.com/v3/__https:/lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9T8IrnEg$>
> > [10] https://github.com/apache/beam/pull/16271
> <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/16271__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw81M4oMhQ$>
> >
> >
> >
> >
> >
> >
>
> *As a recipient of an email from Talend, your contact personal data will
> be on our systems. Please see our privacy notice.
> <https://www.talend.com/privacy/>*
>
>
>

Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Moritz Mack <mm...@talend.com>.
Thanks so much for all these pointers, Alexey. Having that context really helps!

Skimming through the past conversations, this one key consideration hasn’t changed and seems still critical:
AvroCoder is the de facto standard for encoding complex user types (with SchemaCoder still being experimental).
Consequently, any backwards incompatible change in that respect would massively impact users.

Having that in mind, here’s my 2 cents on the options mentioned:


  1.  Bump Avro in core:
     *   This will seamlessly work for users that just need a coder for complex types, but don’t use Avro themselves.
     *   Anyone using Avro APIs from user code might get into trouble.
  2.  Support different Avro versions & making the dependency provided
     *   In terms of user impact, making Avro provided will be very troublesome. Unless Avro remains available on the classpath through different ways, this would break things in a really bad way. Worst of all, it might not even break at compile time but at runtime only.
     *   I don’t think this option requires Avro being a provided dependency. If the default is kept as is, anyone can (safely) bump Avro to a higher (supported) version on their classpath. In that case it would be good to warn users about CVEs in logs if they remain on an old version of Avro. But I’m not sure how feasible (reliable) it is to detect the Avro minor version given that things might be repackaged into fat jars.
  3.  Extract Avro as an extension
     *   I agree with Brian, I don’t think there’s a need to keep a shaded/vendored Avro in core… but
     *   It’s pretty hard to get this done without causing trouble for users. Making this a seamless migration would require core to depend on the new extension module and to maintain all the old public user facing APIs.
     *   Duplicating packages between core and the new extension module is going to cause issues with Java 11. That pretty much means existing public user facing APIs have to remain as a (deprecated) façade in core using the code in the new extension.
     *   The above makes the effort almost look worthless in the short term, but it would certainly help to have a well-defined scope that can be tested for compatibility with multiple different Avro versions.

To finally make progress on this topic, I’d suggest starting off with supporting multiple different Avro versions in core (2) but to keep the default Avro version as is.
At the same time, extracting Avro as extension in a backwards compatible way seems to be the right path forward long term together with a well communicated deprecation plan.

Best,
Moritz


From: Brian Hulette <bh...@google.com>
Date: Thursday, 12. May 2022 at 23:21
To: dev <de...@beam.apache.org>
Subject: Re: [DISCUSS] Next steps for update of Avro dependency in Beam
Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)" Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and
ZjQcmQRYFpfptBannerStart
This Message Is From an External Sender
This message came from outside your organization.
Exercise caution when opening attachments or clicking any links.
ZjQcmQRYFpfptBannerEnd
Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)"

Do we actually need to keep avro in core for any reason? I thought we only had it in core for AvroCoder, schema support, and IOs, which I think are all reasonable to separate out into an extension (this would be comparable to the protobuf extension). To confirm I just grepped for files in core that import avro:

❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java

Brian

On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <ro...@google.com>> wrote:
Keeping avro in our public (core) API and as an internal dependency
seems to be a recurring pain point, I would be all for pulling it out
(basically option 3) and subsequently updating our internal version
(hopefully no backwards compatibility issues here) and letting the
extension live with a variety of versions (insofar as this is
feasible).

On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
<ar...@gmail.com>> wrote:
>
> Hi everyone,
>
> Sorry in advance for a long email.
> TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
>
> I’d like to come back to this old and quite sensitive topic here which is Apache Avro version update in Beam. Along the time, we already had several discussions on this (for example [1]) but without any concrete resolutions in the end, iirc.
>
> As we all know, Beam still depends on quite old Avro version 1.8.2 and there were some attempts to bump it to more recent ones. One of the main reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
>
> In the same time, this update with introduce some incompatible changes that Avro has between versions and this may affect Beam users and potentially it may affect transitive dependencies while using Beam with other project that use Avro as well:
> - Avro completely moved to java.time.* instead of org.joda.time.*. So, we need to adjust date/time conversions from/to Beam schema accordingly since Beam schema still uses joda.time. It will require users to regenerate already generated Java code with avro-compiler (if any) otherwise it won’t compile;
> - Some minor changes in Avro dependencies and user API;
> - Something else?
>
> I know that here, on the list, we have people from Avro community that are much more experienced in this than me - so, please correct me if I say something wrong or not 100% correct.
>
>
> In Beam, we also performed several attempts to update Avro - for example, [4], [5], [6] and others.
>
> To make such update easier in the future, we also discussed to move Avro dependency out of core Beam [7] and there were an attempt to do that [8] by finally this PR was closed with a resolution that it’s not actually needed and we may just want to test Beam with different Avro versions [9]
>
> The latest work on this was a PR to support several versions of Avro in Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, iirc.
>
> So, seems that we are a bit stuck on this topic, though, imho, we need to decide how move forward mostly because of CVEs in old Avro versions and future Avro updates in Beam.
>
> The potential options (as I can see them):
>
> 1) Bump Avro dependency to the latest one (1.11.0) or the possible more recent one
> - Pros:
> - latest/recent Avro dependency;
> - potentially easy to update in the future;
> - Cons:
> - breaking change for users;
> - potentially issues with other projects that use Avro (like Apache Spark e.g.).
>
> 2) Support different Avro versions in Beam, make Avro dependency provided
> - Pros:
> - user decides which versions to use;
> - easy to update in the future;
> - Cons:
> - breaking change for users;
> - not fact that it’s possible to implement in reality;
> - more tests to test Beam with different Avro versions
>
> 3) Extract Avro as an extension, like we do for other formats, and update to latest Avro version, but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)
>
> 4) Anything else?
>
>
> Please, share your thoughts on this and correct me if I stated something wrong. The goal of this discussion is finally to move forward with Avro update topic.
>
> —
> Alexey
>
>
> [1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r<https://urldefense.com/v3/__https:/lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9kMOM1iA$>
> [2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2<https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-JQRaI8A$>
> [3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0<https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-P2VwtLA$>
> [4] https://github.com/apache/beam/pull/9779<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/9779__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-C6txcQA$>
> [5] https://github.com/apache/beam/pull/17372<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17372__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw8mE0kFWg$>
> [6] https://github.com/apache/beam/pull/17246<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17246__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-6l8e4lQ$>
> [7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on<https://urldefense.com/v3/__https:/lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9iPVZxRg$>
> [8] https://github.com/apache/beam/pull/12748<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/12748__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9YWZV1hQ$>
> [9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg<https://urldefense.com/v3/__https:/lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9T8IrnEg$>
> [10] https://github.com/apache/beam/pull/16271<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/16271__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw81M4oMhQ$>
>
>
>
>
>
>

As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice. <https://www.talend.com/privacy/>



Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Etienne Chauchot <ec...@apache.org>.
Hi,

Thanks Alexey for bringing this topic up.

I'd be in favor of 3

Best

Etienne

Le 12/05/2022 à 23:21, Brian Hulette a écrit :
> Regarding Option (3) "but keep and shade Avro for “core” needs as 
> v.1.8.2 (still have an issue with CVEs)"
>
> Do we actually need to keep avro in core for any reason? I thought we 
> only had it in core for AvroCoder, schema support, and IOs, which I 
> think are all reasonable to separate out into an extension (this would 
> be comparable to the protobuf extension). To confirm I just grepped 
> for files in core that import avro:
>
> ❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java
>
> Brian
>
> On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <ro...@google.com> 
> wrote:
>
>     Keeping avro in our public (core) API and as an internal dependency
>     seems to be a recurring pain point, I would be all for pulling it out
>     (basically option 3) and subsequently updating our internal version
>     (hopefully no backwards compatibility issues here) and letting the
>     extension live with a variety of versions (insofar as this is
>     feasible).
>
>     On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
>     <ar...@gmail.com> wrote:
>     >
>     > Hi everyone,
>     >
>     > Sorry in advance for a long email.
>     > TL;DR: Let’s discuss the next steps to update Avro dependency in
>     Beam.
>     >
>     > I’d like to come back to this old and quite sensitive topic here
>     which is Apache Avro version update in Beam. Along the time, we
>     already had several discussions on this (for example [1]) but
>     without any concrete resolutions in the end, iirc.
>     >
>     > As we all know, Beam still depends on quite old Avro version
>     1.8.2 and there were some attempts to bump it to more recent ones.
>     One of the main reasons to bump an Avro version, imho, is that
>     Avro 1.8.2 dependency brings several CVEs [2], but the latest Avro
>     1.11.0 brings only one [3]
>     >
>     > In the same time, this update with introduce some incompatible
>     changes that Avro has between versions and this may affect Beam
>     users and potentially it may affect transitive dependencies while
>     using Beam with other project that use Avro as well:
>     > - Avro completely moved to java.time.* instead of
>     org.joda.time.*. So, we need to adjust date/time conversions
>     from/to Beam schema accordingly since Beam schema still uses
>     joda.time. It will require users to regenerate already generated
>     Java code with avro-compiler (if any) otherwise it won’t compile;
>     > - Some minor changes in Avro dependencies and user API;
>     > - Something else?
>     >
>     > I know that here, on the list, we have people from Avro
>     community that are much more experienced in this than me - so,
>     please correct me if I say something wrong or not 100% correct.
>     >
>     >
>     > In Beam, we also performed several attempts to update Avro - for
>     example, [4], [5], [6] and others.
>     >
>     > To make such update easier in the future, we also discussed to
>     move Avro dependency out of core Beam [7] and there were an
>     attempt to do that [8] by finally this PR was closed with a
>     resolution that it’s not actually needed and we may just want to
>     test Beam with different Avro versions [9]
>     >
>     > The latest work on this was a PR to support several versions of
>     Avro in Beam (1.8.x and 1.9.x) [10] which still introduces some
>     breaking changes for users, iirc.
>     >
>     > So, seems that we are a bit stuck on this topic, though, imho,
>     we need to decide how move forward mostly because of CVEs in old
>     Avro versions and future Avro updates in Beam.
>     >
>     > The potential options (as I can see them):
>     >
>     > 1) Bump Avro dependency to the latest one (1.11.0) or the
>     possible more recent one
>     > - Pros:
>     > - latest/recent Avro dependency;
>     > - potentially easy to update in the future;
>     > - Cons:
>     > - breaking change for users;
>     > - potentially issues with other projects that use Avro (like
>     Apache Spark e.g.).
>     >
>     > 2) Support different Avro versions in Beam, make Avro dependency
>     provided
>     > - Pros:
>     > - user decides which versions to use;
>     > - easy to update in the future;
>     > - Cons:
>     > - breaking change for users;
>     > - not fact that it’s possible to implement in reality;
>     > - more tests to test Beam with different Avro versions
>     >
>     > 3) Extract Avro as an extension, like we do for other formats,
>     and update to latest Avro version, but keep and shade Avro for
>     “core” needs as v.1.8.2 (still have an issue with CVEs)
>     >
>     > 4) Anything else?
>     >
>     >
>     > Please, share your thoughts on this and correct me if I stated
>     something wrong. The goal of this discussion is finally to move
>     forward with Avro update topic.
>     >
>     > —
>     > Alexey
>     >
>     >
>     > [1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r
>     > [2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2
>     > [3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0
>     > [4] https://github.com/apache/beam/pull/9779
>     > [5] https://github.com/apache/beam/pull/17372
>     > [6] https://github.com/apache/beam/pull/17246
>     > [7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on
>     > [8] https://github.com/apache/beam/pull/12748
>     > [9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg
>     > [10] https://github.com/apache/beam/pull/16271
>     >
>     >
>     >
>     >
>     >
>     >
>

Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Brian Hulette <bh...@google.com>.
Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2
(still have an issue with CVEs)"

Do we actually need to keep avro in core for any reason? I thought we only
had it in core for AvroCoder, schema support, and IOs, which I think are
all reasonable to separate out into an extension (this would be comparable
to the protobuf extension). To confirm I just grepped for files in core
that import avro:

❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java

Brian

On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <ro...@google.com> wrote:

> Keeping avro in our public (core) API and as an internal dependency
> seems to be a recurring pain point, I would be all for pulling it out
> (basically option 3) and subsequently updating our internal version
> (hopefully no backwards compatibility issues here) and letting the
> extension live with a variety of versions (insofar as this is
> feasible).
>
> On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
> <ar...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > Sorry in advance for a long email.
> > TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
> >
> > I’d like to come back to this old and quite sensitive topic here which
> is Apache Avro version update in Beam. Along the time, we already had
> several discussions on this (for example [1]) but without any concrete
> resolutions in the end, iirc.
> >
> > As we all know, Beam still depends on quite old Avro version 1.8.2 and
> there were some attempts to bump it to more recent ones. One of the main
> reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings
> several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
> >
> > In the same time, this update with introduce some incompatible changes
> that Avro has between versions and this may affect Beam users and
> potentially it may affect transitive dependencies while using Beam with
> other project that use Avro as well:
> > - Avro completely moved to java.time.* instead of org.joda.time.*. So,
> we need to adjust date/time conversions from/to Beam schema accordingly
> since Beam schema still uses joda.time. It will require users to regenerate
> already generated Java code with avro-compiler (if any) otherwise it won’t
> compile;
> > - Some minor changes in Avro dependencies and user API;
> > - Something else?
> >
> > I know that here, on the list, we have people from Avro community that
> are much more experienced in this than me - so, please correct me if I say
> something wrong or not 100% correct.
> >
> >
> > In Beam, we also performed several attempts to update Avro - for
> example, [4], [5], [6] and others.
> >
> > To make such update easier in the future, we also discussed to move Avro
> dependency out of core Beam [7] and there were an attempt to do that [8] by
> finally this PR was closed with a resolution that it’s not actually needed
> and we may just want to test Beam with different Avro versions [9]
> >
> > The latest work on this was a PR to support several versions of Avro in
> Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes
> for users, iirc.
> >
> > So, seems that we are a bit stuck on this topic, though, imho, we need
> to decide how move forward mostly because of CVEs in old Avro versions and
> future Avro updates in Beam.
> >
> > The potential options (as I can see them):
> >
> > 1) Bump Avro dependency to the latest one (1.11.0) or the possible more
> recent one
> > - Pros:
> > - latest/recent Avro dependency;
> > - potentially easy to update in the future;
> > - Cons:
> > - breaking change for users;
> > - potentially issues with other projects that use Avro (like Apache
> Spark e.g.).
> >
> > 2) Support different Avro versions in Beam, make Avro dependency provided
> > - Pros:
> > - user decides which versions to use;
> > - easy to update in the future;
> > - Cons:
> > - breaking change for users;
> > - not fact that it’s possible to implement in reality;
> > - more tests to test Beam with different Avro versions
> >
> > 3) Extract Avro as an extension, like we do for other formats, and
> update to latest Avro version, but keep and shade Avro for “core” needs as
> v.1.8.2 (still have an issue with CVEs)
> >
> > 4) Anything else?
> >
> >
> > Please, share your thoughts on this and correct me if I stated something
> wrong. The goal of this discussion is finally to move forward with Avro
> update topic.
> >
> > —
> > Alexey
> >
> >
> > [1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r
> > [2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2
> > [3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0
> > [4] https://github.com/apache/beam/pull/9779
> > [5] https://github.com/apache/beam/pull/17372
> > [6] https://github.com/apache/beam/pull/17246
> > [7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on
> > [8] https://github.com/apache/beam/pull/12748
> > [9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg
> > [10] https://github.com/apache/beam/pull/16271
> >
> >
> >
> >
> >
> >
>

Re: [DISCUSS] Next steps for update of Avro dependency in Beam

Posted by Robert Bradshaw <ro...@google.com>.
Keeping avro in our public (core) API and as an internal dependency
seems to be a recurring pain point, I would be all for pulling it out
(basically option 3) and subsequently updating our internal version
(hopefully no backwards compatibility issues here) and letting the
extension live with a variety of versions (insofar as this is
feasible).

On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
<ar...@gmail.com> wrote:
>
> Hi everyone,
>
> Sorry in advance for a long email.
> TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
>
> I’d like to come back to this old and quite sensitive topic here which is Apache Avro version update in Beam. Along the time, we already had several discussions on this (for example [1]) but without any concrete resolutions in the end, iirc.
>
> As we all know, Beam still depends on quite old Avro version 1.8.2 and there were some attempts to bump it to more recent ones. One of the main reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
>
> In the same time, this update with introduce some incompatible changes that Avro has between versions and this may affect Beam users and potentially it may affect transitive dependencies while using Beam with other project that use Avro as well:
> - Avro completely moved to java.time.* instead of org.joda.time.*. So, we need to adjust date/time conversions from/to Beam schema accordingly since Beam schema still uses joda.time. It will require users to regenerate already generated Java code with avro-compiler (if any) otherwise it won’t compile;
> - Some minor changes in Avro dependencies and user API;
> - Something else?
>
> I know that here, on the list, we have people from Avro community that are much more experienced in this than me - so, please correct me if I say something wrong or not 100% correct.
>
>
> In Beam, we also performed several attempts to update Avro - for example, [4], [5], [6] and others.
>
> To make such update easier in the future, we also discussed to move Avro dependency out of core Beam [7] and there were an attempt to do that [8] by finally this PR was closed with a resolution that it’s not actually needed and we may just want to test Beam with different Avro versions [9]
>
> The latest work on this was a PR to support several versions of Avro in Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, iirc.
>
> So, seems that we are a bit stuck on this topic, though, imho, we need to decide how move forward mostly because of CVEs in old Avro versions and future Avro updates in Beam.
>
> The potential options (as I can see them):
>
> 1) Bump Avro dependency to the latest one (1.11.0) or the possible more recent one
> - Pros:
> - latest/recent Avro dependency;
> - potentially easy to update in the future;
> - Cons:
> - breaking change for users;
> - potentially issues with other projects that use Avro (like Apache Spark e.g.).
>
> 2) Support different Avro versions in Beam, make Avro dependency provided
> - Pros:
> - user decides which versions to use;
> - easy to update in the future;
> - Cons:
> - breaking change for users;
> - not fact that it’s possible to implement in reality;
> - more tests to test Beam with different Avro versions
>
> 3) Extract Avro as an extension, like we do for other formats, and update to latest Avro version, but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)
>
> 4) Anything else?
>
>
> Please, share your thoughts on this and correct me if I stated something wrong. The goal of this discussion is finally to move forward with Avro update topic.
>
> —
> Alexey
>
>
> [1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r
> [2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2
> [3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0
> [4] https://github.com/apache/beam/pull/9779
> [5] https://github.com/apache/beam/pull/17372
> [6] https://github.com/apache/beam/pull/17246
> [7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on
> [8] https://github.com/apache/beam/pull/12748
> [9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg
> [10] https://github.com/apache/beam/pull/16271
>
>
>
>
>
>