Posted to user@beam.apache.org by Steve Niemitz <sn...@apache.org> on 2022/09/12 19:38:07 UTC

AvroIO.to(DynamicAvroDestinations) deprecated?

We're trying to do some semi-advanced custom logic (custom writers and
schemas per destination) with AvroIO, and want to use
DynamicAvroDestinations to accomplish this.

AvroIO.to(DynamicAvroDestinations) is deprecated, however, and there
doesn't seem to be any other way to accomplish what we want here.
AvroIO.sink is much less sophisticated than the non-sink options and is
missing much of their configurability.  For example, there's no way to
project from UserT -> OutputT with the sink version, only from
UserT -> GenericRecord, which isn't what we want.

It seems like most things would be trivial to fix or add on the AvroIO.sink
implementation.  Is that the intended way for people to consume AvroIO?
I'm a little confused by FileIO.write/writeDynamic vs WriteFiles vs
AvroIO.write: some seem deprecated, and some seem
not-deprecated-but-not-recommended.  To add to the confusion, AvroIO.write
uses WriteFiles, but the documentation for the deprecated
AvroIO.to(DynamicAvroDestinations) points to FileIO.write.  Which is the
"right" one to use?

Re: AvroIO.to(DynamicAvroDestinations) deprecated?

Posted by John Casey via user <us...@beam.apache.org>.
That would be great, thanks!

On Tue, Sep 13, 2022 at 3:00 PM Steve Niemitz <sn...@apache.org> wrote:

> Ah this is super useful context, thank you!  I can submit a couple PRs to
> get AvroIO.sink up to parity if that's the way forward.

Re: AvroIO.to(DynamicAvroDestinations) deprecated?

Posted by Steve Niemitz <sn...@apache.org>.
Ah this is super useful context, thank you!  I can submit a couple PRs to
get AvroIO.sink up to parity if that's the way forward.

On Tue, Sep 13, 2022 at 2:53 PM John Casey via user <us...@beam.apache.org>
wrote:

> Hi Steve,
>
> I've asked around, and it looks like this confusing state is due to a
> migration that isn't complete (and likely won't be until Beam 3.0).
>
> Here is the doc that explains some of the history:
> https://docs.google.com/document/d/1zcF4ZGtq8pxzLZxgD_JMWAouSszIf9LnFANWHKBsZlg/edit
> And a PR that implements some of the changes:
> https://github.com/apache/beam/pull/3817
>
> Based on this, AvroIO.sink is what we recommend. Please feel free to raise
> issues on GitHub to account for features you're missing. In addition, if
> you think they are straightforward changes, I'd be happy to discuss
> designs, or look at proposed changes to make these features available.
>
> I hope this helps,
> John

Re: AvroIO.to(DynamicAvroDestinations) deprecated?

Posted by John Casey via user <us...@beam.apache.org>.
Hi Steve,

I've asked around, and it looks like this confusing state is due to a
migration that isn't complete (and likely won't be until Beam 3.0).

Here is the doc that explains some of the history:
https://docs.google.com/document/d/1zcF4ZGtq8pxzLZxgD_JMWAouSszIf9LnFANWHKBsZlg/edit
And a PR that implements some of the changes:
https://github.com/apache/beam/pull/3817

Based on this, AvroIO.sink is what we recommend. Please feel free to raise
issues on GitHub to account for features you're missing. In addition, if
you think they are straightforward changes, I'd be happy to discuss
designs, or look at proposed changes to make these features available.
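
For anyone following along, here is a rough plain-Java model of the shape
being discussed: a destination function, a UserT -> OutputT projection, and
a sink chosen per destination. It is only a sketch. None of these types are
Beam classes, and Sink, RecordingSink, Event and writeDynamic are
hypothetical names made up for illustration. As far as I can tell,
FileIO.writeDynamic() exposes the same three pieces through by(...) and its
via(...) overloads, but please check the javadoc for the Beam version you
are on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Plain-Java model of a dynamic write; none of these types are Beam classes. */
public class DynamicWriteSketch {

  /** Hypothetical stand-in for a per-destination sink that accepts OutputT elements. */
  interface Sink<OutputT> {
    void write(OutputT element);
  }

  /** In-memory sink that records what it was asked to write, tagged with its schema name. */
  static final class RecordingSink implements Sink<String> {
    final String schemaName;
    final List<String> written = new ArrayList<>();

    RecordingSink(String schemaName) {
      this.schemaName = schemaName;
    }

    @Override
    public void write(String element) {
      written.add(schemaName + ":" + element);
    }
  }

  /** Example input type: the kind picks the destination, the payload is what gets written. */
  static final class Event {
    final String kind;
    final String payload;

    Event(String kind, String payload) {
      this.kind = kind;
      this.payload = payload;
    }
  }

  /**
   * Routes each input to a destination, creates one sink per destination on first use,
   * and projects UserT to OutputT before handing the element to that sink.
   */
  static <UserT, DestT, OutputT, SinkT extends Sink<OutputT>> Map<DestT, SinkT> writeDynamic(
      List<UserT> inputs,
      Function<UserT, DestT> destinationFn,
      Function<UserT, OutputT> outputFn,
      Function<DestT, SinkT> sinkFn) {
    Map<DestT, SinkT> sinks = new LinkedHashMap<>();
    for (UserT input : inputs) {
      DestT destination = destinationFn.apply(input);
      SinkT sink = sinks.computeIfAbsent(destination, sinkFn);
      sink.write(outputFn.apply(input));
    }
    return sinks;
  }

  /** Writes three events to two destinations, each with its own (hypothetical) schema name. */
  static Map<String, RecordingSink> demo() {
    List<Event> events =
        Arrays.asList(new Event("click", "a"), new Event("view", "b"), new Event("click", "c"));
    return DynamicWriteSketch.<Event, String, String, RecordingSink>writeDynamic(
        events,
        e -> e.kind,                                   // destination per element
        e -> e.payload.toUpperCase(),                  // UserT -> OutputT projection
        kind -> new RecordingSink(kind + "-schema"));  // sink (and schema) per destination
  }

  public static void main(String[] args) {
    demo().forEach((kind, sink) -> System.out.println(kind + " -> " + sink.written));
  }
}
```

The point of the model is that the projection and the sink choice are two
separate functions, so the projection can target any OutputT rather than
only GenericRecord.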

I hope this helps,
John
