Posted to dev@beam.apache.org by Brian Hulette <bh...@google.com> on 2021/10/14 21:34:43 UTC

Re: Why is Avro Date field using InstantCoder?

+dev <de...@beam.apache.org>

This sounds like a bug in the Avro schema mapping. I *think* the intention
is that we generate logic for converting Date to/from Instant when making
getters for a RowWithGetters backed by Avro.

Brian
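
As an illustration of that intention, here is a minimal sketch of what such
Date-to-Instant conversion logic could do, assuming Joda time on the Beam
side; the class and method names are invented for this example, not Beam's
actual generated code:

import java.time.LocalDate;
import org.joda.time.Instant;

// Hypothetical converter, not Beam's generated code: an Avro "date" is a
// count of days since the epoch, so map it to a joda Instant at midnight UTC.
class DateConversionSketch {
  static Instant toJodaInstant(LocalDate date) {
    return new Instant(Math.multiplyExact(date.toEpochDay(), 86_400_000L));
  }

  static LocalDate toLocalDate(Instant instant) {
    return LocalDate.ofEpochDay(Math.floorDiv(instant.getMillis(), 86_400_000L));
  }
}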

On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <ze...@gmail.com>
wrote:

> A little bit more color on this.
>
> That field is nested inside an Avro record, along these lines (record
> names simplified):
> {
>   "type": "record",
>   "name": "Outer",
>   "fields": [
>     {
>       "name": "whatever",
>       "type": {
>         "type": "record",
>         "name": "Inner",
>         "fields": [
>           {"name": "somedate", "type": {"type": "int", "logicalType": "date"}}
>         ]
>       }
>     }
>   ]
> }
>
> The outer layer record uses SchemaCoderHelper.coderForFieldType on all
> its fields to create the coder. However, when called for the inner record
> field, that method just returns SchemaCoder.of(fieldType.getRowSchema()),
> which doesn't take LogicalTypes into account.
>
> I think that's where the problem is. If anyone who knows that code could
> have a look and let me know their thoughts, I can try to fix the issue if
> we agree that there is one.
>
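For readers following along, a rough sketch of the shape of dispatch being
described; this is illustrative of the reported behavior, not the actual
Beam source:

import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.InstantCoder;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.SchemaCoder;

class CoderDispatchSketch {
  static Coder<?> coderForFieldType(Schema.FieldType fieldType) {
    switch (fieldType.getTypeName()) {
      case ROW:
        // Builds the nested coder from the row schema alone, so per-field
        // logical-type handling is bypassed for inner records.
        return SchemaCoder.of(fieldType.getRowSchema());
      case DATETIME:
        return InstantCoder.of(); // encodes org.joda.time.Instant
      default:
        throw new UnsupportedOperationException("other types elided in sketch");
    }
  }
}
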
> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <ze...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I have the following field in one of my avro schemas:
>>
>> {
>> "name": "somedate",
>> "type: {"type": "int", "logicalType": "date"}
>> }
>>
>> This generates a java.time.LocalDate field in the corresponding Java
>> class (call it Foo).
>>
>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that field as
>> DATETIME in the Beam schema, because of AvroUtils.toFieldType (around line
>> 275, where TimestampMillis and Date are both stored as DATETIME).
>>
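A minimal sketch reproducing the mapping described above, assuming the Beam
Java SDK's AvroUtils; the schema string mirrors the field from the report:

import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.utils.AvroUtils;

class SchemaMappingSketch {
  public static void main(String[] args) {
    org.apache.avro.Schema avroSchema = new org.apache.avro.Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":["
            + "{\"name\":\"somedate\","
            + "\"type\":{\"type\":\"int\",\"logicalType\":\"date\"}}]}");
    Schema beamSchema = AvroUtils.toBeamSchema(avroSchema);
    // Per the report, this prints DATETIME rather than a date-specific type.
    System.out.println(beamSchema.getField("somedate").getType().getTypeName());
  }
}
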
>> My problem is that in SchemaCoderHelpers, DATETIME is set to use
>> InstantCoder, which expects a joda Instant as input, not a LocalDate. So
>> when my PTransform returns Foo objects, it crashes with "class
>> java.time.LocalDate cannot be cast to class org.joda.time.Instant..." when
>> trying to encode that field using InstantCoder.encode().
>>
>> Is there a workaround for this issue?
>>
>> Thank you,
>> Cristian
>>
>> PS: I did search the mailing list and Google, but didn't find anything
>> related except a thread on AvroCoder.JodaTimestampConversion, which I
>> don't think applies here.
>>
>

Re: Why is Avro Date field using InstantCoder?

Posted by Cristian Constantinescu <ze...@gmail.com>.
Hi everyone,

I'm still trying to figure out what the best path forward is on my project.
While doing some research, I came across the Confluent interoperability
page [1]. Beam currently uses version 5.3.2 of the Confluent libs, which
reaches end of support on July 19, 2022. It's the last version that
supports Avro 1.8 (1.8.1, specifically). The 5.4.x Confluent libs move to
Avro 1.9.1, and 6.2.x (the latest) moves to Avro 1.10.1 (the latest Avro
release being 1.10.2).

What's the plan in cases like this, where dependencies reach their end of
life? Does Beam plan on staying on 5.3.2 even after it is no longer
supported?

Thanks,
Cristian

[1]
https://docs.confluent.io/platform/current/installation/versions-interoperability.html

On Mon, Oct 18, 2021 at 2:42 PM Brian Hulette <bh...@google.com> wrote:

> Note there was some work done to make Beam work with Avro 1.9 [1]
> (presumably this only works with Joda time though? certainly the ByteBuddy
> generated code would). Avro 1.9 compatibility is not verified continuously
> though; there was an effort a while ago to test Beam against multiple
> versions of Avro [2], but I don't think it went anywhere. There's also some
> discussion about API compatibility concerns in [2].
>
> Brian
>
> [1] https://issues.apache.org/jira/browse/BEAM-9144
> [2]
> https://lists.apache.org/thread.html/r2739eb540ea2b8d8c50db71850361b8b1347df66e4357001b334987b%40%3Cdev.beam.apache.org%3E
>
> On Mon, Oct 18, 2021 at 11:19 AM Daniel Collins <dp...@google.com>
> wrote:
>
>> > I see a lot of static calls
>>
>> A lot of these calls bottom out in usage of ServiceLoader
>> <http://docs.oracle.com/javase/6/docs/api/java/util/ServiceLoader.html>,
>> effectively acting as an ergonomic API on top of it, with AutoService
>> <https://github.com/google/auto/tree/master/service> used to register
>> new handlers. If they don't currently cover your case, and there's some
>> extension point which would be helpful to you, it's quite likely that you
>> could get buy-in for adding another such extension point.
>>
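A minimal sketch of the ServiceLoader/AutoService pattern being described;
the FieldCoderProvider interface and handler below are invented for
illustration, not an existing Beam extension point:

import com.google.auto.service.AutoService;
import java.util.ServiceLoader;

public class ExtensionPointSketch {
  // Hypothetical extension point: implementations are discovered at runtime.
  public interface FieldCoderProvider {
    boolean handles(String typeName);
  }

  // AutoService generates the META-INF/services registration at compile time.
  @AutoService(FieldCoderProvider.class)
  public static class DateCoderProvider implements FieldCoderProvider {
    @Override
    public boolean handles(String typeName) {
      return "JAVA_DATE".equals(typeName);
    }
  }

  static FieldCoderProvider find(String typeName) {
    for (FieldCoderProvider p : ServiceLoader.load(FieldCoderProvider.class)) {
      if (p.handles(typeName)) {
        return p;
      }
    }
    throw new IllegalArgumentException("no provider for " + typeName);
  }
}
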
>> > after I find a workaround
>>
>> I think if the issue is that AvroUtils currently uses an old version of
>> Avro, forking it into your own package and version-bumping it might be a
>> good place to start for a workaround. Any changes committed to the Beam
>> repo will take 1-2 months to make it into a non-snapshot build, even if
>> you do find a long-term solution acceptable to all interested parties.
>>
>> -Daniel
>>
>> On Mon, Oct 18, 2021 at 1:46 PM Cristian Constantinescu <ze...@gmail.com>
>> wrote:
>>
>>> I will have a look after I find a workaround, as I really need to
>>> deliver some things and using Avro 1.8 isn't really an option.
>>>
>>> But once that's done, I'd love to find ways to make Beam less dependent
>>> on Avro 1.8, considering it was released in 2017.
>>>
>>> On Mon, Oct 18, 2021 at 12:34 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>> Do you know if it's easy to detect which version of Avro is being used?
>>>>
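One possible answer, sketched under the assumption that the Avro jar's
manifest carries an Implementation-Version (shaded or repackaged jars may
not, so this would need verifying):

import org.apache.avro.Schema;

class AvroVersionSketch {
  static String detectAvroVersion() {
    // Reads the Implementation-Version from the jar manifest of whichever
    // Avro is on the classpath; falls back if the manifest lacks it.
    String version = Schema.class.getPackage().getImplementationVersion();
    return version != null ? version : "unknown";
  }

  public static void main(String[] args) {
    System.out.println("Avro version: " + detectAvroVersion());
  }
}
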
>>>> On Sun, Oct 17, 2021 at 10:20 PM Cristian Constantinescu <
>>>> zeidoo@gmail.com> wrote:
>>>>
>>>>> If I had to change things, I would:
>>>>>
>>>>> 1. When deriving the schema, add a few new types (JAVA_TIME, JAVA_DATE
>>>>> or something along those lines).
>>>>> 2. RowCoderGenerator around line 159 calls
>>>>> "SchemaCoder.coderForFieldType(schema.getField(rowIndex).getType().withNullable(false));"
>>>>> which eventually gets to SchemaCoderHelpers.coderForFieldType. There,
>>>>> CODER_MAP has a hard reference to InstantCoder for DATETIME. Maybe that
>>>>> map can be augmented (possibly dynamically) with new fieldtype-coder
>>>>> combinations to take care of the new types from #1.
>>>>>
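A rough sketch of what that augmentation could look like; Beam's actual
CODER_MAP is internal and immutable, so this shows the proposed shape
rather than an existing API:

import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.InstantCoder;

class CoderRegistrySketch {
  // Mutable stand-in for the hard-coded map described above.
  private static final Map<String, Coder<?>> CODER_MAP = new HashMap<>();

  static {
    CODER_MAP.put("DATETIME", InstantCoder.of()); // current joda-based default
  }

  // The proposed hook: register coders for new field types such as a
  // hypothetical JAVA_DATE, instead of baking them in.
  static void register(String typeName, Coder<?> coder) {
    CODER_MAP.put(typeName, coder);
  }

  static Coder<?> coderFor(String typeName) {
    return CODER_MAP.get(typeName);
  }
}
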
>>>>> I would also like to ask: looking through the Beam code, I see a lot
>>>>> of static calls. Just wondering why it's done this way. I'm used to
>>>>> projects having some form of dependency injection involved, with static
>>>>> calls being frowned upon (lack of mockability, hidden dependencies,
>>>>> tight coupling, etc.). The only reason I can think of is
>>>>> serializability, given Beam's multi-node processing?
>>>>>
>>>>>
>>>>> On Sat, Oct 16, 2021 at 3:11 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> Is the Schema inference the only reason we can't upgrade Avro, or are
>>>>>> there other blockers? Is there any way we can tell at runtime which version
>>>>>> of Avro is running? Since we generate the conversion code at runtime with
>>>>>> ByteBuddy, we could potentially just generate different conversions
>>>>>> depending on the Avro version.
>>>>>>
>>>>>> On Fri, Oct 15, 2021 at 11:56 PM Cristian Constantinescu <
>>>>>> zeidoo@gmail.com> wrote:
>>>>>>
>>>>>>> Those are fair points. However, please consider that there might be
>>>>>>> new users who will decide that Beam isn't suitable because of things
>>>>>>> like requiring Avro 1.8, Joda time, old Confluent libraries, and,
>>>>>>> when I started using Beam about a year ago, Java 8 (I think we're
>>>>>>> okay with Java 11 now).
>>>>>>>
>>>>>>> I guess what I'm saying is that there's definitely a non-negligible
>>>>>>> cost associated with old 3rd party libs in Beam's code (even if efforts are
>>>>>>> put in to minimize them).
>>>>>>>
>>>>>>> On Sat, Oct 16, 2021 at 2:33 AM Reuven Lax <re...@google.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 15, 2021 at 11:13 PM Cristian Constantinescu <
>>>>>>>> zeidoo@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> To my knowledge, and from reading through Avro's Jira [1], it does
>>>>>>>>> not support Joda time anymore.
>>>>>>>>>
>>>>>>>>> It seems everything related to this Avro 1.8 dependency is tricky.
>>>>>>>>> If you recall, it also prevents us from upgrading to the latest
>>>>>>>>> Confluent libs... for enabling Beam to use protobuf schemas with
>>>>>>>>> the schema registry. (I was also the one who brought that issue up,
>>>>>>>>> and made an exploratory PR to move Avro outside of Beam core.)
>>>>>>>>>
>>>>>>>>> I understand that Beam tries to keep its public APIs stable, but
>>>>>>>>> I'd like to put forward two points:
>>>>>>>>> 1) Schemas are experimental, hence there shouldn't be any API
>>>>>>>>> stability guarantees there.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Unfortunately at this point, they aren't really. As a community
>>>>>>>> we've been bad about removing the Experimental label - many core
>>>>>>>> parts of Beam are still labeled experimental (sources, triggering,
>>>>>>>> state, timers). Realistically they are no longer experimental.
>>>>>>>>
>>>>>>>>> 2) Maybe this is the perfect opportunity for the Beam community to
>>>>>>>>> think about the long-term effects of old dependencies within Beam's
>>>>>>>>> codebase, and especially how to deal with them. Perhaps
>>>>>>>>> starting/maintaining an "experimental" branch or set of
>>>>>>>>> maven-published artifacts where Beam does not guarantee backwards
>>>>>>>>> compatibility (or maintains it for a shorter period of time) is
>>>>>>>>> worth considering.
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is why we usually try to prevent third-party libraries from
>>>>>>>> being in our public API. However, in this case, that's tricky.
>>>>>>>>
>>>>>>>> The Beam community can of course decide to break backwards
>>>>>>>> compatibility. However, as things stand today, it is maintained. The
>>>>>>>> last time we broke backwards compatibility was when the old Dataflow
>>>>>>>> API was transitioned to Beam, and it was very painful. It took
>>>>>>>> multiple years to get some users onto the Beam API due to the code
>>>>>>>> changes required to migrate (and those required code changes weren't
>>>>>>>> terribly invasive).
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] https://issues.apache.org/jira/browse/AVRO-2335
>>>>>>>>>
>>>>>>>>> On Sat, Oct 16, 2021 at 12:40 AM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Does this mean more recent versions of Avro aren't backwards
>>>>>>>>>> compatible with Avro 1.8? If so, this might be tricky to fix,
>>>>>>>>>> since Beam maintains backwards compatibility on its public API.
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu <
>>>>>>>>>> zeidoo@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I've created a small demo project to show the issue[1].
>>>>>>>>>>>
>>>>>>>>>>> I've looked at the Beam code, and all the Avro tests use Avro
>>>>>>>>>>> 1.8... which is hardcoded to Joda, which is why all the tests
>>>>>>>>>>> pass. Avro changed to java.time (as have most other recent Java
>>>>>>>>>>> projects) after 1.8. Is there a plan for Beam to do the same?
>>>>>>>>>>>
>>>>>>>>>>> Does anyone use Avro with java.time instead of joda.time who
>>>>>>>>>>> could give me ideas on how to make it work?
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Cristian
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/zeidoo/beamjavadates-poc
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 14, 2021 at 5:35 PM Brian Hulette <
>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +dev <de...@beam.apache.org>
>>>>>>>>>>>>
>>>>>>>>>>>> This sounds like a bug in the Avro schema mapping. I *think*
>>>>>>>>>>>> the intention is that we generate logic for converting Date
>>>>>>>>>>>> to/from Instant when making getters for a RowWithGetters backed
>>>>>>>>>>>> by Avro.
>>>>>>>>>>>>
>>>>>>>>>>>> Brian
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <
>>>>>>>>>>>> zeidoo@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> A little bit more color on this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That field is nested inside an Avro record, along these lines
>>>>>>>>>>>>> (record names simplified):
>>>>>>>>>>>>> {
>>>>>>>>>>>>>   "type": "record",
>>>>>>>>>>>>>   "name": "Outer",
>>>>>>>>>>>>>   "fields": [
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       "name": "whatever",
>>>>>>>>>>>>>       "type": {
>>>>>>>>>>>>>         "type": "record",
>>>>>>>>>>>>>         "name": "Inner",
>>>>>>>>>>>>>         "fields": [
>>>>>>>>>>>>>           {"name": "somedate",
>>>>>>>>>>>>>            "type": {"type": "int", "logicalType": "date"}}
>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   ]
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> The outer layer record uses
>>>>>>>>>>>>> SchemaCoderHelper.coderForFieldType on all its fields to create
>>>>>>>>>>>>> the coder. However, when called for the inner record field,
>>>>>>>>>>>>> that method just returns
>>>>>>>>>>>>> SchemaCoder.of(fieldType.getRowSchema()), which doesn't take
>>>>>>>>>>>>> LogicalTypes into account.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that's where the problem is. If anyone who knows that
>>>>>>>>>>>>> code could have a look and let me know their thoughts, I can try to fix the
>>>>>>>>>>>>> issue if we agree that there is one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <
>>>>>>>>>>>>> zeidoo@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have the following field in one of my avro schemas:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> "name": "somedate",
>>>>>>>>>>>>>> "type: {"type": "int", "logicalType": "date"}
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This generates a java.time.LocalDate field in the
>>>>>>>>>>>>>> corresponding Java class (call it Foo).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that
>>>>>>>>>>>>>> field as DATETIME in the Beam schema, because of
>>>>>>>>>>>>>> AvroUtils.toFieldType (around line 275, where TimestampMillis
>>>>>>>>>>>>>> and Date are both stored as DATETIME).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My problem is that in SchemaCoderHelpers, DATETIME is set to
>>>>>>>>>>>>>> use InstantCoder, which expects a joda Instant as input, not a
>>>>>>>>>>>>>> LocalDate. So when my PTransform returns Foo objects, it
>>>>>>>>>>>>>> crashes with "class java.time.LocalDate cannot be cast to
>>>>>>>>>>>>>> class org.joda.time.Instant..." when trying to encode that
>>>>>>>>>>>>>> field using InstantCoder.encode().
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there a workaround for this issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>> Cristian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PS: I did search the mailing list and Google, but didn't find
>>>>>>>>>>>>>> anything related except a thread on
>>>>>>>>>>>>>> AvroCoder.JodaTimestampConversion, which I don't think applies
>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>

Re: Why is Avro Date field using InstantCoder?

Posted by Daniel Collins <dp...@google.com>.
> I see a lot of static calls

A lot of these calls bottom out in usage of ServiceLoader
<http://docs.oracle.com/javase/6/docs/api/java/util/ServiceLoader.html>,
effectively acting as an ergonomic API on top of it, with AutoService
<https://github.com/google/auto/tree/master/service> used to register new
handlers. If they're not currently, and there's some extension point which
would be helpful to you, its quite likely that you could get buy in to
adding another such extension point.

> after I find a workaround

I think if the issue is that AvroUtils currently uses an old version of
avro, forking it into your own package and version bumping it might be a
good place to start for a workaround. Any changes committed to the beam
repo will take 1-2 months to make it to a non-snapshot build even if you do
find a long term solution acceptable to all interested parties.

-Daniel

On Mon, Oct 18, 2021 at 1:46 PM Cristian Constantinescu <ze...@gmail.com>
wrote:

> I will have a look after I find a workaround as I really need to deliver
> some things and using Avro 1.8 isn't really an option.
>
> But once that's done, I'd love to find ways to make Beam less dependent on
> Avro 1.8 considering it was released in 2017.
>
> On Mon, Oct 18, 2021 at 12:34 PM Reuven Lax <re...@google.com> wrote:
>
>> Do you know if it's easy to detect which version of Avro is being used?
>>
>> On Sun, Oct 17, 2021 at 10:20 PM Cristian Constantinescu <
>> zeidoo@gmail.com> wrote:
>>
>>> If I had to change things, I would:
>>>
>>> 1. When deriving the SCHEMA add a few new types (JAVA_TIME, JAVA_DATE or
>>> something along those lines).
>>> 2. RowCoderGenerator around line 159 calls
>>> "SchemaCoder.coderForFieldType(schema.getField(rowIndex).getType().withNullable(false));"
>>> which eventually gets to SchemaCoderHelpers.coderForFieldType. There,
>>> CODER_MAP has a hard reference on InstantCoder for DATETIME. Maybe that map
>>> can be augmented (possibly dynamically) with new
>>> fieldtypes-coder combinations to take care of the new types from #1.
>>>
>>> I would also like to ask. Looking through the Beam code, I see a lot of
>>> static calls. Just wondering why it's done this way. I'm used to projects
>>> having some form of dependency injection involved and static calls being
>>> frowned upon (lack of mockability, hidden dependencies, tight coupling
>>> etc). The only reason I can think of is serializability given Beam's
>>> multi-node processing?
>>>
>>>
>>> On Sat, Oct 16, 2021 at 3:11 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> Is the Schema inference the only reason we can't upgrade Avro, or are
>>>> there other blockers? Is there any way we can tell at runtime which version
>>>> of Avro is running? Since we generate the conversion code at runtime with
>>>> ByteBuddy, we could potentially just generate different conversions
>>>> depending on the Avro version.
>>>>
>>>> On Fri, Oct 15, 2021 at 11:56 PM Cristian Constantinescu <
>>>> zeidoo@gmail.com> wrote:
>>>>
>>>>> Those are fair points. However please consider that there might be new
>>>>> users who will decide that Beam isn't suitable because of things like
>>>>> requiring Avro 1.8, Joda time, old Confluent libraries, and, when I started
>>>>> using Beam about a year ago, Java 8 (I think we're okay with Java 11 now).
>>>>>
>>>>> I guess what I'm saying is that there's definitely a non-negligible
>>>>> cost associated with old 3rd party libs in Beam's code (even if efforts are
>>>>> put in to minimize them).
>>>>>
>>>>> On Sat, Oct 16, 2021 at 2:33 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 15, 2021 at 11:13 PM Cristian Constantinescu <
>>>>>> zeidoo@gmail.com> wrote:
>>>>>>
>>>>>>> To my knowledge and reading through AVRO's Jira[1], it does not
>>>>>>> support jodatime anymore.
>>>>>>>
>>>>>>> It seems everything related to this Avro 1.8 dependency is tricky.
>>>>>>> If you recall, it also prevents us from upgrading to the latest Confluent
>>>>>>> libs... for enabling Beam to use protobufs schemas with the schema
>>>>>>> registry. (I was also the one who brought that issue up, also made an
>>>>>>> exploratory PR to move AVRO outside of Beam core.)
>>>>>>>
>>>>>>> I understand that Beam tries to maintain public APIs stable, but I'd
>>>>>>> like to put forward two points:
>>>>>>> 1) Schemas are experimental, hence there shouldn't be any API
>>>>>>> stability guarantees there.
>>>>>>>
>>>>>>
>>>>>> Unfortunately at this point, they aren't really. As a community we've
>>>>>> been bad about removing the Experimental label - many core, core parts of
>>>>>> Beam are still labeled experimental (sources, triggering, state, timers).
>>>>>> Realistically they are no longer experimental.
>>>>>>
>>>>>> 2) Maybe this is the perfect opportunity for the Beam community to
>>>>>>> think about the long term effects of old dependencies within Beam's
>>>>>>> codebase, and especially how to deal with them. Perhaps
>>>>>>> starting/maintaining an "experimental" branch/maven-published-artifacts
>>>>>>> where Beam does not guarantee backwards compatibility (or maintains it for
>>>>>>> a shorter period of time) is something to think about.
>>>>>>>
>>>>>>
>>>>>> This is why we usually try to prevent third-party libraries from
>>>>>> being in our public API. However in this case, that's tricky.
>>>>>>
>>>>>> The beam community can of course decide to break backwards
>>>>>> compatibility. However as stated today, it is maintained. The last time we
>>>>>> broke backwards compatibility was when the old Dataflow API was
>>>>>> transitioned to Beam, and it was very painful. It took multiple years to
>>>>>> get some users onto the Beam API due to the code changes required to
>>>>>> migrate (and those required code changes weren't terribly invasive).
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/browse/AVRO-2335
>>>>>>>
>>>>>>> On Sat, Oct 16, 2021 at 12:40 AM Reuven Lax <re...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Does this mean more recent versions of avro aren't backwards
>>>>>>>> compatible with avro 1.8? If so, this might be tricky to fix, since Beam
>>>>>>>> maintains backwards compatibility on its public API.
>>>>>>>>
>>>>>>>> On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu <
>>>>>>>> zeidoo@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I've created a small demo project to show the issue[1].
>>>>>>>>>
>>>>>>>>> I've looked at the beam code and all the avro tests use avro
>>>>>>>>> 1.8... which is hardcoded to Joda, hence why all the tests pass. Avro
>>>>>>>>> changed to java.time (as most other recent java projects) after 1.8. Is
>>>>>>>>> there a plan for Beam to do the same?
>>>>>>>>>
>>>>>>>>> Does anyone use Avro with java.time instead of joda.time that
>>>>>>>>> could give me ideas how to make it work?
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Cristian
>>>>>>>>>
>>>>>>>>> [1] https://github.com/zeidoo/beamjavadates-poc
>>>>>>>>>
>>>>>>>>> On Thu, Oct 14, 2021 at 5:35 PM Brian Hulette <bh...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +dev <de...@beam.apache.org>
>>>>>>>>>>
>>>>>>>>>> This sounds like a bug in the Avro schema mapping. I *think* the
>>>>>>>>>> intention is that we generate logic for converting Date to/from Instant
>>>>>>>>>> when making a getters for a RowWithGetters backed by Avro.
>>>>>>>>>>
>>>>>>>>>> Brian
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <
>>>>>>>>>> zeidoo@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> A little bit more color on this.
>>>>>>>>>>>
>>>>>>>>>>> That field is nested inside a avro record like so (not
>>>>>>>>>>> syntactically correct):
>>>>>>>>>>> {
>>>>>>>>>>> type: record,
>>>>>>>>>>> fields: [
>>>>>>>>>>>  {
>>>>>>>>>>>   name: whatever,
>>>>>>>>>>>   type: record {
>>>>>>>>>>>    fields: [
>>>>>>>>>>>    {
>>>>>>>>>>>      "name": "somedate",
>>>>>>>>>>>      "type": {"type": "int", "logicalType": "date"}
>>>>>>>>>>>     }
>>>>>>>>>>>    ]
>>>>>>>>>>> }
>>>>>>>>>>> ]
>>>>>>>>>>>
>>>>>>>>>>> The outer layer record uses SchemaCoderHelpers.coderForFieldType
>>>>>>>>>>> on all its fields to create the coder. However, when called
>>>>>>>>>>> for the inner record field, that method just returns a
>>>>>>>>>>> SchemaCoder.of(fieldType.getRowSchema()) which doesn't take into account
>>>>>>>>>>> LogicalTypes.
>>>>>>>>>>>
>>>>>>>>>>> I think that's where the problem is. If anyone who knows that
>>>>>>>>>>> code could have a look and let me know their thoughts, I can try to fix the
>>>>>>>>>>> issue if we agree that there is one.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <
>>>>>>>>>>> zeidoo@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I have the following field in one of my avro schemas:
>>>>>>>>>>>>
>>>>>>>>>>>> {
>>>>>>>>>>>> "name": "somedate",
>>>>>>>>>>>> "type": {"type": "int", "logicalType": "date"}
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> This generates a java.time.LocalDate field in the corresponding
>>>>>>>>>>>> Java class (call it Foo).
>>>>>>>>>>>>
>>>>>>>>>>>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that
>>>>>>>>>>>> field as DATETIME in the Beam schema because of AvroUtils.toFieldType
>>>>>>>>>>>> (around line 275, where TimestampMillis and Date are both stored as
>>>>>>>>>>>> DATETIME).
>>>>>>>>>>>>
>>>>>>>>>>>> My problem is that in SchemaCoderHelpers, DATETIME is set to
>>>>>>>>>>>> use InstantCoder, which expects a Joda Instant as input, not a LocalDate. So
>>>>>>>>>>>> when my PTransform returns Foo objects it crashes with "class
>>>>>>>>>>>> java.time.LocalDate cannot be cast to class org.joda.time.Instant..." when
>>>>>>>>>>>> trying to encode that field using InstantCoder.encode().
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a workaround for this issue?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> Cristian
>>>>>>>>>>>>
>>>>>>>>>>>> PS: I did search the mailing list and Google, but didn't find
>>>>>>>>>>>> anything related except a thread on AvroCoder.JodaTimestampConversion,
>>>>>>>>>>>> which I don't think applies here.
>>>>>>>>>>>>
>>>>>>>>>>>

Re: Why is Avro Date field using InstantCoder?

Posted by Cristian Constantinescu <ze...@gmail.com>.
I will have a look after I find a workaround, as I really need to deliver
some things and using Avro 1.8 isn't really an option.

But once that's done, I'd love to find ways to make Beam less dependent on
Avro 1.8, considering it was released in 2017.
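
(One possible interim workaround, sketched below under these assumptions: an
input PCollection<Foo> named "input", where Foo is the Avro-generated class
with a hypothetical getSomedate() accessor, and FooPojo is a hand-written
POJO whose date field is an org.joda.time.Instant, a type Beam's
DATETIME/InstantCoder path already handles. None of these names come from
the thread; they are illustration only.)

import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptor;

// Map the Avro object to a schema-friendly POJO before Beam's schema
// machinery ever sees the java.time.LocalDate field.
PCollection<FooPojo> rows =
    input.apply(
        MapElements.into(TypeDescriptor.of(FooPojo.class))
            .via(foo -> new FooPojo(
                // days since epoch -> millis since epoch (UTC midnight)
                new org.joda.time.Instant(
                    foo.getSomedate().toEpochDay() * 86_400_000L))));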


Re: Why is Avro Date field using InstantCoder?

Posted by Reuven Lax <re...@google.com>.
Do you know if it's easy to detect which version of Avro is being used?
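
(One best-effort probe, sketched below and not vetted against every release:
Avro 1.8's bundled TimeConversions.DateConversion returns a Joda LocalDate,
while 1.9+ returns java.time.LocalDate, so reflecting on the conversion's
return type distinguishes the two. The class and helper names here are made
up for illustration.)

import java.lang.reflect.Method;

class AvroVersionProbe {
  /** True when the Avro on the classpath converts "date" to java.time (1.9+). */
  static boolean avroUsesJavaTime() {
    try {
      Class<?> conv =
          Class.forName("org.apache.avro.data.TimeConversions$DateConversion");
      for (Method m : conv.getMethods()) {
        // 1.9+ has a concrete fromInt override returning java.time.LocalDate;
        // on 1.8 the override returns org.joda.time.LocalDate, so no match.
        if (m.getName().equals("fromInt")
            && m.getReturnType() == java.time.LocalDate.class) {
          return true;
        }
      }
    } catch (ClassNotFoundException e) {
      // Class missing or relocated; fall through and assume Joda-era Avro.
    }
    return false;
  }
}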


Re: Why is Avro Date field using InstantCoder?

Posted by Cristian Constantinescu <ze...@gmail.com>.
If I had to change things, I would:

1. When deriving the schema, add a few new types (JAVA_TIME, JAVA_DATE, or
something along those lines).
2. RowCoderGenerator around line 159 calls
"SchemaCoder.coderForFieldType(schema.getField(rowIndex).getType().withNullable(false));"
which eventually gets to SchemaCoderHelpers.coderForFieldType. There,
CODER_MAP has a hard reference to InstantCoder for DATETIME. Maybe that map
can be augmented (possibly dynamically) with new
fieldtype-coder combinations to take care of the new types from #1, as
sketched below.
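
(A sketch of ideas #1 and #2 only; this extensible registry is not Beam's
actual API, and the register() method and JAVA_DATE-style usage are invented
to show the shape of the change:)

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.InstantCoder;
import org.apache.beam.sdk.schemas.Schema.TypeName;

class ExtensibleCoderMap {
  // Mutable, unlike today's fixed map, so new field types can plug in.
  private static final Map<TypeName, Coder<?>> CODER_MAP =
      new ConcurrentHashMap<>();

  static {
    CODER_MAP.put(TypeName.DATETIME, InstantCoder.of()); // today's default
  }

  // A hypothetical JAVA_DATE type name could register a java.time-aware coder.
  static void register(TypeName typeName, Coder<?> coder) {
    CODER_MAP.put(typeName, coder);
  }

  static Coder<?> coderForFieldType(TypeName typeName) {
    return CODER_MAP.get(typeName);
  }
}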

I would also like to ask: looking through the Beam code, I see a lot of
static calls. Why is it done this way? I'm used to projects having some form
of dependency injection, with static calls being frowned upon (lack of
mockability, hidden dependencies, tight coupling, etc.). Is the only reason
serializability, given Beam's multi-node processing?



Re: Why is Avro Date field using InstantCoder?

Posted by Reuven Lax <re...@google.com>.
Is the Schema inference the only reason we can't upgrade Avro, or are there
other blockers? Is there any way we can tell at runtime which version of
Avro is running? Since we generate the conversion code at runtime with
ByteBuddy, we could potentially just generate different conversions
depending on the Avro version.
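
(Sketch only: assuming a classpath probe like the avroUsesJavaTime() helper
sketched above, the generated getters could be handed a pre-selected
converter. The names below are illustrative, not Beam's actual API.)

import org.apache.beam.sdk.transforms.SerializableFunction;
import org.joda.time.DateTimeZone;
import org.joda.time.Instant;

class DateConverters {
  static SerializableFunction<Object, Instant> dateToJodaInstant() {
    return AvroVersionProbe.avroUsesJavaTime()
        // Avro 1.9+: the "date" logical type decodes to java.time.LocalDate.
        ? v -> new Instant(((java.time.LocalDate) v).toEpochDay() * 86_400_000L)
        // Avro 1.8: the "date" logical type decodes to org.joda.time.LocalDate.
        : v -> ((org.joda.time.LocalDate) v)
            .toDateTimeAtStartOfDay(DateTimeZone.UTC)
            .toInstant();
  }
}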

On Fri, Oct 15, 2021 at 11:56 PM Cristian Constantinescu <ze...@gmail.com>
wrote:

> Those are fair points. However, please consider that there might be new
> users who will decide that Beam isn't suitable because of things like
> requiring Avro 1.8, Joda-Time, old Confluent libraries, and, when I started
> using Beam about a year ago, Java 8 (I think we're okay with Java 11 now).
>
> I guess what I'm saying is that there's definitely a non-negligible cost
> associated with old 3rd party libs in Beam's code (even if efforts are put
> in to minimize them).
>
> On Sat, Oct 16, 2021 at 2:33 AM Reuven Lax <re...@google.com> wrote:
>
>>
>>
>> On Fri, Oct 15, 2021 at 11:13 PM Cristian Constantinescu <
>> zeidoo@gmail.com> wrote:
>>
>>> To my knowledge and reading through Avro's Jira [1], it does not support
>>> Joda-Time anymore.
>>>
>>> It seems everything related to this Avro 1.8 dependency is tricky. If
>>> you recall, it also prevents us from upgrading to the latest Confluent
>>> libs... for enabling Beam to use protobuf schemas with the schema
>>> registry. (I was also the one who brought that issue up, and also made an
>>> exploratory PR to move Avro outside of Beam core.)
>>>
>>> I understand that Beam tries to keep public APIs stable, but I'd
>>> like to put forward two points:
>>> 1) Schemas are experimental, hence there shouldn't be any API
>>> stability guarantees there.
>>>
>>
>> Unfortunately at this point, they aren't really. As a community we've
>> been bad about removing the Experimental label - many core, core parts of
>> Beam are still labeled experimental (sources, triggering, state, timers).
>> Realistically they are no longer experimental.
>>
>> 2) Maybe this is the perfect opportunity for the Beam community to think
>>> about the long term effects of old dependencies within Beam's codebase, and
>>> especially how to deal with them. Perhaps starting/maintaining an
>>> "experimental" branch/maven-published-artifacts where Beam does not
>>> guarantee backwards compatibility (or maintains it for a shorter period of
>>> time) is something to think about.
>>>
>>
>> This is why we usually try to prevent third-party libraries from being in
>> our public API. However, in this case, that's tricky.
>>
>> The Beam community can of course decide to break backwards compatibility.
>> However, as it stands today, it is maintained. The last time we broke backwards
>> compatibility was when the old Dataflow API was transitioned to Beam, and
>> it was very painful. It took multiple years to get some users onto the Beam
>> API due to the code changes required to migrate (and those required code
>> changes weren't terribly invasive).
>>
>>
>>>
>>> [1] https://issues.apache.org/jira/browse/AVRO-2335
>>>
>>> On Sat, Oct 16, 2021 at 12:40 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> Does this mean more recent versions of avro aren't backwards compatible
>>>> with avro 1.8? If so, this might be tricky to fix, since Beam maintains
>>>> backwards compatibility on its public API.
>>>>
>>>> On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu <
>>>> zeidoo@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I've created a small demo project to show the issue[1].
>>>>>
>>>>> I've looked at the beam code and all the avro tests use avro 1.8...
>>>>> which is hardcoded to Joda, hence why all the tests pass. Avro changed to
>>>>> java.time (as most other recent java projects) after 1.8. Is there a plan
>>>>> for Beam to do the same?
>>>>>
>>>>> Does anyone use Avro with java.time instead of joda.time that could
>>>>> give me ideas how to make it work?
>>>>>
>>>>> Thank you,
>>>>> Cristian
>>>>>
>>>>> [1] https://github.com/zeidoo/beamjavadates-poc
>>>>>
>>>>> On Thu, Oct 14, 2021 at 5:35 PM Brian Hulette <bh...@google.com>
>>>>> wrote:
>>>>>
>>>>>> +dev <de...@beam.apache.org>
>>>>>>
>>>>>> This sounds like a bug in the Avro schema mapping. I *think* the
>>>>>> intention is that we generate logic for converting Date to/from Instant
>>>>>> when making a getters for a RowWithGetters backed by Avro.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <
>>>>>> zeidoo@gmail.com> wrote:
>>>>>>
>>>>>>> A little bit more color on this.
>>>>>>>
>>>>>>> That field is nested inside a avro record like so (not syntactically
>>>>>>> correct):
>>>>>>> {
>>>>>>> type: record,
>>>>>>> fields: [
>>>>>>>  {
>>>>>>>   name: whatever,
>>>>>>>   type: record {
>>>>>>>    fields: [
>>>>>>>    {
>>>>>>>      "name": "somedate",
>>>>>>>      "type: {"type": "int", "logicalType": "date"}
>>>>>>>     }
>>>>>>>    ]
>>>>>>> }
>>>>>>> ]
>>>>>>>
>>>>>>> The outer layer record uses SchemaCoderHelper.coderForFieldType on
>>>>>>> all it's fields to create the coder. However, that method when called for
>>>>>>> the inner record field just returns a
>>>>>>> SchemaCoder.of(fieldType.getRowSchema()) which doesn't take into account
>>>>>>> LogicalTypes.
>>>>>>>
>>>>>>> I think that's where the problem is. If anyone who knows that code
>>>>>>> could have a look and let me know their thoughts, I can try to fix the
>>>>>>> issue if we agree that there is one.
>>>>>>>
>>>>>>> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <
>>>>>>> zeidoo@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have the following field in one of my avro schemas:
>>>>>>>>
>>>>>>>> {
>>>>>>>> "name": "somedate",
>>>>>>>> "type: {"type": "int", "logicalType": "date"}
>>>>>>>> }
>>>>>>>>
>>>>>>>> This generates a java.time.LocalDate field in the corresponding
>>>>>>>> java class (call it Foo).
>>>>>>>>
>>>>>>>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that field
>>>>>>>> as DATETIME in the Beam schema. because of AvroUtils.toFieldType (around
>>>>>>>> line 275, where TimestampMillis and Date are both stored as DATETIME).
>>>>>>>>
>>>>>>>> My problem is that in SchemaCoderHelpers, DATETIME is set to use
>>>>>>>> InstantCoder which expects a joda Instant as input not a LocalDate. So when
>>>>>>>> my PTransform returns Foo objects it crashes with "class
>>>>>>>> java.time.LocalDate cannot be cast to class org.joda.time.Instant..." when
>>>>>>>> trying to encode that field using InstantCoder.encode().
>>>>>>>>
>>>>>>>> Is there a workaround for this issue?
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Cristian
>>>>>>>>
>>>>>>>> PS: I did search the mailing list and google, but didn't find
>>>>>>>> anything related except a thread on AvroCoder.JodaTimestampConversion,
>>>>>>>> which I don't think it applies to here.
>>>>>>>>
>>>>>>>

Re: Why is Avro Date field using InstantCoder?

Posted by Cristian Constantinescu <ze...@gmail.com>.
Those are fair points. However, please consider that there may be new
users who decide that Beam isn't suitable because of things like
requiring Avro 1.8, Joda-Time, old Confluent libraries, and, when I started
using Beam about a year ago, Java 8 (I think we're okay with Java 11 now).

I guess what I'm saying is that there's definitely a non-negligible cost
associated with old third-party libs in Beam's code (even if efforts are put
in to minimize them).

On Sat, Oct 16, 2021 at 2:33 AM Reuven Lax <re...@google.com> wrote:

>
>
> On Fri, Oct 15, 2021 at 11:13 PM Cristian Constantinescu <ze...@gmail.com>
> wrote:
>
>> To my knowledge and reading through AVRO's Jira[1], it does not support
>> jodatime anymore.
>>
>> It seems everything related to this Avro 1.8 dependency is tricky. If you
>> recall, it also prevents us from upgrading to the latest Confluent libs...
>> for enabling Beam to use protobufs schemas with the schema registry. (I was
>> also the one who brought that issue up, also made an exploratory PR to move
>> AVRO outside of Beam core.)
>>
>> I understand that Beam tries to maintain public APIs stable, but I'd like
>> to put forward two points:
>> 1) Schemas are experimental, hence there shouldn't be any API
>> stability guarantees there.
>>
>
> Unfortunately at this point, they aren't really. As a community we've been
> bad about removing the Experimental label - many core, core parts of Beam
> are still labeled experimental (sources, triggering, state, timers).
> Realistically they are no longer experimental.
>
> 2) Maybe this is the perfect opportunity for the Beam community to think
>> about the long term effects of old dependencies within Beam's codebase, and
>> especially how to deal with them. Perhaps starting/maintaining an
>> "experimental" branch/maven-published-artifacts where Beam does not
>> guarantee backwards compatibility (or maintains it for a shorter period of
>> time) is something to think about.
>>
>
> This is why we usually try to prevent third-party libraries from being in
> our public API. However in this case, that's tricky.
>
> The beam community can of course decide to break backwards compatibility.
> However as stated today, it is maintained. The last time we broke backwards
> compatibility was when the old Dataflow API was transitioned to Beam, and
> it was very painful. It took multiple years to get some users onto the Beam
> API due to the code changes required to migrate (and those required code
> changes weren't terribly invasive).
>
>
>>
>> [1] https://issues.apache.org/jira/browse/AVRO-2335
>>
>> On Sat, Oct 16, 2021 at 12:40 AM Reuven Lax <re...@google.com> wrote:
>>
>>> Does this mean more recent versions of avro aren't backwards compatible
>>> with avro 1.8? If so, this might be tricky to fix, since Beam maintains
>>> backwards compatibility on its public API.
>>>
>>> On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu <
>>> zeidoo@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've created a small demo project to show the issue[1].
>>>>
>>>> I've looked at the beam code and all the avro tests use avro 1.8...
>>>> which is hardcoded to Joda, hence why all the tests pass. Avro changed to
>>>> java.time (as most other recent java projects) after 1.8. Is there a plan
>>>> for Beam to do the same?
>>>>
>>>> Does anyone use Avro with java.time instead of joda.time that could
>>>> give me ideas how to make it work?
>>>>
>>>> Thank you,
>>>> Cristian
>>>>
>>>> [1] https://github.com/zeidoo/beamjavadates-poc
>>>>
>>>> On Thu, Oct 14, 2021 at 5:35 PM Brian Hulette <bh...@google.com>
>>>> wrote:
>>>>
>>>>> +dev <de...@beam.apache.org>
>>>>>
>>>>> This sounds like a bug in the Avro schema mapping. I *think* the
>>>>> intention is that we generate logic for converting Date to/from Instant
>>>>> when making a getters for a RowWithGetters backed by Avro.
>>>>>
>>>>> Brian
>>>>>
>>>>> On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <
>>>>> zeidoo@gmail.com> wrote:
>>>>>
>>>>>> A little bit more color on this.
>>>>>>
>>>>>> That field is nested inside a avro record like so (not syntactically
>>>>>> correct):
>>>>>> {
>>>>>> type: record,
>>>>>> fields: [
>>>>>>  {
>>>>>>   name: whatever,
>>>>>>   type: record {
>>>>>>    fields: [
>>>>>>    {
>>>>>>      "name": "somedate",
>>>>>>      "type: {"type": "int", "logicalType": "date"}
>>>>>>     }
>>>>>>    ]
>>>>>> }
>>>>>> ]
>>>>>>
>>>>>> The outer layer record uses SchemaCoderHelper.coderForFieldType on
>>>>>> all it's fields to create the coder. However, that method when called for
>>>>>> the inner record field just returns a
>>>>>> SchemaCoder.of(fieldType.getRowSchema()) which doesn't take into account
>>>>>> LogicalTypes.
>>>>>>
>>>>>> I think that's where the problem is. If anyone who knows that code
>>>>>> could have a look and let me know their thoughts, I can try to fix the
>>>>>> issue if we agree that there is one.
>>>>>>
>>>>>> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <
>>>>>> zeidoo@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have the following field in one of my avro schemas:
>>>>>>>
>>>>>>> {
>>>>>>> "name": "somedate",
>>>>>>> "type: {"type": "int", "logicalType": "date"}
>>>>>>> }
>>>>>>>
>>>>>>> This generates a java.time.LocalDate field in the corresponding java
>>>>>>> class (call it Foo).
>>>>>>>
>>>>>>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that field
>>>>>>> as DATETIME in the Beam schema. because of AvroUtils.toFieldType (around
>>>>>>> line 275, where TimestampMillis and Date are both stored as DATETIME).
>>>>>>>
>>>>>>> My problem is that in SchemaCoderHelpers, DATETIME is set to use
>>>>>>> InstantCoder which expects a joda Instant as input not a LocalDate. So when
>>>>>>> my PTransform returns Foo objects it crashes with "class
>>>>>>> java.time.LocalDate cannot be cast to class org.joda.time.Instant..." when
>>>>>>> trying to encode that field using InstantCoder.encode().
>>>>>>>
>>>>>>> Is there a workaround for this issue?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Cristian
>>>>>>>
>>>>>>> PS: I did search the mailing list and google, but didn't find
>>>>>>> anything related except a thread on AvroCoder.JodaTimestampConversion,
>>>>>>> which I don't think it applies to here.
>>>>>>>
>>>>>>

Re: Why is Avro Date field using InstantCoder?

Posted by Reuven Lax <re...@google.com>.
On Fri, Oct 15, 2021 at 11:13 PM Cristian Constantinescu <ze...@gmail.com>
wrote:

> To my knowledge and reading through AVRO's Jira[1], it does not support
> jodatime anymore.
>
> It seems everything related to this Avro 1.8 dependency is tricky. If you
> recall, it also prevents us from upgrading to the latest Confluent libs...
> for enabling Beam to use protobufs schemas with the schema registry. (I was
> also the one who brought that issue up, also made an exploratory PR to move
> AVRO outside of Beam core.)
>
> I understand that Beam tries to maintain public APIs stable, but I'd like
> to put forward two points:
> 1) Schemas are experimental, hence there shouldn't be any API
> stability guarantees there.
>

Unfortunately, at this point they aren't really. As a community we've been
bad about removing the Experimental label: many core parts of Beam
are still labeled experimental (sources, triggering, state, timers).
Realistically, they are no longer experimental.

2) Maybe this is the perfect opportunity for the Beam community to think
> about the long term effects of old dependencies within Beam's codebase, and
> especially how to deal with them. Perhaps starting/maintaining an
> "experimental" branch/maven-published-artifacts where Beam does not
> guarantee backwards compatibility (or maintains it for a shorter period of
> time) is something to think about.
>

This is why we usually try to prevent third-party libraries from being in
our public API. However, in this case that's tricky.

The Beam community can of course decide to break backwards compatibility.
However, as it stands today, it is maintained. The last time we broke backwards
compatibility was when the old Dataflow API was transitioned to Beam, and
it was very painful. It took multiple years to get some users onto the Beam
API due to the code changes required to migrate (even though those required
code changes weren't terribly invasive).


>
> [1] https://issues.apache.org/jira/browse/AVRO-2335
>
> On Sat, Oct 16, 2021 at 12:40 AM Reuven Lax <re...@google.com> wrote:
>
>> Does this mean more recent versions of avro aren't backwards compatible
>> with avro 1.8? If so, this might be tricky to fix, since Beam maintains
>> backwards compatibility on its public API.
>>
>> On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu <ze...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I've created a small demo project to show the issue[1].
>>>
>>> I've looked at the beam code and all the avro tests use avro 1.8...
>>> which is hardcoded to Joda, hence why all the tests pass. Avro changed to
>>> java.time (as most other recent java projects) after 1.8. Is there a plan
>>> for Beam to do the same?
>>>
>>> Does anyone use Avro with java.time instead of joda.time that could give
>>> me ideas how to make it work?
>>>
>>> Thank you,
>>> Cristian
>>>
>>> [1] https://github.com/zeidoo/beamjavadates-poc
>>>
>>> On Thu, Oct 14, 2021 at 5:35 PM Brian Hulette <bh...@google.com>
>>> wrote:
>>>
>>>> +dev <de...@beam.apache.org>
>>>>
>>>> This sounds like a bug in the Avro schema mapping. I *think* the
>>>> intention is that we generate logic for converting Date to/from Instant
>>>> when making a getters for a RowWithGetters backed by Avro.
>>>>
>>>> Brian
>>>>
>>>> On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <
>>>> zeidoo@gmail.com> wrote:
>>>>
>>>>> A little bit more color on this.
>>>>>
>>>>> That field is nested inside a avro record like so (not syntactically
>>>>> correct):
>>>>> {
>>>>> type: record,
>>>>> fields: [
>>>>>  {
>>>>>   name: whatever,
>>>>>   type: record {
>>>>>    fields: [
>>>>>    {
>>>>>      "name": "somedate",
>>>>>      "type: {"type": "int", "logicalType": "date"}
>>>>>     }
>>>>>    ]
>>>>> }
>>>>> ]
>>>>>
>>>>> The outer layer record uses SchemaCoderHelper.coderForFieldType on all
>>>>> it's fields to create the coder. However, that method when called for the
>>>>> inner record field just returns a SchemaCoder.of(fieldType.getRowSchema())
>>>>> which doesn't take into account LogicalTypes.
>>>>>
>>>>> I think that's where the problem is. If anyone who knows that code
>>>>> could have a look and let me know their thoughts, I can try to fix the
>>>>> issue if we agree that there is one.
>>>>>
>>>>> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <
>>>>> zeidoo@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have the following field in one of my avro schemas:
>>>>>>
>>>>>> {
>>>>>> "name": "somedate",
>>>>>> "type: {"type": "int", "logicalType": "date"}
>>>>>> }
>>>>>>
>>>>>> This generates a java.time.LocalDate field in the corresponding java
>>>>>> class (call it Foo).
>>>>>>
>>>>>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that field
>>>>>> as DATETIME in the Beam schema. because of AvroUtils.toFieldType (around
>>>>>> line 275, where TimestampMillis and Date are both stored as DATETIME).
>>>>>>
>>>>>> My problem is that in SchemaCoderHelpers, DATETIME is set to use
>>>>>> InstantCoder which expects a joda Instant as input not a LocalDate. So when
>>>>>> my PTransform returns Foo objects it crashes with "class
>>>>>> java.time.LocalDate cannot be cast to class org.joda.time.Instant..." when
>>>>>> trying to encode that field using InstantCoder.encode().
>>>>>>
>>>>>> Is there a workaround for this issue?
>>>>>>
>>>>>> Thank you,
>>>>>> Cristian
>>>>>>
>>>>>> PS: I did search the mailing list and google, but didn't find
>>>>>> anything related except a thread on AvroCoder.JodaTimestampConversion,
>>>>>> which I don't think it applies to here.
>>>>>>
>>>>>

Re: Why is Avro Date field using InstantCoder?

Posted by Cristian Constantinescu <ze...@gmail.com>.
To my knowledge, and from reading through Avro's Jira[1], it does not support
Joda-Time anymore.

It seems everything related to this Avro 1.8 dependency is tricky. If you
recall, it also prevents us from upgrading to the latest Confluent libs...
for enabling Beam to use protobuf schemas with the schema registry. (I was
also the one who brought that issue up, and I also made an exploratory PR to
move Avro outside of Beam core.)

I understand that Beam tries to keep public APIs stable, but I'd like
to put forward two points:
1) Schemas are experimental, hence there shouldn't be any API
stability guarantees there.
2) Maybe this is the perfect opportunity for the Beam community to think
about the long-term effects of old dependencies within Beam's codebase, and
especially how to deal with them. Perhaps starting/maintaining an
"experimental" branch or set of Maven-published artifacts where Beam does not
guarantee backwards compatibility (or maintains it for a shorter period of
time) is something to think about.

[1] https://issues.apache.org/jira/browse/AVRO-2335
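
For reference, a minimal sketch of what the java.time path looks like on
Avro 1.9+, where TimeConversions is java.time based (stock Avro API usage,
not Beam code; the demo class itself is hypothetical):

import org.apache.avro.LogicalTypes;
import org.apache.avro.data.TimeConversions;
import org.apache.avro.generic.GenericData;

public class JavaTimeConversionsDemo {
    public static void main(String[] args) {
        GenericData data = GenericData.get();
        // Register the java.time-based "date" conversion (Avro 1.9+).
        data.addLogicalTypeConversion(new TimeConversions.DateConversion());
        // Prints class java.time.LocalDate: the type Beam's coders would
        // have to handle instead of org.joda.time.LocalDate.
        System.out.println(
            data.getConversionFor(LogicalTypes.date()).getConvertedType());
    }
}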

On Sat, Oct 16, 2021 at 12:40 AM Reuven Lax <re...@google.com> wrote:

> Does this mean more recent versions of avro aren't backwards compatible
> with avro 1.8? If so, this might be tricky to fix, since Beam maintains
> backwards compatibility on its public API.
>
> On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu <ze...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I've created a small demo project to show the issue[1].
>>
>> I've looked at the beam code and all the avro tests use avro 1.8...
>> which is hardcoded to Joda, hence why all the tests pass. Avro changed to
>> java.time (as most other recent java projects) after 1.8. Is there a plan
>> for Beam to do the same?
>>
>> Does anyone use Avro with java.time instead of joda.time that could give
>> me ideas how to make it work?
>>
>> Thank you,
>> Cristian
>>
>> [1] https://github.com/zeidoo/beamjavadates-poc
>>
>> On Thu, Oct 14, 2021 at 5:35 PM Brian Hulette <bh...@google.com>
>> wrote:
>>
>>> +dev <de...@beam.apache.org>
>>>
>>> This sounds like a bug in the Avro schema mapping. I *think* the
>>> intention is that we generate logic for converting Date to/from Instant
>>> when making a getters for a RowWithGetters backed by Avro.
>>>
>>> Brian
>>>
>>> On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <
>>> zeidoo@gmail.com> wrote:
>>>
>>>> A little bit more color on this.
>>>>
>>>> That field is nested inside a avro record like so (not syntactically
>>>> correct):
>>>> {
>>>> type: record,
>>>> fields: [
>>>>  {
>>>>   name: whatever,
>>>>   type: record {
>>>>    fields: [
>>>>    {
>>>>      "name": "somedate",
>>>>      "type: {"type": "int", "logicalType": "date"}
>>>>     }
>>>>    ]
>>>> }
>>>> ]
>>>>
>>>> The outer layer record uses SchemaCoderHelper.coderForFieldType on all
>>>> it's fields to create the coder. However, that method when called for the
>>>> inner record field just returns a SchemaCoder.of(fieldType.getRowSchema())
>>>> which doesn't take into account LogicalTypes.
>>>>
>>>> I think that's where the problem is. If anyone who knows that code
>>>> could have a look and let me know their thoughts, I can try to fix the
>>>> issue if we agree that there is one.
>>>>
>>>> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <
>>>> zeidoo@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have the following field in one of my avro schemas:
>>>>>
>>>>> {
>>>>> "name": "somedate",
>>>>> "type: {"type": "int", "logicalType": "date"}
>>>>> }
>>>>>
>>>>> This generates a java.time.LocalDate field in the corresponding java
>>>>> class (call it Foo).
>>>>>
>>>>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that field as
>>>>> DATETIME in the Beam schema. because of AvroUtils.toFieldType (around line
>>>>> 275, where TimestampMillis and Date are both stored as DATETIME).
>>>>>
>>>>> My problem is that in SchemaCoderHelpers, DATETIME is set to use
>>>>> InstantCoder which expects a joda Instant as input not a LocalDate. So when
>>>>> my PTransform returns Foo objects it crashes with "class
>>>>> java.time.LocalDate cannot be cast to class org.joda.time.Instant..." when
>>>>> trying to encode that field using InstantCoder.encode().
>>>>>
>>>>> Is there a workaround for this issue?
>>>>>
>>>>> Thank you,
>>>>> Cristian
>>>>>
>>>>> PS: I did search the mailing list and google, but didn't find anything
>>>>> related except a thread on AvroCoder.JodaTimestampConversion, which I don't
>>>>> think it applies to here.
>>>>>
>>>>

Re: Why is Avro Date field using InstantCoder?

Posted by Reuven Lax <re...@google.com>.
Does this mean more recent versions of Avro aren't backwards compatible
with Avro 1.8? If so, this might be tricky to fix, since Beam maintains
backwards compatibility on its public API.

On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu <ze...@gmail.com>
wrote:

> Hi all,
>
> I've created a small demo project to show the issue[1].
>
> I've looked at the beam code and all the avro tests use avro 1.8...
> which is hardcoded to Joda, hence why all the tests pass. Avro changed to
> java.time (as most other recent java projects) after 1.8. Is there a plan
> for Beam to do the same?
>
> Does anyone use Avro with java.time instead of joda.time that could give
> me ideas how to make it work?
>
> Thank you,
> Cristian
>
> [1] https://github.com/zeidoo/beamjavadates-poc
>
> On Thu, Oct 14, 2021 at 5:35 PM Brian Hulette <bh...@google.com> wrote:
>
>> +dev <de...@beam.apache.org>
>>
>> This sounds like a bug in the Avro schema mapping. I *think* the
>> intention is that we generate logic for converting Date to/from Instant
>> when making a getters for a RowWithGetters backed by Avro.
>>
>> Brian
>>
>> On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <ze...@gmail.com>
>> wrote:
>>
>>> A little bit more color on this.
>>>
>>> That field is nested inside a avro record like so (not syntactically
>>> correct):
>>> {
>>> type: record,
>>> fields: [
>>>  {
>>>   name: whatever,
>>>   type: record {
>>>    fields: [
>>>    {
>>>      "name": "somedate",
>>>      "type: {"type": "int", "logicalType": "date"}
>>>     }
>>>    ]
>>> }
>>> ]
>>>
>>> The outer layer record uses SchemaCoderHelper.coderForFieldType on all
>>> it's fields to create the coder. However, that method when called for the
>>> inner record field just returns a SchemaCoder.of(fieldType.getRowSchema())
>>> which doesn't take into account LogicalTypes.
>>>
>>> I think that's where the problem is. If anyone who knows that code could
>>> have a look and let me know their thoughts, I can try to fix the issue if
>>> we agree that there is one.
>>>
>>> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <
>>> zeidoo@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have the following field in one of my avro schemas:
>>>>
>>>> {
>>>> "name": "somedate",
>>>> "type: {"type": "int", "logicalType": "date"}
>>>> }
>>>>
>>>> This generates a java.time.LocalDate field in the corresponding java
>>>> class (call it Foo).
>>>>
>>>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that field as
>>>> DATETIME in the Beam schema. because of AvroUtils.toFieldType (around line
>>>> 275, where TimestampMillis and Date are both stored as DATETIME).
>>>>
>>>> My problem is that in SchemaCoderHelpers, DATETIME is set to use
>>>> InstantCoder which expects a joda Instant as input not a LocalDate. So when
>>>> my PTransform returns Foo objects it crashes with "class
>>>> java.time.LocalDate cannot be cast to class org.joda.time.Instant..." when
>>>> trying to encode that field using InstantCoder.encode().
>>>>
>>>> Is there a workaround for this issue?
>>>>
>>>> Thank you,
>>>> Cristian
>>>>
>>>> PS: I did search the mailing list and google, but didn't find anything
>>>> related except a thread on AvroCoder.JodaTimestampConversion, which I don't
>>>> think it applies to here.
>>>>
>>>

Re: Why is Avro Date field using InstantCoder?

Posted by Cristian Constantinescu <ze...@gmail.com>.
Hi all,

I've created a small demo project to show the issue[1].

I've looked at the Beam code, and all the Avro tests use Avro 1.8,
which is hardcoded to Joda-Time; that's why all the tests pass. Avro changed
to java.time (as most other recent Java projects have) after 1.8. Is there a
plan for Beam to do the same?

Does anyone use Avro with java.time instead of joda.time who could give me
ideas on how to make it work?

Thank you,
Cristian

[1] https://github.com/zeidoo/beamjavadates-poc
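
One possible workaround in the meantime is to bridge the java.time.LocalDate
that newer Avro generates back to the org.joda.time.Instant that Beam's
DATETIME field expects, before the schema coder sees it. A hedged sketch
(treating the date as midnight UTC is an assumption here, not something Beam
prescribes):

import java.time.LocalDate;
import java.time.ZoneOffset;
import org.joda.time.Instant;

public class DateBridge {
    // Interpret the LocalDate as midnight UTC so it fits Beam's
    // millis-since-epoch DATETIME representation.
    public static Instant toJodaInstant(LocalDate date) {
        return new Instant(
            date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli());
    }
}

(e.g. applied in a MapElements over the Foo objects before they reach the
coder.)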

On Thu, Oct 14, 2021 at 5:35 PM Brian Hulette <bh...@google.com> wrote:

> +dev <de...@beam.apache.org>
>
> This sounds like a bug in the Avro schema mapping. I *think* the intention
> is that we generate logic for converting Date to/from Instant when making a
> getters for a RowWithGetters backed by Avro.
>
> Brian
>
> On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <ze...@gmail.com>
> wrote:
>
>> A little bit more color on this.
>>
>> That field is nested inside a avro record like so (not syntactically
>> correct):
>> {
>> type: record,
>> fields: [
>>  {
>>   name: whatever,
>>   type: record {
>>    fields: [
>>    {
>>      "name": "somedate",
>>      "type: {"type": "int", "logicalType": "date"}
>>     }
>>    ]
>> }
>> ]
>>
>> The outer layer record uses SchemaCoderHelper.coderForFieldType on all
>> it's fields to create the coder. However, that method when called for the
>> inner record field just returns a SchemaCoder.of(fieldType.getRowSchema())
>> which doesn't take into account LogicalTypes.
>>
>> I think that's where the problem is. If anyone who knows that code could
>> have a look and let me know their thoughts, I can try to fix the issue if
>> we agree that there is one.
>>
>> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <ze...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have the following field in one of my avro schemas:
>>>
>>> {
>>> "name": "somedate",
>>> "type: {"type": "int", "logicalType": "date"}
>>> }
>>>
>>> This generates a java.time.LocalDate field in the corresponding java
>>> class (call it Foo).
>>>
>>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that field as
>>> DATETIME in the Beam schema. because of AvroUtils.toFieldType (around line
>>> 275, where TimestampMillis and Date are both stored as DATETIME).
>>>
>>> My problem is that in SchemaCoderHelpers, DATETIME is set to use
>>> InstantCoder which expects a joda Instant as input not a LocalDate. So when
>>> my PTransform returns Foo objects it crashes with "class
>>> java.time.LocalDate cannot be cast to class org.joda.time.Instant..." when
>>> trying to encode that field using InstantCoder.encode().
>>>
>>> Is there a workaround for this issue?
>>>
>>> Thank you,
>>> Cristian
>>>
>>> PS: I did search the mailing list and google, but didn't find anything
>>> related except a thread on AvroCoder.JodaTimestampConversion, which I don't
>>> think it applies to here.
>>>
>>
