Posted to dev@beam.apache.org by Jeff Klukas <jk...@mozilla.com> on 2018/11/09 21:50:19 UTC

Design review for supporting AutoValue Coders and conversions to Row

Hi all - I'm looking for some review and commentary on a proposed design
for providing built-in Coders for AutoValue classes. There's existing
discussion in BEAM-1891 [0] about using AvroCoder, but that's blocked on
an incompatibility between AutoValue and Avro's reflection machinery that
doesn't look resolvable.

I wrote up a design document [1] that instead proposes using AutoValue's
extension API to automatically generate a Coder for each AutoValue class
that users generate. A similar technique could be used to generate
conversions to and from Row for use with BeamSql.
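
To make the proposal concrete, here is a rough sketch of what an extension-generated coder might boil down to for a two-property class. It deliberately avoids Beam's Coder classes and AutoValue itself so that it runs on the JDK alone. `Event`, `AutoValue_Event` and `EventCoderSketch` are hypothetical names, and the subclass is hand-written to imitate what AutoValue generates:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a user's @AutoValue class.
abstract class Event {
  abstract String name();
  abstract long timestamp();

  static Event create(String name, long timestamp) {
    return new AutoValue_Event(name, timestamp);
  }
}

// Hand-written imitation of the generated subclass: final, package-private,
// constructor parameters in property order.
final class AutoValue_Event extends Event {
  private final String name;
  private final long timestamp;

  AutoValue_Event(String name, long timestamp) {
    if (name == null) {
      throw new NullPointerException("Null name");
    }
    this.name = name;
    this.timestamp = timestamp;
  }

  @Override String name() { return name; }
  @Override long timestamp() { return timestamp; }

  @Override public boolean equals(Object o) {
    if (!(o instanceof Event)) {
      return false;
    }
    Event that = (Event) o;
    return name.equals(that.name()) && timestamp == that.timestamp();
  }

  @Override public int hashCode() {
    return 31 * name.hashCode() + Long.hashCode(timestamp);
  }
}

// Hypothetical generated coder: encode properties in order, decode them in
// the same order and call the constructor. A real Beam coder would delegate
// to per-field coders; writeUTF is used only to stay dependency-free.
final class EventCoderSketch {
  static void encode(Event value, DataOutputStream out) throws IOException {
    out.writeUTF(value.name());
    out.writeLong(value.timestamp());
  }

  static Event decode(DataInputStream in) throws IOException {
    String name = in.readUTF();
    long timestamp = in.readLong();
    return new AutoValue_Event(name, timestamp);
  }

  // Encodes to bytes and decodes them back.
  static Event roundTrip(Event value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        encode(value, out);
      }
      return decode(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Event original = Event.create("click", 1541800219L);
    System.out.println(roundTrip(original).equals(original)); // prints true
  }
}
```

Writing the properties in constructor order is what lets `decode` call the constructor directly, without any intermediate mapping.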

I'd appreciate review of the design and thoughts on whether this seems
feasible to support within the Beam codebase. I may be missing a simpler
approach.

[0] https://issues.apache.org/jira/browse/BEAM-1891
[1] https://docs.google.com/document/d/1ucoik4WzUDfilqIz3I1AuMHc1J8DE6iv7gaUCDI42BI/edit?usp=sharing

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Jeff Klukas <jk...@mozilla.com>.
Anton - Thanks for reading and commenting. I've gone as far as creating a
skeleton AutoValue extension to better understand how that API works, but I
don't yet have a working prototype for either of these proposed additions.

I'll move on to prototyping the Coder generation for AutoValue classes if I
get some clear signal from this list that maintaining an AutoValue
extension for generating this code seems like a reasonable path forward.

On Fri, Nov 9, 2018 at 7:42 PM Anton Kedin <ke...@google.com> wrote:

> Hi Jeff,
>
> I think this is a great idea! Thank you for working on the proposal. I
> left a couple of comments in the doc.
>
> Have you tried prototyping this?
>
> Regards,
> Anton
>

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Anton Kedin <ke...@google.com>.
Hi Jeff,

I think this is a great idea! Thank you for working on the proposal. I left
a couple of comments in the doc.

Have you tried prototyping this?

Regards,
Anton

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Reuven Lax <re...@google.com>.
Once https://github.com/apache/beam/pull/7289 goes in, the field-order issue
will be solved as well. I'll go ahead and send a PR adding support for
AutoValue, as there will be very little delta by then.

Reuven

On Sun, Dec 2, 2018 at 9:44 PM Reuven Lax <re...@google.com> wrote:

> Thinking about this a bit more - I suspect we already have almost all the
> code we need.
>
> The code to infer a schema from a Java Bean will probably work with little
> change on AutoValue, as it's essentially just a fancy Java Bean. The Java
> Bean generated getters should also work. I think all that needs to be done
> is to generate a constructor. The tricky thing is that the order of fields
> in the AutoValue_XXX constructor may not match the order of the fields in
> the schema, so we will need to generate an intermediate constructor that
> makes the correct call. (Alternatively, we can try to detect the schema
> from the constructor instead of from the getters, which should give us a
> schema with matching field order.)
>
> Reuven
>
> On Thu, Nov 29, 2018 at 9:30 AM Reuven Lax <re...@google.com> wrote:
>
>> https://github.com/apache/beam/pull/7147 starts adding the framework to
>> do this (for POJOs we actually generate a constructor using ByteBuddy, but
>> that might not be necessary for AutoValue).
>>
>> I would start by writing the inference from AutoValue to a Schema. For
>> example, see POJOUtils::schemaFromPojoClass or
>> JavaBeanUtils::schemaFromJavaBeanClass.
>>
>> Reuven
>>
>> On Mon, Nov 26, 2018 at 6:08 AM Jeff Klukas <jk...@mozilla.com> wrote:
>>
>>> Reuven - How is the work on constructor support for ByteBuddy codegen
>>> going? Does it still look like that's going to be a feasible way forward
>>> for generating schemas/coders for AutoValue classes?
>>>
>>> On Thu, Nov 15, 2018 at 4:37 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>> I would hope so if possible.
>>>>
>>>> On Fri, Nov 16, 2018, 4:36 AM Kenneth Knowles <kenn@apache.org> wrote:
>>>>
>>>>> Just some low-level detail: If there is no @DefaultSchema annotation
>>>>> but it is an @AutoValue class, can schema inference go ahead with the
>>>>> AutoValueSchema? Then the user doesn't have to do anything.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Wed, Nov 14, 2018 at 6:14 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> We already have a framework for ByteBuddy codegen for JavaBean Row
>>>>>> interfaces, which should hopefully be easy to extend to AutoValue (and more
>>>>>> efficient than using reflection). I'm working on adding constructor support
>>>>>> to this right now.
>>>>>>
>>>>>> On Wed, Nov 14, 2018 at 12:29 AM Jeff Klukas <jk...@mozilla.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sounds, then, like we need to define a new `AutoValueSchema
>>>>>>> extends SchemaProvider` and users would opt in to this via the
>>>>>>> DefaultSchema annotation:
>>>>>>>
>>>>>>> @DefaultSchema(AutoValueSchema.class)
>>>>>>> @AutoValue
>>>>>>> public abstract class MyClass ...
>>>>>>>
>>>>>>> Since we already have the JavaBean and JavaField reflection-based
>>>>>>> schema providers to use as a guide, it sounds like it may be best to try to
>>>>>>> implement this using reflection rather than implementing an AutoValue
>>>>>>> extension.
>>>>>>>
>>>>>>> A reflection-based approach here would hinge on being able to
>>>>>>> discover the package-private constructor for the concrete class and read
>>>>>>> its types. Those types would define the schema, and the fromRow
>>>>>>> implementation would call the discovered constructor.
>>>>>>>
>>>>>>> On Mon, Nov 12, 2018 at 10:02 AM Reuven Lax <re...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 12, 2018 at 11:38 PM Jeff Klukas <jk...@mozilla.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Reuven - A SchemaProvider makes sense. It's not clear to me,
>>>>>>>>> though, whether that's more limited than a Coder. Do all values of the
>>>>>>>>> schema have to be simple types, or does Beam SQL support nested schemas?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Nested schemas, collection types (lists and maps), and collections
>>>>>>>> of nested types are all supported.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Put another way, would a user be able to create an AutoValue class
>>>>>>>>> comprised of simple types and then use that as a field inside another
>>>>>>>>> AutoValue class? I can see how that's possible with Coders, but it's not
>>>>>>>>> clear whether that's possible with Row schemas.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is explicitly supported.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Nov 9, 2018 at 8:22 PM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Jeff,
>>>>>>>>>>
>>>>>>>>>> I would suggest a slightly different approach. Instead of
>>>>>>>>>> generating a coder, write a SchemaProvider that generates a schema for
>>>>>>>>>> AutoValue. Once a PCollection has a schema, a coder is not needed (as Beam
>>>>>>>>>> knows how to encode any type with a schema), and it will work seamlessly
>>>>>>>>>> with Beam SQL (in fact you don't need to write a transform to turn it into
>>>>>>>>>> a Row if a schema is registered).
>>>>>>>>>>
>>>>>>>>>> We already do this for POJOs and basic JavaBeans. I'm happy to
>>>>>>>>>> help do this for AutoValue.
>>>>>>>>>>
>>>>>>>>>> Reuven

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Reuven Lax <re...@google.com>.
Thinking about this a bit more - I suspect we already have almost all the
code we need.

The code to infer a schema from a Java Bean will probably work with little
change on AutoValue, as it's essentially just a fancy Java Bean. The Java
Bean generated getters should also work. I think all that needs to be done
is to generate a constructor. The tricky thing is that the order of fields
in the AutoValue_XXX constructor may not match the order of the fields in
the schema, so we will need to generate an intermediate constructor that
makes the correct call. (Alternatively, we can try to detect the schema
from the constructor instead of from the getters, which should give us a
schema with matching field order.)
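
A minimal, JDK-only sketch of the intermediate-constructor idea, assuming both the schema field order and the constructor parameter order are already known as lists of names (at runtime, parameter names are only visible to reflection when compiled with `-parameters`, so where the constructor order comes from is left open). `Reading`, `AutoValue_Reading` and `FieldOrderAdapter` are hypothetical names, and the subclass is hand-written to imitate AutoValue's output:

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.List;

// Hypothetical AutoValue-style class; the imitation subclass's
// constructor takes (sensor, value).
abstract class Reading {
  abstract String sensor();
  abstract double value();
}

final class AutoValue_Reading extends Reading {
  private final String sensor;
  private final double value;

  AutoValue_Reading(String sensor, double value) {
    this.sensor = sensor;
    this.value = value;
  }

  @Override String sensor() { return sensor; }
  @Override double value() { return value; }
}

// Accepts values in schema order and invokes the constructor with them
// permuted into constructor order. The permutation is computed once.
final class FieldOrderAdapter<T> {
  private final Constructor<T> constructor;
  private final int[] schemaIndexForParam;

  FieldOrderAdapter(Constructor<T> constructor, List<String> constructorOrder,
      List<String> schemaOrder) {
    if (constructorOrder.size() != schemaOrder.size()) {
      throw new IllegalArgumentException("Field counts differ");
    }
    this.constructor = constructor;
    this.schemaIndexForParam = new int[constructorOrder.size()];
    for (int i = 0; i < constructorOrder.size(); i++) {
      int index = schemaOrder.indexOf(constructorOrder.get(i));
      if (index < 0) {
        throw new IllegalArgumentException("No schema field " + constructorOrder.get(i));
      }
      schemaIndexForParam[i] = index;
    }
    // Generated constructors are package-private.
    constructor.setAccessible(true);
  }

  T create(Object... schemaValues) {
    Object[] args = new Object[schemaIndexForParam.length];
    for (int i = 0; i < args.length; i++) {
      args[i] = schemaValues[schemaIndexForParam[i]];
    }
    try {
      return constructor.newInstance(args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  static FieldOrderAdapter<AutoValue_Reading> forReading(List<String> schemaOrder) {
    try {
      Constructor<AutoValue_Reading> ctor =
          AutoValue_Reading.class.getDeclaredConstructor(String.class, double.class);
      return new FieldOrderAdapter<>(ctor, Arrays.asList("sensor", "value"), schemaOrder);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The schema lists fields in a different order than the constructor.
    FieldOrderAdapter<AutoValue_Reading> adapter = forReading(Arrays.asList("value", "sensor"));
    AutoValue_Reading r = adapter.create(21.5, "s1");
    System.out.println(r.sensor() + " " + r.value()); // prints s1 21.5
  }
}
```

Deriving the schema from the constructor itself, the alternative mentioned above, would make this permutation the identity.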

Reuven

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Reuven Lax <re...@google.com>.
https://github.com/apache/beam/pull/7147 starts adding the framework to do
this (for POJOs we actually generate a constructor using ByteBuddy, but
that might not be necessary for AutoValue).

I would start by writing the inference from AutoValue to a Schema. For
example, see POJOUtils::schemaFromPojoClass or
JavaBeanUtils::schemaFromJavaBeanClass.
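
A rough starting point for that inference, using only the JDK and the reflection-based idea raised in this thread: find the generated `AutoValue_<Name>` class, take its single constructor, and map each parameter type to a field type. `ToyFieldType` is a made-up stand-in, not Beam's `Schema.FieldType`, and field names are omitted because constructor parameter names are only visible to reflection when compiled with `-parameters`:

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for schema field types (not Beam's API).
enum ToyFieldType { STRING, INT32, INT64, DOUBLE, BOOLEAN }

abstract class Order {
  abstract String id();
  abstract long quantity();
  abstract double price();
}

// Hand-written imitation of the generated subclass.
final class AutoValue_Order extends Order {
  private final String id;
  private final long quantity;
  private final double price;

  AutoValue_Order(String id, long quantity, double price) {
    this.id = id;
    this.quantity = quantity;
    this.price = price;
  }

  @Override String id() { return id; }
  @Override long quantity() { return quantity; }
  @Override double price() { return price; }
}

final class ConstructorSchemaInference {
  // AutoValue names the generated class AutoValue_<Name> in the same
  // package, joining nested class names with underscores.
  static Class<?> generatedClass(Class<?> abstractClass) {
    String name = abstractClass.getName();
    int lastDot = name.lastIndexOf('.');
    String generated = name.substring(0, lastDot + 1) + "AutoValue_"
        + name.substring(lastDot + 1).replace('$', '_');
    try {
      return Class.forName(generated, false, abstractClass.getClassLoader());
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("No generated class " + generated, e);
    }
  }

  // Field types in constructor-parameter order.
  static List<ToyFieldType> inferFieldTypes(Class<?> abstractClass) {
    Constructor<?>[] ctors = generatedClass(abstractClass).getDeclaredConstructors();
    if (ctors.length != 1) {
      throw new IllegalStateException("Expected one constructor, found " + ctors.length);
    }
    List<ToyFieldType> types = new ArrayList<>();
    for (Class<?> p : ctors[0].getParameterTypes()) {
      types.add(toFieldType(p));
    }
    return types;
  }

  static ToyFieldType toFieldType(Class<?> type) {
    if (type == String.class) return ToyFieldType.STRING;
    if (type == int.class || type == Integer.class) return ToyFieldType.INT32;
    if (type == long.class || type == Long.class) return ToyFieldType.INT64;
    if (type == double.class || type == Double.class) return ToyFieldType.DOUBLE;
    if (type == boolean.class || type == Boolean.class) return ToyFieldType.BOOLEAN;
    // Nested AutoValue classes, lists and maps would recurse here.
    throw new UnsupportedOperationException("Unsupported type: " + type);
  }

  public static void main(String[] args) {
    System.out.println(inferFieldTypes(Order.class)); // prints [STRING, INT64, DOUBLE]
  }
}
```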

Reuven

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Jeff Klukas <jk...@mozilla.com>.
Reuven - How is the work on constructor support for ByteBuddy codegen
going? Does it still look like that's going to be a feasible way forward
for generating schemas/coders for AutoValue classes?

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Reuven Lax <re...@google.com>.
I would hope so if possible.

On Fri, Nov 16, 2018, 4:36 AM Kenneth Knowles <kenn@apache.org> wrote:

> Just some low-level detail: If there is no @DefaultSchema annotation but
> it is an @AutoValue class, can schema inference go ahead with the
> AutoValueSchema? Then the user doesn't have to do anything.
>
> Kenn

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Anton Kedin <ke...@google.com>.
One reason is that @AutoValue is not guaranteed to be retained at runtime:
https://github.com/google/auto/blob/master/value/src/main/java/com/google/auto/value/AutoValue.java#L44
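
The practical effect can be checked with plain reflection. In this sketch, `ClassRetainedMarker` stands in for an annotation that, like the linked one, is not retained for runtime reflection (assumed here to be `RetentionPolicy.CLASS`), so `isAnnotationPresent` cannot see it. The `hasGeneratedSubclass` probe is a hypothetical alternative heuristic, not something the thread or Beam is known to use:

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Recorded in the .class file but not exposed through reflection.
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.TYPE)
@interface ClassRetainedMarker {}

// For comparison, a marker that reflection can see.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RuntimeRetainedMarker {}

@ClassRetainedMarker
@RuntimeRetainedMarker
abstract class Account {}

// Hand-written imitation of the generated subclass.
final class AutoValue_Account extends Account {}

final class RetentionCheck {
  static boolean annotated(Class<?> type, Class<? extends Annotation> marker) {
    return type.isAnnotationPresent(marker);
  }

  // Hypothetical fallback that ignores annotations: look for a generated
  // AutoValue_<Name> subclass in the same package.
  static boolean hasGeneratedSubclass(Class<?> type) {
    String name = type.getName();
    int lastDot = name.lastIndexOf('.');
    String candidate = name.substring(0, lastDot + 1) + "AutoValue_"
        + name.substring(lastDot + 1).replace('$', '_');
    try {
      Class<?> generated = Class.forName(candidate, false, type.getClassLoader());
      return type.isAssignableFrom(generated);
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(annotated(Account.class, ClassRetainedMarker.class));   // prints false
    System.out.println(annotated(Account.class, RuntimeRetainedMarker.class)); // prints true
    System.out.println(hasGeneratedSubclass(Account.class));                   // prints true
  }
}
```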


On Thu, Nov 15, 2018 at 11:36 AM Kenneth Knowles <ke...@apache.org> wrote:

> Just some low-level detail: If there is no @DefaultSchema annotation but
> it is an @AutoValue class, can schema inference go ahead with the
> AutoValueSchema? Then the user doesn't have to do anything.
>
> Kenn

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Kenneth Knowles <ke...@apache.org>.
Just some low-level detail: If there is no @DefaultSchema annotation but it
is an @AutoValue class, can schema inference go ahead with the
AutoValueSchema? Then the user doesn't have to do anything.

Kenn

On Wed, Nov 14, 2018 at 6:14 AM Reuven Lax <re...@google.com> wrote:

> We already have a framework for ByteBuddy codegen for JavaBean Row
> interfaces, which should hopefully be easy to extend to AutoValue (and more
> efficient than using reflection). I'm working on adding constructor support
> to this right now.

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Reuven Lax <re...@google.com>.
We already have a framework for ByteBuddy codegen for JavaBean Row
interfaces, which should hopefully be easy to extend to AutoValue (and more
efficient than using reflection). I'm working on adding constructor support
to this right now.

On Wed, Nov 14, 2018 at 12:29 AM Jeff Klukas <jk...@mozilla.com> wrote:

> Sounds, then, like we need to define a new `AutoValueSchema extends
> SchemaProvider` and users would opt-in to this via the DefaultSchema
> annotation:
>
> @DefaultSchema(AutoValueSchema.class)
> @AutoValue
> public abstract class MyClass ...
>
> Since we already have the JavaBean and JavaField reflection-based schema
> providers to use as a guide, it sounds like it may be best to try to
> implement this using reflection rather than implementing an AutoValue
> extension.
>
> A reflection-based approach here would hinge on being able to discover the
> package-private constructor for the concrete class and read its types.
> Those types would define the schema, and the fromRow implementation would
> call the discovered constructor.

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Jeff Klukas <jk...@mozilla.com>.
Sounds, then, like we need to define a new `AutoValueSchema extends
SchemaProvider` and users would opt-in to this via the DefaultSchema
annotation:

@DefaultSchema(AutoValueSchema.class)
@AutoValue
public abstract class MyClass ...

Since we already have the JavaBean and JavaField reflection-based schema
providers to use as a guide, it sounds like it may be best to try to
implement this using reflection rather than implementing an AutoValue
extension.

A reflection-based approach here would hinge on being able to discover the
package-private constructor for the concrete class and read its types.
Those types would define the schema, and the fromRow implementation would
call the discovered constructor.
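
A minimal sketch of the reflection-based approach described above. It assumes the generated class declares exactly one constructor whose parameters follow property order. Person and AutoValue_Person are hand-written stand-ins for a user class and its generated subclass. schemaOf and fromRow are hypothetical helpers that use a list of parameter types and a list of values in place of Beam's Schema and Row classes.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.List;

public class AutoValueReflectionSketch {
  // Finds the single constructor declared by the concrete generated class.
  static Constructor<?> findConstructor(Class<?> generated) {
    Constructor<?>[] constructors = generated.getDeclaredConstructors();
    if (constructors.length != 1) {
      throw new IllegalArgumentException(
          "expected exactly one constructor on " + generated.getName());
    }
    Constructor<?> constructor = constructors[0];
    constructor.setAccessible(true); // the constructor is package-private
    return constructor;
  }

  // The constructor's parameter types, in declaration order, define the schema.
  static List<Class<?>> schemaOf(Class<?> generated) {
    return Arrays.asList(findConstructor(generated).getParameterTypes());
  }

  // fromRow: pass the row's values, in schema order, to the discovered constructor.
  static Object fromRow(Class<?> generated, List<Object> row) {
    try {
      return findConstructor(generated).newInstance(row.toArray());
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("could not construct " + generated.getName(), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(schemaOf(AutoValue_Person.class)); // [class java.lang.String, int]
    Person person = (Person) fromRow(AutoValue_Person.class, List.of("Ada", 36));
    System.out.println(person.name() + " " + person.age()); // Ada 36
  }
}

// Stand-in for a user's @AutoValue class (annotation omitted).
abstract class Person {
  abstract String name();

  abstract int age();
}

// Hand-written stand-in for the generated class: package-private and final,
// with a package-private constructor taking every property in order.
final class AutoValue_Person extends Person {
  private final String name;
  private final int age;

  AutoValue_Person(String name, int age) {
    this.name = name;
    this.age = age;
  }

  @Override
  String name() {
    return name;
  }

  @Override
  int age() {
    return age;
  }
}
```

This only recovers field types. Constructor parameter names are available through reflection only when compiling with -parameters, so a real provider would probably take field names from the abstract getters instead.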

On Mon, Nov 12, 2018 at 10:02 AM Reuven Lax <re...@google.com> wrote:

>
>
> On Mon, Nov 12, 2018 at 11:38 PM Jeff Klukas <jk...@mozilla.com> wrote:
>
>> Reuven - A SchemaProvider makes sense. It's not clear to me, though,
>> whether that's more limited than a Coder. Do all values of the schema have
>> to be simple types, or does Beam SQL support nested schemas?
>>
>
> Nested schemas, collection types (lists and maps), and collections of
> nested types are all supported.
>
>>
>> Put another way, would a user be able to create an AutoValue class
>> comprised of simple types and then use that as a field inside another
>> AutoValue class? I can see how that's possible with Coders, but it's not clear
>> whether that's possible with Row schemas.
>>
>
> Yes, this is explicitly supported.

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Reuven Lax <re...@google.com>.
On Mon, Nov 12, 2018 at 11:38 PM Jeff Klukas <jk...@mozilla.com> wrote:

> Reuven - A SchemaProvider makes sense. It's not clear to me, though,
> whether that's more limited than a Coder. Do all values of the schema have
> to be simple types, or does Beam SQL support nested schemas?
>

Nested schemas, collection types (lists and maps), and collections of
nested types are all supported.

>
> Put another way, would a user be able to create an AutoValue class
> comprised of simple types and then use that as a field inside another
> AutoValue class? I can see how that's possible with Coders, but it's not clear
> whether that's possible with Row schemas.
>

Yes, this is explicitly supported.
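
To illustrate the nested case, here is a hypothetical sketch that derives a nested schema description. It recurses whenever a constructor parameter is itself a value class. The GENERATED map, the ROW(...) notation and the describe helper are illustrative assumptions, not Beam's Schema API. Collection types (lists and maps) are not handled here.

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.StringJoiner;

public class NestedSchemaSketch {
  // Hypothetical registry from a user's abstract value class to its generated class.
  static final Map<Class<?>, Class<?>> GENERATED =
      Map.of(Address.class, AutoValue_Address.class, Customer.class, AutoValue_Customer.class);

  // Registered value classes become nested ROW(...) entries; anything else is a leaf type.
  static String describe(Class<?> type) {
    Class<?> generated = GENERATED.get(type);
    if (generated == null) {
      return type.getSimpleName();
    }
    Constructor<?> constructor = generated.getDeclaredConstructors()[0];
    StringJoiner fields = new StringJoiner(", ", "ROW(", ")");
    for (Class<?> parameterType : constructor.getParameterTypes()) {
      fields.add(describe(parameterType)); // recurse into nested value classes
    }
    return fields.toString();
  }

  public static void main(String[] args) {
    System.out.println(describe(Customer.class)); // ROW(String, ROW(String, int))
  }
}

abstract class Address {
  abstract String street();

  abstract int number();
}

final class AutoValue_Address extends Address {
  private final String street;
  private final int number;

  AutoValue_Address(String street, int number) {
    this.street = street;
    this.number = number;
  }

  @Override
  String street() {
    return street;
  }

  @Override
  int number() {
    return number;
  }
}

abstract class Customer {
  abstract String name();

  abstract Address address();
}

final class AutoValue_Customer extends Customer {
  private final String name;
  private final Address address;

  AutoValue_Customer(String name, Address address) {
    this.name = name;
    this.address = address;
  }

  @Override
  String name() {
    return name;
  }

  @Override
  Address address() {
    return address;
  }
}
```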

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Jeff Klukas <jk...@mozilla.com>.
Reuven - A SchemaProvider makes sense. It's not clear to me, though,
whether that's more limited than a Coder. Do all values of the schema have
to be simple types, or does Beam SQL support nested schemas?

Put another way, would a user be able to create an AutoValue class
comprised of simple types and then use that as a field inside another
AutoValue class? I can see how that's possible with Coders, but it's not clear
whether that's possible with Row schemas.

On Fri, Nov 9, 2018 at 8:22 PM Reuven Lax <re...@google.com> wrote:

> Hi Jeff,
>
> I would suggest a slightly different approach. Instead of generating a
> coder, write a SchemaProvider that generates a schema for AutoValue. Once
> a PCollection has a schema, a coder is not needed (as Beam knows how to
> encode any type with a schema), and it will work seamlessly with Beam SQL
> (in fact you don't need to write a transform to turn it into a Row if a
> schema is registered).
>
> We already do this for POJOs and basic JavaBeans. I'm happy to help do
> this for AutoValue.
>
> Reuven

Re: Design review for supporting AutoValue Coders and conversions to Row

Posted by Reuven Lax <re...@google.com>.
Hi Jeff,

I would suggest a slightly different approach. Instead of generating a
coder, write a SchemaProvider that generates a schema for AutoValue. Once
a PCollection has a schema, a coder is not needed (as Beam knows how to
encode any type with a schema), and it will work seamlessly with Beam SQL
(in fact you don't need to write a transform to turn it into a Row if a
schema is registered).
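
To illustrate why a schema can stand in for a hand-written Coder: once the field types are known in order, one generic routine can encode and decode any row. This toy sketch handles only String and int fields using java.io.DataOutputStream. It is not Beam's actual row encoding, and encode and decode are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SchemaEncodingSketch {
  // Encodes a row by writing each value according to its field type, in schema order.
  static byte[] encode(List<Class<?>> schema, List<Object> row) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int i = 0; i < schema.size(); i++) {
        Class<?> type = schema.get(i);
        if (type == String.class) {
          out.writeUTF((String) row.get(i));
        } else if (type == int.class) {
          out.writeInt((Integer) row.get(i));
        } else {
          throw new IllegalArgumentException("unsupported field type: " + type);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Decodes bytes produced by encode, using the same schema to decide what to read next.
  static List<Object> decode(List<Class<?>> schema, byte[] encoded) {
    List<Object> row = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
      for (Class<?> type : schema) {
        if (type == String.class) {
          row.add(in.readUTF());
        } else if (type == int.class) {
          row.add(in.readInt());
        } else {
          throw new IllegalArgumentException("unsupported field type: " + type);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return row;
  }

  public static void main(String[] args) {
    List<Class<?>> schema = List.of(String.class, int.class);
    byte[] encoded = encode(schema, List.of("Ada", 36));
    System.out.println(encoded.length); // 9
    System.out.println(decode(schema, encoded)); // [Ada, 36]
  }
}
```

The per-type logic lives in one place and is driven entirely by the schema, so no class-specific coder has to be generated.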

We already do this for POJOs and basic JavaBeans. I'm happy to help do this
for AutoValue.

Reuven

On Sat, Nov 10, 2018 at 5:50 AM Jeff Klukas <jk...@mozilla.com> wrote:

> Hi all - I'm looking for some review and commentary on a proposed design
> for providing built-in Coders for AutoValue classes. There's existing
> discussion in BEAM-1891 [0] about using AvroCoder, but that's blocked on
> incompatibilities between AutoValue and Avro's reflection machinery that
> don't look resolvable.
>
> I wrote up a design document [1] that instead proposes using AutoValue's
> extension API to automatically generate a Coder for each AutoValue class
> that users generate. A similar technique could be used to generate
> conversions to and from Row for use with BeamSql.
>
> I'd appreciate review of the design and thoughts on whether this seems
> feasible to support within the Beam codebase. I may be missing a simpler
> approach.
>
> [0] https://issues.apache.org/jira/browse/BEAM-1891
> [1] https://docs.google.com/document/d/1ucoik4WzUDfilqIz3I1AuMHc1J8DE6iv7gaUCDI42BI/edit?usp=sharing
>