You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Alex Van Boxel <al...@vanboxel.be> on 2019/05/14 16:00:24 UTC

Schema is not final... is it allowed to override

Hi Schema lovers,

I'm implementing schema support for Protobuf and I was wondering if it's
allowed to override Schema. It looks tempting (as it's not final), as I
need a container for the Proto Descriptor.

For normal pre-compiled classes it's not required, but for DynamicMessage
it is. If I would be able to store it in Schema I can reuse it in
to/fromRow.

Thoughts?

 _/
_/ Alex Van Boxel

Re: Schema is not final... is it allowed to override

Posted by Alex Van Boxel <al...@vanboxel.be>.
Yes, the schema is build at Pipeline construction time, I'm pushing all of
the *complexity to the To/FromRow* functions. They will have an overlay of
the fields that convert to and from Proto. Work is moving alone file for
Proto support, working now:
- Primitives
- Wrapper types
- Timestamp

Everything will be ready for the Beam Summit :-)

 _/
_/ Alex Van Boxel


On Wed, May 15, 2019 at 3:26 AM Reuven Lax <re...@google.com> wrote:

>
>
> *From: *Alex Van Boxel <al...@vanboxel.be>
> *Date: *Tue, May 14, 2019 at 3:38 PM
> *To: *ML Beam/Dev
>
> ProtoBuf and certainly the Descriptor is a challenging beast, and I
>> certainly want to support DynamicMessage (see also my ProtoCoder PR).
>>
>> Creating a schema from the proto is easy, the trick is creating the
>> to/fromRow. With precomiled proto's I can easily get the Descriptor from
>> the class, but the is not available for DynamicMessages. I need the
>> descriptor to get the fields (via the FieldDescriptor).
>>
>
> However you do have the schema at graph-construction time, don't you?
>
>
>>
>> If I can't store it in the schema, I will probably need to store the
>> Descriptor in the Serializable toRow/fromRow: it's here that the
>> FieldDescriptor is required.
>>
>
> Yes, that should be fine.
>
>
>>
>> I had my doubts that Schema was allowed to be extensible... I'll add a PR
>> to make Schema final, should be a no brainer.
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Tue, May 14, 2019 at 8:05 PM Reuven Lax <re...@google.com> wrote:
>>
>>> Can you explain what you're trying to do? I don't think that embedding
>>> the proto descriptor in the schema is a great way to go, but I may not be
>>> understanding the use case.
>>>
>>> *From: *Alex Van Boxel <al...@vanboxel.be>
>>> *Date: *Tue, May 14, 2019 at 9:00 AM
>>> *To: *ML Beam/Dev
>>>
>>> Hi Schema lovers,
>>>>
>>>> I'm implementing schema support for Protobuf and I was wondering if
>>>> it's allowed to override Schema. It looks tempting (as it's not final), as
>>>> I need a container for the Proto Descriptor.
>>>>
>>>> For normal pre-compiled classes it's not required, but for
>>>> DynamicMessage it is. If I would be able to store it in Schema I can reuse
>>>> it in to/fromRow.
>>>>
>>>> Thoughts?
>>>>
>>>>  _/
>>>> _/ Alex Van Boxel
>>>>
>>>

Re: Schema is not final... is it allowed to override

Posted by Reuven Lax <re...@google.com>.
*From: *Alex Van Boxel <al...@vanboxel.be>
*Date: *Tue, May 14, 2019 at 3:38 PM
*To: *ML Beam/Dev

ProtoBuf and certainly the Descriptor is a challenging beast, and I
> certainly want to support DynamicMessage (see also my ProtoCoder PR).
>
> Creating a schema from the proto is easy, the trick is creating the
> to/fromRow. With precomiled proto's I can easily get the Descriptor from
> the class, but the is not available for DynamicMessages. I need the
> descriptor to get the fields (via the FieldDescriptor).
>

However you do have the schema at graph-construction time, don't you?


>
> If I can't store it in the schema, I will probably need to store the
> Descriptor in the Serializable toRow/fromRow: it's here that the
> FieldDescriptor is required.
>

Yes, that should be fine.


>
> I had my doubts that Schema was allowed to be extensible... I'll add a PR
> to make Schema final, should be a no brainer.
>  _/
> _/ Alex Van Boxel
>
>
> On Tue, May 14, 2019 at 8:05 PM Reuven Lax <re...@google.com> wrote:
>
>> Can you explain what you're trying to do? I don't think that embedding
>> the proto descriptor in the schema is a great way to go, but I may not be
>> understanding the use case.
>>
>> *From: *Alex Van Boxel <al...@vanboxel.be>
>> *Date: *Tue, May 14, 2019 at 9:00 AM
>> *To: *ML Beam/Dev
>>
>> Hi Schema lovers,
>>>
>>> I'm implementing schema support for Protobuf and I was wondering if it's
>>> allowed to override Schema. It looks tempting (as it's not final), as I
>>> need a container for the Proto Descriptor.
>>>
>>> For normal pre-compiled classes it's not required, but for
>>> DynamicMessage it is. If I would be able to store it in Schema I can reuse
>>> it in to/fromRow.
>>>
>>> Thoughts?
>>>
>>>  _/
>>> _/ Alex Van Boxel
>>>
>>

Re: Schema is not final... is it allowed to override

Posted by Alex Van Boxel <al...@vanboxel.be>.
ProtoBuf and certainly the Descriptor is a challenging beast, and I
certainly want to support DynamicMessage (see also my ProtoCoder PR).

Creating a schema from the proto is easy, the trick is creating the
to/fromRow. With precomiled proto's I can easily get the Descriptor from
the class, but the is not available for DynamicMessages. I need the
descriptor to get the fields (via the FieldDescriptor).

If I can't store it in the schema, I will probably need to store the
Descriptor in the Serializable toRow/fromRow: it's here that the
FieldDescriptor is required.

I had my doubts that Schema was allowed to be extensible... I'll add a PR
to make Schema final, should be a no brainer.
 _/
_/ Alex Van Boxel


On Tue, May 14, 2019 at 8:05 PM Reuven Lax <re...@google.com> wrote:

> Can you explain what you're trying to do? I don't think that embedding the
> proto descriptor in the schema is a great way to go, but I may not be
> understanding the use case.
>
> *From: *Alex Van Boxel <al...@vanboxel.be>
> *Date: *Tue, May 14, 2019 at 9:00 AM
> *To: *ML Beam/Dev
>
> Hi Schema lovers,
>>
>> I'm implementing schema support for Protobuf and I was wondering if it's
>> allowed to override Schema. It looks tempting (as it's not final), as I
>> need a container for the Proto Descriptor.
>>
>> For normal pre-compiled classes it's not required, but for DynamicMessage
>> it is. If I would be able to store it in Schema I can reuse it in
>> to/fromRow.
>>
>> Thoughts?
>>
>>  _/
>> _/ Alex Van Boxel
>>
>

Re: Schema is not final... is it allowed to override

Posted by Reuven Lax <re...@google.com>.
Can you explain what you're trying to do? I don't think that embedding the
proto descriptor in the schema is a great way to go, but I may not be
understanding the use case.

*From: *Alex Van Boxel <al...@vanboxel.be>
*Date: *Tue, May 14, 2019 at 9:00 AM
*To: *ML Beam/Dev

Hi Schema lovers,
>
> I'm implementing schema support for Protobuf and I was wondering if it's
> allowed to override Schema. It looks tempting (as it's not final), as I
> need a container for the Proto Descriptor.
>
> For normal pre-compiled classes it's not required, but for DynamicMessage
> it is. If I would be able to store it in Schema I can reuse it in
> to/fromRow.
>
> Thoughts?
>
>  _/
> _/ Alex Van Boxel
>

Re: Schema is not final... is it allowed to override

Posted by Kenneth Knowles <ke...@apache.org>.
Ultimately, a schema will be translated to the protobuf that is currently
under design discussion, so Java-specific things like using inheritance to
store extra data on a class are not good patterns here.

However, coders are designed to be extensible, including having customized
portable representations. And every PCollection has a coder, so it seems
like ProtoCoder would be a possible place to put this.

Just high level thoughts. I don't really know exactly what your needs are
at the code level, here.

Kenn

*From: *Alex Van Boxel <al...@vanboxel.be>
*Date: *Tue, May 14, 2019 at 9:00 AM
*To: *ML Beam/Dev

Hi Schema lovers,
>
> I'm implementing schema support for Protobuf and I was wondering if it's
> allowed to override Schema. It looks tempting (as it's not final), as I
> need a container for the Proto Descriptor.
>
> For normal pre-compiled classes it's not required, but for DynamicMessage
> it is. If I would be able to store it in Schema I can reuse it in
> to/fromRow.
>
> Thoughts?
>
>  _/
> _/ Alex Van Boxel
>