Posted to dev@parquet.apache.org by James Cavanaugh <ja...@gmail.com> on 2021/06/04 19:46:15 UTC

Self Referencing Protobuf Solution

Hi Apache Parquet Team,

I have a question about this library, specifically about self-referencing
protobufs when using the parquet-protobuf module for protobuf-to-Parquet
conversion.

For example:
[image: image.png]

Currently, it seems that ProtoSchemaConverter fails with a
StackOverflowError when given this, or any other self-referencing protobuf.
This appears to happen because the recursive conversion algorithm keeps
descending into the self-referenced type without ever terminating.
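
As a rough illustration of that failure mode, here is a minimal sketch that
uses a toy message model rather than the real protobuf Descriptor or
ProtoSchemaConverter classes. The names CycleCheck, FIELDS and findCycle are
hypothetical. Tracking the types on the current path turns the unbounded
recursion into an explicit, reportable cycle:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for protobuf message descriptors; not the real protobuf API.
public class CycleCheck {
    // Maps a message type name to the message type names of its fields.
    static final Map<String, List<String>> FIELDS = new LinkedHashMap<>();

    // Returns the first cyclic path reachable from `type`, or null if none exists.
    static String findCycle(String type, Deque<String> path) {
        if (path.contains(type)) {
            // `type` is already being expanded further up the path: a cycle.
            return String.join(" -> ", path) + " -> " + type;
        }
        path.addLast(type);
        for (String child : FIELDS.getOrDefault(type, List.of())) {
            String cycle = findCycle(child, path);
            if (cycle != null) {
                return cycle;
            }
        }
        path.removeLast();
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical message "Node" with one field of type Payload and one of type Node.
        FIELDS.put("Node", List.of("Payload", "Node"));
        FIELDS.put("Payload", List.of());
        System.out.println(findCycle("Node", new ArrayDeque<>()));
        System.out.println(findCycle("Payload", new ArrayDeque<>()));
    }
}
```

A check like this does not make the proto convertible. It only replaces the
stack overflow with a clear message naming the offending type.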

However, this seems to be expected behavior, since it shouldn't be possible
to explicitly define a schema for a message that contains itself.
That all being said, has there been any thought put into these types of
protos and how to effectively deal with them? Or is it just assumed any
proto being converted to parquet has no self-referenced attributes?

Thanks and I appreciate any insight on the matter,
James

Re: Self Referencing Protobuf Solution

Posted by Micah Kornfield <em...@gmail.com>.

> That all being said, has there been any thought put into these types of
> protos and how to effectively deal with them? Or is it just assumed any
> proto being converted to parquet has no self-referenced attributes?

Typically, the way I've seen this handled in other systems is to set a
configurable limit on recursion depth and populate columns as necessary.
This does require some investment in schema adaptation, which can be subtle.
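
To make the depth-limit idea concrete, here is a minimal sketch on a toy
schema model. It is not parquet-protobuf code, and the names
DepthLimitedSchema, MESSAGES, columns and maxDepth are all hypothetical. A
recursive message field is expanded only while some depth budget remains, so
the resulting column list is always finite:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy schema model: message type -> (field name -> field type).
// Field types that are not keys of MESSAGES are treated as primitives.
public class DepthLimitedSchema {
    static final Map<String, Map<String, String>> MESSAGES = new LinkedHashMap<>();
    static {
        // Hypothetical self-referencing message: Node { int64 value; Node next; }
        Map<String, String> node = new LinkedHashMap<>();
        node.put("value", "int64");
        node.put("next", "Node");
        MESSAGES.put("Node", node);
    }

    // Appends one "path:type" entry per primitive leaf reachable within the budget.
    static void collect(String type, String prefix, int remainingDepth, List<String> out) {
        for (Map.Entry<String, String> field : MESSAGES.get(type).entrySet()) {
            String path = prefix.isEmpty() ? field.getKey() : prefix + "." + field.getKey();
            if (!MESSAGES.containsKey(field.getValue())) {
                out.add(path + ":" + field.getValue()); // primitive leaf column
            } else if (remainingDepth > 0) {
                collect(field.getValue(), path, remainingDepth - 1, out);
            }
            // Otherwise the nested message lies beyond the limit and is left out.
        }
    }

    static List<String> columns(String root, int maxDepth) {
        List<String> out = new ArrayList<>();
        collect(root, "", maxDepth, out);
        return out;
    }

    public static void main(String[] args) {
        // Prints [value:int64, next.value:int64, next.next.value:int64]
        System.out.println(columns("Node", 2));
    }
}
```

The subtle part is what happens to data nested deeper than the limit. In this
sketch it simply has no column, so a writer would have to drop it or store it
some other way.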
