Posted to dev@parquet.apache.org by "J Y (Jira)" <ji...@apache.org> on 2022/06/09 19:17:00 UTC
[jira] [Comment Edited] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552396#comment-17552396 ]
J Y edited comment on PARQUET-1711 at 6/9/22 7:16 PM:
------------------------------------------------------
i'd be ok with that approach: a proto option annotation to cap the recursion depth, then failing over to treating the field as raw proto bytes. if the recursion limit option is omitted/missing, then just treat the recursive definition as bytes after the first occurrence.
forgive me if this is a naive question, but what's the difficulty in supporting "typing" properly to handle recursive nesting?
PARQUET-129 is very much related/the same issue...
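To make the proposal concrete, here is a standalone sketch of the idea. It assumes nothing about the real ProtoSchemaConverter API: the Message class, the convert() helper, and the "bytes" fallback string are all hypothetical stand-ins. It only illustrates how capping how often a message name may appear on the conversion path, with a bytes fallback past the cap, terminates on the Struct/Value/ListValue cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecursionCapSketch {
    // Minimal stand-in for a protobuf message descriptor: a name plus fields
    // whose values are either a scalar type name (String) or another Message.
    static class Message {
        final String name;
        final Map<String, Object> fields = new LinkedHashMap<>();
        Message(String name) { this.name = name; }
    }

    // Convert a message to a schema string, allowing each message name to
    // appear at most `maxOccurrences` times on the current path; past that,
    // the field degrades to raw "bytes" instead of recursing forever.
    static String convert(Message msg, int maxOccurrences, Deque<String> path) {
        StringBuilder sb = new StringBuilder(msg.name).append(" {");
        path.push(msg.name);
        for (Map.Entry<String, Object> e : msg.fields.entrySet()) {
            sb.append(' ').append(e.getKey()).append(": ");
            if (e.getValue() instanceof Message) {
                Message child = (Message) e.getValue();
                long seen = path.stream().filter(child.name::equals).count();
                if (seen >= maxOccurrences) {
                    sb.append("bytes"); // fallback: store the serialized proto
                } else {
                    sb.append(convert(child, maxOccurrences, path));
                }
            } else {
                sb.append(e.getValue());
            }
            sb.append(';');
        }
        path.pop();
        return sb.append(" }").toString();
    }

    public static void main(String[] args) {
        // Model the cycle from struct.proto: ListValue -> Value -> Struct -> Value ...
        Message value = new Message("Value");
        Message struct = new Message("Struct");
        Message listValue = new Message("ListValue");
        listValue.fields.put("values", value);
        value.fields.put("struct_value", struct);
        value.fields.put("list_value", listValue);
        struct.fields.put("fields", value);

        // Without the cap this recursion never bottoms out (the reported
        // StackOverflowError); with a cap of 1 it terminates.
        System.out.println(convert(listValue, 1, new ArrayDeque<>()));
    }
}
```

With a cap of 1, each message collapses to raw bytes on its second occurrence along the path, so the conversion terminates instead of overflowing the stack.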
> [parquet-protobuf] stack overflow when work with well known json type
> ---------------------------------------------------------------------
>
> Key: PARQUET-1711
> URL: https://issues.apache.org/jira/browse/PARQUET-1711
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.10.1
> Reporter: Lawrence He
> Priority: Major
>
> Writing the following protobuf message as a parquet file is not possible:
> {code:java}
> syntax = "proto3";
> import "google/protobuf/struct.proto";
> package test;
> option java_outer_classname = "CustomMessage";
> message TestMessage {
>   map<string, google.protobuf.ListValue> data = 1;
> }
> {code}
> Protobuf introduced "well known json types" such as [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue] to ease json schema conversion.
> However, writing the above message traps the parquet writer in an infinite loop due to the "general type" support in protobuf. The current implementation keeps expanding the 6 possible kinds a Value can hold (null, bool, number, string, struct, list) and enters an infinite loop when it reaches "struct", which refers back to Value.
> {code:java}
> java.lang.StackOverflowError
>     at java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418)
>     at java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410)
>     at java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044)
>     at java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96) {code}
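The loop described in the issue can be reproduced without parquet at all. The sketch below is illustrative only: the REFS map hand-encodes the field references from google/protobuf/struct.proto rather than reading real descriptors. It shows that each of the three well-known types is reachable from itself, which is why a schema converter that recurses into every message-typed field never bottoms out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WellKnownTypeCycle {
    // Field type references among the well-known json types (hand-encoded from
    // google/protobuf/struct.proto): Value is a oneof over six kinds.
    static final Map<String, List<String>> REFS = Map.of(
        "Value", List.of("NullValue", "bool", "double", "string", "Struct", "ListValue"),
        "Struct", List.of("Value"),      // map<string, Value> fields = 1;
        "ListValue", List.of("Value"));  // repeated Value values = 1;

    // Depth-first walk that reports whether `start` is reachable from itself.
    static boolean onCycle(String start) {
        Deque<String> stack = new ArrayDeque<>(List.of(start));
        Set<String> visited = new HashSet<>();
        while (!stack.isEmpty()) {
            String cur = stack.pop();
            for (String next : REFS.getOrDefault(cur, List.of())) {
                if (next.equals(start)) return true;  // back at the start: a cycle
                if (visited.add(next)) stack.push(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // All three message types sit on the cycle, so a converter that
        // recurses on every message-typed field can never terminate.
        for (String t : List.of("Value", "Struct", "ListValue")) {
            System.out.println(t + " on cycle: " + onCycle(t));
        }
    }
}
```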
--
This message was sent by Atlassian Jira
(v8.20.7#820007)