Posted to issues@spark.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/11/13 09:25:00 UTC

[jira] [Assigned] (SPARK-44001) Improve parsing of well known wrapper types

     [ https://issues.apache.org/jira/browse/SPARK-44001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-44001:
--------------------------------------

    Assignee:     (was: Apache Spark)

> Improve parsing of well known wrapper types
> -------------------------------------------
>
>                 Key: SPARK-44001
>                 URL: https://issues.apache.org/jira/browse/SPARK-44001
>             Project: Spark
>          Issue Type: Improvement
>          Components: Protobuf
>    Affects Versions: 3.4.0
>            Reporter: Parth Upadhyay
>            Priority: Major
>              Labels: pull-request-available
>
> Under `com.google.protobuf`, there are some well-known wrapper types for primitives (defined in [wrappers.proto|https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto]). They are useful for distinguishing an absent primitive field from one set to its default value, as well as for use within `google.protobuf.Any` types. These types are:
> {code}
> DoubleValue
> FloatValue
> Int64Value
> UInt64Value
> Int32Value
> UInt32Value
> BoolValue
> StringValue
> BytesValue
> {code}
> Currently, when we deserialize these from a serialized protobuf into a Spark struct, we expand them as if they were normal messages. Concretely, if we have
> {code}
> syntax = "proto3";
> import "google/protobuf/wrappers.proto";
> message WktExample {
>   google.protobuf.BoolValue bool_val = 1;
>   google.protobuf.Int32Value int32_val = 2;
> }
> {code}
> And a message like
> {code}
> WktExample(true, 100)
> {code}
> Then the behavior today is to deserialize this as
> {code}
> {"bool_val": {"value": true}, "int32_val": {"value": 100}}
> {code}
> This is quite difficult to work with and not in the spirit of the wrapper type, so it would be nice to deserialize as
> {code}
> {"bool_val": true, "int32_val": 100}
> {code}
> This is also the behavior of other popular deserialization libraries, including the Java protobuf util [JsonFormat|https://github.com/protocolbuffers/protobuf/blob/main/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L904-L914] and Go's [jsonpb|https://github.com/gogo/protobuf/blob/master/jsonpb/jsonpb.go#L207-L214].
> So for consistency with other libraries and improved usability, I propose we deserialize well known types in this way. 
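
The proposed change can be illustrated with a minimal sketch in plain Python. It is not Spark's implementation: the helper `unwrap`, the `WRAPPER_TYPES` set, and the `field_types` mapping are hypothetical names introduced here, and records are modeled as plain dicts rather than Spark rows. The sketch only shows how the expanded form above collapses into the proposed flat form.

```python
# Hypothetical illustration only; Spark's protobuf connector does not expose these names.
WRAPPER_TYPES = {
    "google.protobuf.DoubleValue",
    "google.protobuf.FloatValue",
    "google.protobuf.Int64Value",
    "google.protobuf.UInt64Value",
    "google.protobuf.Int32Value",
    "google.protobuf.UInt32Value",
    "google.protobuf.BoolValue",
    "google.protobuf.StringValue",
    "google.protobuf.BytesValue",
}


def unwrap(record, field_types):
    """Collapse wrapper-typed fields from {"value": x} to x.

    record: dict in the current (expanded) deserialized form.
    field_types: dict mapping field name to its fully qualified protobuf type.
    """
    result = {}
    for name, value in record.items():
        if field_types.get(name) in WRAPPER_TYPES:
            # An unset wrapper stays None, so "absent" remains
            # distinguishable from "set to the default value".
            result[name] = None if value is None else value.get("value")
        else:
            result[name] = value
    return result


# Field types for the WktExample message from the issue description.
wkt_example_types = {
    "bool_val": "google.protobuf.BoolValue",
    "int32_val": "google.protobuf.Int32Value",
}

expanded = {"bool_val": {"value": True}, "int32_val": {"value": 100}}
flattened = unwrap(expanded, wkt_example_types)
print(flattened)
```

Under these assumptions, an unset wrapper field maps to `None` while a wrapper holding the default (for example `{"value": 0}`) maps to `0`, which keeps the absence-versus-default distinction that the wrapper types exist to provide.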
