You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Joel Turkel (Jira)" <ji...@apache.org> on 2020/12/30 01:13:00 UTC

[jira] [Commented] (AVRO-2597) GenricUnion not accessible

    [ https://issues.apache.org/jira/browse/AVRO-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17256217#comment-17256217 ] 

Joel Turkel commented on AVRO-2597:
-----------------------------------

I hit this issue trying use the Avro C++ library to speed up Avro serialization/deserialization for the Ruby [avromatic|https://github.com/salsify/avromatic] library. Avromatic has special handling of "optional" fields (i.e. unions that include a null member type) so for a given union GenericDatum I need to figure out the number of member types in the union schema and whether the first member schema is a null type. Exposing the underlying union schema would make this easier and more efficient than the workarounds. 

I suspect exposing a GenericDatum#schema() method might create confusion since GenericDatum#type() would return a different result than GenericDatum#schema()->type(). Perhaps we could add a GenericDatum#unionSchema() method that would return the underlying union's schema? It's a little awkward since it would raise an error if the GenericDatum doesn't correspond to a union but that's consistent with the GenericDatum#unionBranch() and GenericDatum#selectBranch() methods.

> GenricUnion not accessible
> --------------------------
>
>                 Key: AVRO-2597
>                 URL: https://issues.apache.org/jira/browse/AVRO-2597
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: c++
>    Affects Versions: 1.9.0, 1.9.1
>            Reporter: Marcel Pfütze
>            Priority: Major
>
> In version 1.9.0 GenricDatum.hh was expanded by functionality that was formerly just in the GenricUnion class.
> The following was added:
> * isUnion()
> * selectBranch()
> * unionBranch()
> Additionally the methods to access type and value were overwritten to have a special if case for the union data type: 
> {code:c++}
> inline Type GenericDatum::type() const {
>     return (type_ == AVRO_UNION) ?
> #if __cplusplus >= 201703L
>         std::any_cast<GenericUnion>(&value_)->datum().type() :
> #else
>         boost::any_cast<GenericUnion>(&value_)->datum().type() :
> #endif
>         type_;
> }
> template<typename T> T& GenericDatum::value() {
>     return (type_ == AVRO_UNION) ?
> #if __cplusplus >= 201703L
>         std::any_cast<GenericUnion>(&value_)->datum().value<T>() :
>         *std::any_cast<T>(&value_);
> #else
>         boost::any_cast<GenericUnion>(&value_)->datum().value<T>() :
>         *boost::any_cast<T>(&value_);
> #endif
> }
> {code}
> The result of this is, that it's impossible to access the GenericUnion from a GenricDatum as the code always points to the branch of the union.
> Currently the only way to get the number of branches this union has, is by calling selectbranch() in a for-loop until the underlying leafAt() throws an exception.
> In avro 1.8.2 the following code could be used to select a unionbranch of a specific type.
> {code:c++}
> inline avro::GenericDatum & select_union_branch(avro::GenericDatum & datum, const avro::Type type)
> {
>     auto & un     =  datum.value<avro::GenericUnion>();
>     auto & schema = un.schema();
>     auto leaves   = schema->leaves();
>     for(std::size_t i=0; i < leaves; i++) {
>         if(schema->leafAt(i)->type() == type) {
>             un.selectBranch(i);
>             return un.datum();
>         }
>     }
>     throw std::runtime_error(
>             "select_union_branch: can't find branch with type=" +
>             std::to_string(type));
> }
> {code}
> My proposals:
> # Revert changes to make GenericDatum reflect methods of GenericUnion.
> ## A Union is a normal data type of Avro like Records or Lists. What is the point of hiding the type?
> ## The getters don't just return their value but do an extra check or even cast before returning. This is counterintuitive.
> # Add the schema() method to the GenericDatum class. At least the full functionality of the GenricUnion should be accessible if you can't access the GenericUnion itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)