Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2009/12/01 20:37:20 UTC

[jira] Created: (AVRO-248) make unions a named type

make unions a named type
------------------------

                 Key: AVRO-248
                 URL: https://issues.apache.org/jira/browse/AVRO-248
             Project: Avro
          Issue Type: New Feature
          Components: spec
            Reporter: Doug Cutting
            Assignee: Doug Cutting
             Fix For: 1.3.0


Unions are currently anonymous.  However it might be convenient if they were named.  In particular:
 - when code is generated for a union, a class could be generated that includes an enum indicating which branch of the union is taken, e.g., a union of string and int named Foo might produce a Java class like {code}
public class Foo {
  public static enum Type {STRING, INT};
  private Type type;
  private Object datum;
  public Type getType() { return type; }
  public String getString() { if (type==Type.STRING) return (String)datum; else throw ... }
  public void setString(String s) { type = Type.STRING;  datum = s; }
  ....
}
{code} Then Java applications can easily use a switch statement to process union values rather than using instanceof.
 - when using reflection, an abstract class with a set of concrete implementations can be represented as a union (AVRO-241).  However, if one wishes to create an array one must know the name of the base class, which is not represented in the Avro schema.  One approach would be to add an annotation to the reflected array schema (AVRO-242) noting the base class.  But if the union itself were named, that could name the base class.  This would also make reflected protocol interfaces more concise, since the base class name could be used in parameters, return types, and fields.
 - Generalizing the above: Avro lacks class inheritance, unions are a way to model inheritance, and this model is more useful if the union is named.
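
The generated class from the first bullet could be filled out roughly as follows. This is only an illustrative sketch, not actual Avro output: the getInt/setInt accessors, the choice of IllegalStateException, and the describe helper are assumptions added here to show the switch-based access the proposal is after.

```java
// Hypothetical hand-written stand-in for a class that might be generated
// for a union of string and int named Foo. Not produced by Avro.
final class Foo {
  enum Type { STRING, INT }

  private Type type;     // which branch is currently set (null until a setter runs)
  private Object datum;  // the value of that branch

  Type getType() { return type; }

  String getString() {
    if (type != Type.STRING) throw new IllegalStateException("not a string: " + type);
    return (String) datum;
  }

  void setString(String s) { type = Type.STRING; datum = s; }

  int getInt() {
    if (type != Type.INT) throw new IllegalStateException("not an int: " + type);
    return (Integer) datum;
  }

  void setInt(int i) { type = Type.INT; datum = i; }

  // Example consumer: switch on the branch tag instead of chaining instanceof checks.
  static String describe(Foo foo) {
    if (foo.getType() == null) throw new IllegalStateException("no branch set");
    switch (foo.getType()) {
      case STRING: return "string:" + foo.getString();
      case INT:    return "int:" + foo.getInt();
      default:     throw new AssertionError("unknown branch: " + foo.getType());
    }
  }
}
```

A caller would set one branch, e.g. with setInt(7), and then dispatch on getType(); asking for the other branch fails fast rather than returning a miscast value.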

This would be an incompatible change to schemas.  If we go this way, we should probably rename 1.3 to 2.0.  Note that AVRO-160 proposes an incompatible change to data file formats, which may also force a major release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-248) make unions a named type

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784376#action_12784376 ] 

Philip Zeyliger commented on AVRO-248:
--------------------------------------

I like naming unions.  So, +1 to the general idea.

Should we name the branches of the union?  i.e., a union is pretty much the same as a record, except instead of all the fields being set, exactly one field is set.  Admittedly, the names would often be boring, but that's true anyway.  FWIW, Thrift unions (http://issues.apache.org/jira/browse/THRIFT-409) support that syntax.  It would be sensible if unions and records were to have the same syntax, except for the "type".

We could easily enough continue supporting anonymous unions (for backwards compatibility), but, yes, this would be an incompatible change.



[jira] Commented: (AVRO-248) make unions a named type

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784392#action_12784392 ] 

Philip Zeyliger commented on AVRO-248:
--------------------------------------

I could go both ways.  

(Yes, names!) Say we had a "host" record. That might be a union of "hostname" (string) or "IP address" (a record).  I would rather see the code say getHostname() than getString().  You can get around this by creating a Hostname record, but then it would be getHostname().getHostname(), since records always have field names.  The restriction to contain only one branch of any unnamed type could be relaxed.  Sometimes two things have the same "type", which should be a primitive type, yet they mean different things.  I'm struggling to come up with a great example, but perhaps a date could be expressed as "days since 1900" (Excel style) or "days since the epoch".  Both are ints.

(No names!) In Java, half the time the names of fields are boring.  Fields are called "outputStream" and have type "OutputStream", and, really, did we need both?

In Avro's case, especially because unions are the way to implement nullable fields, name-less is pretty convincing.



[jira] Commented: (AVRO-248) make unions a named type

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784380#action_12784380 ] 

Doug Cutting commented on AVRO-248:
-----------------------------------

Unions are currently only permitted to contain one branch of any unnamed type.  So branch names can be type names.  This permits implementations that don't use an explicit union representation to easily find the matching branch at runtime.  I don't see any need to remove this restriction (does it ever make sense to have two distinct string branches?) so, given that, I don't see a need to name branches.
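
To illustrate why the one-branch-per-type restriction helps, here is a minimal sketch of how an implementation without an explicit union representation might pick the branch from the datum's runtime class. UnionResolver and its methods are hypothetical names, not part of Avro's API, and mapping branches to Java classes is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an Avro class: finds a union branch by runtime type.
final class UnionResolver {
  // Branch types in schema order; the constructor rejects duplicates,
  // mirroring the rule that a union holds at most one branch of a given type.
  private final List<Class<?>> branches;

  UnionResolver(Class<?>... branches) {
    List<Class<?>> list = Arrays.asList(branches);
    for (int i = 0; i < list.size(); i++) {
      if (list.indexOf(list.get(i)) != i) {
        throw new IllegalArgumentException("duplicate branch type: " + list.get(i).getName());
      }
    }
    this.branches = list;
  }

  // Returns the index of the first branch whose class matches the datum.
  int resolve(Object datum) {
    for (int i = 0; i < branches.size(); i++) {
      if (branches.get(i).isInstance(datum)) return i;
    }
    throw new IllegalArgumentException("no branch for " + datum);
  }
}
```

With two string branches the lookup would be ambiguous, which is why this sketch refuses such a union up front; named branches would need an explicit tag carried alongside the value instead.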



[jira] Commented: (AVRO-248) make unions a named type

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784604#action_12784604 ] 

Thiruvalluvan M. G. commented on AVRO-248:
------------------------------------------

Talking about names, the current specification that records, enums and fixed (and now unions) are named seems somewhat arbitrary. Names serve two main purposes:
   - Named entities can be reused elsewhere in the schema
   - Names are used to differentiate branches in unions

Strictly speaking, names are not required if these situations do not occur.

The third use of name is in code generation. If we can somehow handle the code generation part, I'd propose that we make names completely optional.

Also, one should be able to name the other non-primitive types - arrays and maps. The names for arrays and maps are not of much use for reuse, but very useful in unions. Today, one cannot have a union of int arrays and string arrays. One could argue that the same effect can be achieved by having an array of unions of int and string. But they are not the same. An array of unions is actually a heterogeneous array - some elements can be ints and some others strings.
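
The difference between the two shapes can be made concrete with a small check. This is only a sketch of the distinction drawn above; ArrayUnionDemo and its two methods are hypothetical names, and plain Java lists stand in for Avro arrays.

```java
import java.util.List;

// Hypothetical illustration, not Avro code.
final class ArrayUnionDemo {
  // Fits "array of union(int, string)": each element is independently an int or a string.
  static boolean fitsArrayOfUnion(List<?> datum) {
    for (Object o : datum) {
      if (!(o instanceof Integer) && !(o instanceof String)) return false;
    }
    return true;
  }

  // Fits "union(array of int, array of string)": the whole array is ints, or the whole array is strings.
  static boolean fitsUnionOfArrays(List<?> datum) {
    boolean allInts = true;
    boolean allStrings = true;
    for (Object o : datum) {
      allInts &= o instanceof Integer;
      allStrings &= o instanceof String;
    }
    return allInts || allStrings;
  }
}
```

A mixed list such as [1, "a"] satisfies the first check but not the second, which is the heterogeneity the comment points at.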

In summary, I propose we make all compound types named, but make names optional for all of them.

I like Doug's new syntax for unions. The earlier way of implicitly specifying unions with a JSON array was not intuitive. If we make names optional and support both old and new syntax for unions, the change will not break the old schemas. But I suggest we withdraw support for the old syntax to keep the specification clean.



[jira] Commented: (AVRO-248) make unions a named type

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784972#action_12784972 ] 

Doug Cutting commented on AVRO-248:
-----------------------------------

> Also, one should be able to name the other non-primitive types - arrays and maps.

Note that one can name anything by wrapping it in a one-field record.  This adds no storage overhead and permits things to be used in a union.  This technique can be used to name unions too.  However this may not result in the simplest access from programs.
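
The "no storage overhead" point follows from the binary encoding: an int is written as a zig-zag variable-length value, and a record is written as its fields' encodings in order with nothing added. A minimal sketch, assuming a hand-rolled encoder rather than Avro's own classes (WrapperSize, encodeInt and encodeRecord are hypothetical names):

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-alone encoder, not Avro's implementation.
final class WrapperSize {
  // Encodes an int the way Avro's binary format does: zig-zag, then base-128 varint.
  static byte[] encodeInt(int n) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int z = (n << 1) ^ (n >> 31);      // zig-zag maps small magnitudes to small unsigned values
    while ((z & ~0x7F) != 0) {
      out.write((z & 0x7F) | 0x80);    // low 7 bits with the continuation bit set
      z >>>= 7;
    }
    out.write(z);                      // final byte, continuation bit clear
    return out.toByteArray();
  }

  // Encodes a record whose fields are all ints: just the field encodings, concatenated.
  static byte[] encodeRecord(int... intFields) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int field : intFields) {
      byte[] bytes = encodeInt(field);
      out.write(bytes, 0, bytes.length);
    }
    return out.toByteArray();
  }
}
```

Under these assumptions, encodeRecord(300) and encodeInt(300) yield the same bytes, so a one-field wrapper record costs nothing on the wire; inside a union the branch index precedes the value in either case.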

I'm hesitant to make sweeping incompatible changes to schemas unless they provide clear end-user advantages that cannot be had in other ways.

Naming unions may not be required.  For example, if we simply change the syntax for unions to:

 { "type": "union", "branches": ["string", "Bar", ... ] }

then AVRO-214's schema annotations might be sufficient.  For example, one might use an annotation like:

 { "type": "union", "branches": ["string", "Bar", ... ] , "java-class": "org.foo.FooUnion"}

This would tell Java to use a FooUnion to represent this.



[jira] Commented: (AVRO-248) make unions a named type

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795200#action_12795200 ] 

Doug Cutting commented on AVRO-248:
-----------------------------------

Todd> and then use ["UserId", "ProductId"] with some way to distinguish between the two.

In Avro a record with a single integer field is the same size as an integer, and then you can use multiple records in a union.  This seems nearly isomorphic, since at runtime you'd need a wrapper to distinguish the two branches anyway, no?

Hypothetically, we could permit only records in unions.  That would name branches, but be inconvenient.  It might also be non-pythonic.  At the other extreme, to be pythonic, we could name nothing, and instead wrap things in named tags when we want to use them in a union.  What we currently have is something in the middle: some things are permitted in unions without wrappers (e.g., importantly, null) while other distinctions require an explicit record-based wrapper.  Adding another layer of naming seems perhaps excessive.




[jira] Commented: (AVRO-248) make unions a named type

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794273#action_12794273 ] 

Todd Lipcon commented on AVRO-248:
----------------------------------

I am strongly pro-naming. AVRO-266 (object reuse for deserializing unions) is another reason that having names for unions makes sense.

As for nullability, I agree that we definitely don't want to force type names on all nullable fields. Anonymous unions are one solution, but special-casing nullability in schemas doesn't seem entirely wrong to me either...

As for naming other types, is a typedef construct useful? This would solve the union-of-arrays issue as well as some others. To give a concrete example, imagine an MR job where we want to aggregate over both users and products. Users and products are both represented by their database IDs. I'd want to write:

{"type": "union", "branches": [{"name": "user_id", "type": "int"}, {"name": "product_id", "type": "int"}]}

or with typedefs:
{"type": "typedef", "name": "UserId", "is_type": "int"},
{"type": "typedef", "name": "ProductId", "is_type": "int"}
and then use ["UserId", "ProductId"] with some way to distinguish between the two.



[jira] Commented: (AVRO-248) make unions a named type

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784427#action_12784427 ] 

Doug Cutting commented on AVRO-248:
-----------------------------------

The nullable field use case for unions makes me want to continue to permit anonymous unions.  An implementation could mostly ignore union names, except in resolving references to them while parsing schemas and protocols.  But an implementation might, if a name is provided for a union, represent it as an explicit type, or not (if it's anonymous).  So Java's specific implementation would only generate a class if a union is named, and use runtime typing for anonymous unions.  Does that sound reasonable?



[jira] Commented: (AVRO-248) make unions a named type

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785041#action_12785041 ] 

Philip Zeyliger commented on AVRO-248:
--------------------------------------

BTW, I came up with another argument for named union branches while thinking about the Python implementation yesterday.  In Python, you're not supposed to ever use isinstance.  Say you have two record types, A and B, with A having fields a, b, c, and B having fields a, b, c, d.  In pythonic theory, you're supposed to take the object you're dealing with, and, to see if it's an instance of A, just see if it has fields "a", "b", and "c" using getattr.  Voila, it's an instance of A.  Of course, for all we know, it was actually an instance of B.  Because of this, you'd have to annotate every non-primitive with its Avro type, and, moreover, you'd have to make sure you can always distinguish between primitive types (string, unicode and bytes are the most irksome here).  It's do-able, but can lead to some confusing situations.

Just wanted to throw this out there.  I still think both approaches have disadvantages.



[jira] Issue Comment Edited: (AVRO-248) make unions a named type

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784381#action_12784381 ] 

Doug Cutting edited comment on AVRO-248 at 12/1/09 7:59 PM:
------------------------------------------------------------

To be concrete, what I propose as a union syntax is something like:
  { "type": "union", "name": "Foo", "branches": ["string", "Bar", ... ] }
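
As a rough sketch of how an implementation might read this proposed form, here is some Python.  The function name parse_named_union is hypothetical, the example schema cuts the branch list down to the two placeholder names shown above, and it assumes every branch is referred to by a plain type name.  It is not the API of any Avro library.

```python
import json

# Hypothetical schema in the proposed syntax, limited to two branches.
SCHEMA_JSON = '{"type": "union", "name": "Foo", "branches": ["string", "Bar"]}'


def parse_named_union(text):
    """Parse a proposed named-union schema.

    Returns the union's name and a mapping from branch name to branch
    index.  Assumes branches are given as plain type-name strings.
    """
    schema = json.loads(text)
    if schema.get("type") != "union":
        raise ValueError("not a union schema")
    name = schema.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("a named union needs a non-empty 'name'")
    branches = schema.get("branches")
    if not isinstance(branches, list) or not branches:
        raise ValueError("a union needs a non-empty 'branches' list")
    if not all(isinstance(branch, str) for branch in branches):
        raise ValueError("this sketch only handles branches given by name")
    if len(set(branches)) != len(branches):
        raise ValueError("duplicate branch in union")
    return name, {branch: index for index, branch in enumerate(branches)}


name, branch_index = parse_named_union(SCHEMA_JSON)
print(name, branch_index)  # Foo {'string': 0, 'Bar': 1}
```

The name gives generated or reflected code something to call the union type, and the branch-to-index map is the sort of lookup a writer would need when choosing a branch.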

      was (Author: cutting):
    To be concrete, what I propose as a union syntax is something like:
  { "type": "union", "branches": ["string", "Foo", ... ] }
  


[jira] Commented: (AVRO-248) make unions a named type

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784475#action_12784475 ] 

Philip Zeyliger commented on AVRO-248:
--------------------------------------

bq. Does that sound reasonable?

Yes.



[jira] Updated: (AVRO-248) make unions a named type

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-248:
------------------------------

    Fix Version/s:     (was: 1.3.0)

> make unions a named type
> ------------------------
>
>                 Key: AVRO-248
>                 URL: https://issues.apache.org/jira/browse/AVRO-248
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Doug Cutting
>
> Unions are currently anonymous.  However it might be convenient if they were named.  In particular:
>  - when code is generated for a union, a class could be generated that includes an enum indicating which branch of the union is taken, e.g., a union of string and int named Foo might cause a Java class like {code}
> public class Foo {
>   public static enum Type {STRING, INT};
>   private Type type;
>   private Object datum;
>   public Type getType();
>   public String getString() { if (type==STRING) return (String)datum; else throw ... }
>   public void setString(String s) { type = STRING;  datum = s; }
>   ....
> }
> {code} Then Java applications can easily use a switch statement to process union values rather than using instanceof.
>  - when using reflection, an abstract class with a set of concrete implementations can be represented as a union (AVRO-241).  However, if one wishes to create an array one must know the name of the base class, which is not represented in the Avro schema.  One approach would be to add an annotation to the reflected array schema (AVRO-242) noting the base class.  But if the union itself were named, that could name the base class.  This would also make reflected protocol interfaces more concise, since the base class name could be used in parameters, return types, and fields.
>  - Generalizing the above: Avro lacks class inheritance, unions are a way to model inheritance, and this model is more useful if the union is named.
> This would be an incompatible change to schemas.  If we go this way, we should probably rename 1.3 to 2.0.  Note that AVRO-160 proposes an incompatible change to data file formats, which may also force a major release.



[jira] Commented: (AVRO-248) make unions a named type

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784381#action_12784381 ] 

Doug Cutting commented on AVRO-248:
-----------------------------------

To be concrete, what I propose as a union syntax is something like:
  { "type": "union", "branches": ["string", "Foo", ... ] }
