Posted to user@avro.apache.org by kant kodali <ka...@gmail.com> on 2018/01/09 11:04:40 UTC

How to create union of two records so i can successfully parse it?

Hi All,

I have Avro messages in a Kafka topic, and the requirement is that I should
be able to parse messages that can have either schema1 or schema2. I was
thinking of creating a union of the two records, but I am not sure I am doing
it right, and I am running into various exceptions such as
ArrayIndexOutOfBoundsException.

So I am going to simplify my problem here. Imagine I have the following as
an example:

*schema1: *

{"type":"record","name":"hello1","fields":[{"name":"foo","type":"int","default":1}]}


*schema2: *

{"type":"record","name":"hello2","fields":[{"name":"bar","type":"int","default":1}]}


If I call Schema.createUnion(Arrays.asList(schema1, schema2)), I get the
following:

*unionSchema:*

[{"type":"record","name":"a","fields":[{"name":"one","type":"int","default":1}]},{"type":"record","name":"b","fields":[{"name":"one","type":"int","default":1}]}]



Now say the messages inside the Kafka topic look something like this:

*message1:*

{"foo": 5}

*message2: *

{"bar": 10}


If I use unionSchema, I am unable to parse them, and I am not sure why. I
can't find any resources online on how to do this. Any suggestions would be
great.

Thanks!

Re: How to create union of two records so i can successfully parse it?

Posted by kant kodali <ka...@gmail.com>.
Hi Nandor,

On another note, is there a way to come up with one schema that works for
both of these messages?

{"hello1":{"foo": 5}}
{"foo": 5}

Thanks a lot!


Re: How to create union of two records so i can successfully parse it?

Posted by kant kodali <ka...@gmail.com>.
Thanks Nandor! Which schema should I use for serialization now? Say I want
to serialize the following:

{"hello1":{"foo": 5}}


Re: How to create union of two records so i can successfully parse it?

Posted by Nandor Kollar <nk...@cloudera.com>.
Hi,

Yes, that schema is different: it does not match the messages. I think you
need something like this:

Schema (union of records of type "hello1" or "hello2"):

[
  {
    "type": "record",
    "name": "hello1",
    "fields": [
      {
        "name": "foo",
        "type": "int",
        "default": 1
      }
    ]
  },
  {
    "type": "record",
    "name": "hello2",
    "fields": [
      {
        "name": "bar",
        "type": "int",
        "default": 1
      }
    ]
  }
]

If you have a schema like the one above, you should be able to use hello1
and hello2 type records:
{"hello1":{"foo": 5}}
{"hello2":{"bar": 10}}
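The two lines above are the JSON encoding. If the Kafka messages are binary-encoded instead (an assumption; the thread does not say), the Avro specification writes a union value as the zero-based index of the selected branch, followed by the value encoded with that branch's schema. The following is a minimal sketch of that layout using only the JDK; the class and method names are hypothetical and this is not the Avro library's own encoder.

```java
import java.io.ByteArrayOutputStream;

final class UnionBinarySketch {
    // Avro int/long encoding: zig-zag the value, then emit it as a
    // variable-length integer, 7 bits per byte, least significant group first.
    static void writeLong(long value, ByteArrayOutputStream out) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Encodes a union value whose selected branch is a record with a single
    // int field: branch index first, then the record's only field.
    static byte[] encodeUnionOfSingleIntRecord(int branchIndex, int fieldValue) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(branchIndex, out);
        writeLong(fieldValue, out);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] m1 = encodeUnionOfSingleIntRecord(0, 5);   // hello1 with foo = 5
        byte[] m2 = encodeUnionOfSingleIntRecord(1, 10);  // hello2 with bar = 10
        System.out.printf("%02x %02x%n", m1[0], m1[1]);   // prints "00 0a"
        System.out.printf("%02x %02x%n", m2[0], m2[1]);   // prints "02 14"
    }
}
```

A message written with schema1 alone would be just the single byte 0a. A reader using the union schema would take that byte as the branch index (5 after zig-zag decoding), which is outside a two-branch union. That is one possible explanation for the index errors mentioned in this thread, though nothing here confirms it.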

Hope this solves your question.

Regards,
Nandor


Re: How to create union of two records so i can successfully parse it?

Posted by kant kodali <ka...@gmail.com>.
Sorry, my formatting got messed up. Here is the schema I used.


{
    "type" : "record",
    "name" : "data",
    "namespace" : "example",
    "fields" : [
        {"type":"record","name":"hello1","fields":[{"name":"foo","type":"int","default":1}]},
        {"type":"record","name":"hello2","fields":[{"name":"bar","type":"int","default":1}]}
    ]
}



Re: How to create union of two records so i can successfully parse it?

Posted by kant kodali <ka...@gmail.com>.
Hi Nandor,

Thanks a lot for this. What you have said makes logical sense, but I am new
to Avro, so I am just trying to figure out what the schema definition would
look like for the following messages:

{"hello1":{"foo": 5}}
{"hello2":{"bar": 10}}

I have tried the following schema definition to parse the above messages,
but it didn't quite work, so I am wondering what the schema should look like.


{
    "type" : "record",
    "name" : "data",
    "namespace" : "example",
    "fields" : [
        {"type":"record","name":"hello1","fields":[{"name":"foo","type":"int","default":1}]},
        {"type":"record","name":"hello2","fields":[{"name":"bar","type":"int","default":1}]}
    ]
}


Re: How to create union of two records so i can successfully parse it?

Posted by Nandor Kollar <nk...@cloudera.com>.
I think the problem is this: you created a union of records, but Avro
doesn't know whether a given message is a hello1 record instance or a hello2
record instance. In this case you should encode the data like this:
{"hello1":{"foo": 5}}
{"hello2":{"bar": 10}}
The relevant part of the specification is here:
<https://avro.apache.org/docs/1.8.1/spec.html#json_encoding>
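
In Avro's JSON encoding, a non-null union value is written as a JSON object with a single key naming the selected branch, which for a record is the record's name. A producer that already has the plain record JSON could add that envelope as sketched below. The class and method names are hypothetical, it is plain string handling rather than an Avro API, and it assumes the record has no namespace.

```java
final class UnionJsonWrapper {
    // Wraps an already-serialized JSON record in the single-key object used
    // for a union branch. branchName is assumed to be the record's name.
    static String wrapForUnion(String branchName, String recordJson) {
        return "{\"" + branchName + "\":" + recordJson.trim() + "}";
    }

    public static void main(String[] args) {
        System.out.println(wrapForUnion("hello1", "{\"foo\": 5}"));
        System.out.println(wrapForUnion("hello2", "{\"bar\": 10}"));
    }
}
```

Running main prints the two wrapped messages shown above, one per line.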

Nandor


Re: How to create union of two records so i can successfully parse it?

Posted by kant kodali <ka...@gmail.com>.
Sorry, I had a typo. I am correcting it here.

Hi All,

I have Avro messages in a Kafka topic, and the requirement is that I should
be able to parse messages that can have either schema1 or schema2. I was
thinking of creating a union of the two records, but I am not sure I am doing
it right, and I am running into various exceptions such as
ArrayIndexOutOfBoundsException.

So I am going to simplify my problem here. Imagine I have the following as
an example:

*schema1: *

{"type":"record","name":"hello1","fields":[{"name":"foo","type":"int","default":1}]}


*schema2: *

{"type":"record","name":"hello2","fields":[{"name":"bar","type":"int","default":1}]}


If I call Schema.createUnion(Arrays.asList(schema1, schema2)), I get the
following:

*unionSchema:*

[{"type":"record","name":"hello1","fields":[{"name":"foo","type":"int","default":1}]},{"type":"record","name":"hello2","fields":[{"name":"bar","type":"int","default":1}]}]
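
As the unionSchema above shows, a union is written in Avro's JSON schema syntax as a JSON array of its member schemas. The sketch below reproduces that text with plain string handling, just to make the shape explicit. The class and method names are hypothetical, and in real code the Avro library's Schema.createUnion would be the call to use.

```java
import java.util.Arrays;
import java.util.List;

final class UnionSchemaText {
    // Joins member schema JSON strings into a JSON array, which is the
    // textual form of a union schema.
    static String unionOf(List<String> memberSchemas) {
        return "[" + String.join(",", memberSchemas) + "]";
    }

    public static void main(String[] args) {
        String schema1 = "{\"type\":\"record\",\"name\":\"hello1\",\"fields\":"
                + "[{\"name\":\"foo\",\"type\":\"int\",\"default\":1}]}";
        String schema2 = "{\"type\":\"record\",\"name\":\"hello2\",\"fields\":"
                + "[{\"name\":\"bar\",\"type\":\"int\",\"default\":1}]}";
        // Prints the same array as the unionSchema in this message.
        System.out.println(unionOf(Arrays.asList(schema1, schema2)));
    }
}
```

The order of the members matters, because it fixes each branch's position within the union.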



Now say the messages inside the Kafka topic look something like this:

*message1:*

{"foo": 5}

*message2: *

{"bar": 10}


If I use unionSchema, I am unable to parse them, and I am not sure why. I
can't find any resources online on how to do this. Any suggestions would be
great.

Thanks!
