Posted to user@avro.apache.org by Erwin Speybroeck <er...@crv4all.com> on 2020/03/31 06:51:41 UTC

AVRO definition question - record within a record?

Hi,

I need to make a POST call to an API, and the body should look like this:

{
  "location" : "355669",
  "countryCode" : "NL",
  "identificationNumber" : "NL 672760327",
  "externalId" : "KTSPRED_01_817997491",
  "dateTime" : "2019-11-08T04:33:41.000Z",
  "value" : "GEMIDDELD_RISICO",
  "type" : "ketosis_prediction",
  "additionalInformation" : "{
                   "calvingDate": "2018-10-01",
                   "parity": "3",
                   "create_date": "2019-11-08T04:33:41.000Z "
             }"
}

I tried the following Avro schema for serialisation (starting from a CSV file):

{
    "type" : "record",
    "name" : "person",
    "namespace" : "nifi",
    "fields" : [{"name" : "location" ,
                  "type" : "int"},

                 {"name" : "country" ,
                  "type" : "string"},

                 {"name" : "animal_number" ,
                  "type" : "string"},

                 {"name" : "alert_id" ,
                  "type" : "string"},

                 {"name" : "alert_date" ,
                  "type" : "string"},

                 {"name" : "type_of_alert" ,
                  "type" : "string"},

                 {"name" : "alert_name" ,
                  "type" : "string"},

                  {"name" : "additionalInformation",
                   "type" : {
                         "type" : "record",
                         "name" : "test",
                         "fields" : [
                            {"name" : "calving_date",
                             "type" : "string"},

                            {"name" : "parity",
                             "type" : "string"},

                            {"name" : "create_dtm_dl",
                             "type" : "string"}
                         ]},
                          "default" : {}
                }
    ]
}

But it does not work. Is it possible to define a new record within a record? Or should it be done in another way?

My Hive tables are in CSV and I have to convert them to JSON so I can post them.
To create this JSON I have to use an Avro schema. It works fine up to the field "additionalInformation".

I'm not able to generate the fields inside additionalInformation; the only thing that works is declaring additionalInformation as a string. But then the fields I want are not created and the record is not posted.
Above is the Avro schema I am using to create the JSON. The additionalInformation field (the part in bold) is my attempt to define it as a record, but that doesn't work. If I change its type to string it works, but then the POST body is not JSON.

The csv file looks like this - maybe I need to change this input file in some way?

alert_name;animal_number;country;location;alert_id;type_of_alert;alert_date;calving_date;parity;create_dtm_dl
"ketosis_prediction";"NL 743169121";"NL";83618;"KTSPRED_01_817997482";"HOOG_RISICO";"2019-11-08 04:33:38.0";2019-11-07 00:00:00.0;4;2019-11-09 19:13:29.484
"ketosis_prediction";"NL 672760327";"NL";355669;"KTSPRED_01_817997491";"GEMIDDELD_RISICO";"2019-11-08 04:33:41.0";2019-11-07 00:00:00.0;3;2019-11-09 19:13:29.484


Kind regards,

Erwin Speybroeck
Lead Business Consultant | BU Data

(0)26-3898621
0032475-252401
erwin.speybroeck@crv4all.com

This message is subject to the following E-mail Disclaimer. (http://www.crv4all.com/disclaimer-email/) CRV Holding B.V. seats according to the articles of association in Arnhem, Dutch trade number 09125050.

Re: AVRO definition question - record within a record?

Posted by fa...@legsem.com.
This code, using your schema: 

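import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

// Parse the schema from the question (saved as nestedrecord.avsc).
Schema schema = new Schema.Parser().parse(new File("src/test/data/nestedrecord.avsc"));
// JSON encoder writing to stdout; the last argument enables pretty-printing.
JsonEncoder out = EncoderFactory.get().jsonEncoder(schema, System.out, true);
DatumWriter<Object> writer = new GenericDatumWriter<>(schema);

// Fill the top-level fields of the outer record.
GenericRecord person = new GenericData.Record(schema);
person.put("location", 5);
person.put("country", "TH");
person.put("animal_number", "7");
person.put("alert_id", "ab1");
person.put("alert_date", "2014-12-05");
person.put("type_of_alert", "tu");
person.put("alert_name", "zu");

// Build the nested record from the schema of the additionalInformation field.
GenericRecord test = new GenericData.Record(schema.getField("additionalInformation").schema());
test.put("calving_date", "2014-12-05");
test.put("parity", "p");
test.put("create_dtm_dl", "12:12:12");
person.put("additionalInformation", test);

writer.write(person, out);
out.flush();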

Produces this result: 

{
"location" : 5,
"country" : "TH",
"animal_number" : "7",
"alert_id" : "ab1",
"alert_date" : "2014-12-05",
"type_of_alert" : "tu",
"alert_name" : "zu",
"additionalInformation" : {
    "calving_date" : "2014-12-05",
    "parity" : "p",
    "create_dtm_dl" : "12:12:12"
    }
} 

So nested records are properly supported in Avro and widely used.

Maybe something is wrong in the code you are using?
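
One place that may be worth checking is how the flat CSV columns get regrouped before they reach the schema: the CSV row has calving_date, parity and create_dtm_dl at the top level, while the schema expects them inside additionalInformation. The following is only a rough, standard-library sketch of that regrouping, not a description of what your tool does. The class name CsvToNested and the method regroup are made up for illustration, and the naive split on ";" assumes no semicolons appear inside quoted values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: regroups one flat CSV row into the nested shape of the schema.
class CsvToNested {
    // Column order copied from the CSV header in the question.
    static final String[] HEADER = {
        "alert_name", "animal_number", "country", "location", "alert_id",
        "type_of_alert", "alert_date", "calving_date", "parity", "create_dtm_dl"
    };

    // Columns that the schema places inside the additionalInformation record.
    static final List<String> NESTED = Arrays.asList("calving_date", "parity", "create_dtm_dl");

    // Removes one pair of surrounding double quotes, if present.
    static String stripQuotes(String cell) {
        if (cell.length() >= 2 && cell.startsWith("\"") && cell.endsWith("\"")) {
            return cell.substring(1, cell.length() - 1);
        }
        return cell;
    }

    static Map<String, Object> regroup(String csvLine) {
        // Assumption: values never contain ';', so a plain split is enough.
        String[] cells = csvLine.split(";", -1);
        if (cells.length != HEADER.length) {
            throw new IllegalArgumentException(
                "expected " + HEADER.length + " columns, got " + cells.length);
        }
        Map<String, Object> top = new LinkedHashMap<>();
        Map<String, Object> nested = new LinkedHashMap<>();
        for (int i = 0; i < HEADER.length; i++) {
            String value = stripQuotes(cells[i]);
            if (NESTED.contains(HEADER[i])) {
                nested.put(HEADER[i], value);
            } else if (HEADER[i].equals("location")) {
                // The schema declares location as int.
                top.put(HEADER[i], Integer.parseInt(value));
            } else {
                top.put(HEADER[i], value);
            }
        }
        top.put("additionalInformation", nested);
        return top;
    }

    public static void main(String[] args) {
        String row = "\"ketosis_prediction\";\"NL 672760327\";\"NL\";355669;"
            + "\"KTSPRED_01_817997491\";\"GEMIDDELD_RISICO\";\"2019-11-08 04:33:41.0\";"
            + "2019-11-07 00:00:00.0;3;2019-11-09 19:13:29.484";
        System.out.println(regroup(row));
    }
}
```

Each top-level entry could then be put on the outer GenericRecord, and the entries of the nested map on the inner record, in the same way as in the code above.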

Cheers 
