Posted to users@nifi.apache.org by Steve Champagne <ch...@gmail.com> on 2017/03/28 14:24:41 UTC

Pulling API Endpoints into Kafka Topics in Avro

I'm in the process of creating an ingest workflow that will pull a number of
API endpoints into Kafka topics on an hourly basis. I'd like to convert them
from JSON to Avro when I bring them in. I have, however, run into a few
problems that I haven't been able to figure out and haven't turned anything
up through searches. This seems like a fairly common use case for NiFi, so I
figured I'd ask around to see what others are doing in these cases.

The first problem that I'm running into is that some of the endpoints have
objects of the form:

{
  "metricsPerAgent": {
    "6453": {
      "connectedEngagements": 3,
      "nonInteractiveTotalHandlingTime": 0
    },
    "6454": {
      "connectedEngagements": 1,
      "nonInteractiveTotalHandlingTime": 0
    }
  }
}

I'm using an UpdateAttribute processor to add a schema that I get from
running the object through the InferAvroSchema processor, then sending the
flowfile into a ConvertJSONToAvro processor. There, unfortunately,
ConvertJSONToAvro errors out because Avro field names can't begin with a
digit, and these keys are all numbers. What do people normally do in cases
like these?
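For reference, the Avro specification requires names (record, field, enum) to match `[A-Za-z_][A-Za-z0-9_]*`, so keys like "6453" can never be legal record field names. A minimal stdlib-only Python sketch (the helper name is hypothetical, not part of NiFi or Avro) that flags which keys in a JSON payload would be rejected:

```python
import json
import re

# Avro's naming rule: a name must start with a letter or underscore,
# per the "Names" section of the Avro specification.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def invalid_avro_names(obj, path=""):
    """Recursively collect JSON object keys that are not legal Avro names."""
    bad = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if not AVRO_NAME.match(key):
                bad.append(f"{path}/{key}")
            bad.extend(invalid_avro_names(value, f"{path}/{key}"))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            bad.extend(invalid_avro_names(item, f"{path}[{i}]"))
    return bad

sample = json.loads("""
{
  "metricsPerAgent": {
    "6453": {"connectedEngagements": 3, "nonInteractiveTotalHandlingTime": 0},
    "6454": {"connectedEngagements": 1, "nonInteractiveTotalHandlingTime": 0}
  }
}
""")

print(invalid_avro_names(sample))
# -> ['/metricsPerAgent/6453', '/metricsPerAgent/6454']
```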

Thanks!

Re: Pulling API Endpoints into Kafka Topics in Avro

Posted by Steve Champagne <ch...@gmail.com>.
Ah, that worked great! I hadn't known about the Avro map type. Thanks! 😃


Re: Pulling API Endpoints into Kafka Topics in Avro

Posted by James Wing <jv...@gmail.com>.
Steve,

The inferred schemas can be helpful to get you started, but I recommend
providing your own Avro schema based on your knowledge of what should be
guaranteed to downstream systems.  If you want to pass untyped data, you
can't really beat JSON.  Avro schema isn't so bad, honest.

On the numeric key issue, I think your snippet above suggests that the keys
are not fixed in each sample? It might be covered by using an Avro "map"
type rather than a "record":

{
    "type": "record",
    "name": "testrecord",
    "fields": [
        {
            "name": "metricsPerAgent",
            "type": {
                "type": "map",
                "values": {
                    "type": "record",
                    "name": "agentMetrics",
                    "fields": [
                        {
                            "name": "connectedEngagements",
                            "type": "long"
                        },
                        {
                            "name": "nonInteractiveTotalHandlingTime",
                            "type": "long"
                        }
                    ]
                }
            }
        }
    ]
}
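The map type sidesteps the naming rule because the agent IDs become map keys (plain strings in the data) rather than schema names. A stdlib-only Python sketch of that shape check, hand-rolled for illustration rather than using a real Avro library:

```python
import json

# With an Avro map, agent IDs like "6453" are map *keys* (arbitrary
# strings in the data), not field *names*, so Avro's naming rules
# no longer apply to them.
METRIC_FIELDS = {"connectedEngagements", "nonInteractiveTotalHandlingTime"}

def matches_map_schema(doc):
    """Hand-rolled check that `doc` fits the map-of-records shape above."""
    metrics = doc.get("metricsPerAgent")
    if not isinstance(metrics, dict):
        return False
    return all(
        isinstance(rec, dict)
        and set(rec) == METRIC_FIELDS
        and all(isinstance(v, int) for v in rec.values())
        for rec in metrics.values()
    )

sample = json.loads("""
{
  "metricsPerAgent": {
    "6453": {"connectedEngagements": 3, "nonInteractiveTotalHandlingTime": 0},
    "6454": {"connectedEngagements": 1, "nonInteractiveTotalHandlingTime": 0}
  }
}
""")

print(matches_map_schema(sample))
# -> True
```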

Thanks,

James


