Posted to user@flume.apache.org by Justin Ryan <ju...@ziprealty.com> on 2016/02/29 23:52:38 UTC

Avro source: could not find schema for event

Hiya,

I've got a fairly simple flume agent pulling events from kafka and landing
them in HDFS.  For plain text messages, this works fine.

I created a topic specifically for the purpose of testing sending avro
messages through kafka to land in HDFS, which I'm having some trouble with.

I noted from
https://thisdataguy.com/2014/07/28/avro-end-to-end-in-hdfs-part-2-flume-setup/
the example of flume's default avro schema[0], which will do for my
testing, and set up my python-avro producer to send messages with this
schema.  Unfortunately, I still have flume looping this message in its log:

  org.apache.flume.FlumeException: Could not find schema for event

I'm running out of assumptions to rethink / verify here; would appreciate
any guidance on what I may be missing.

Thanks in advance,

Justin

[0] {
 "type": "record",
 "name": "Event",
 "fields": [{
   "name": "headers",
   "type": {
     "type": "map",
     "values": "string"
   }
 }, {
   "name": "body",
   "type": "bytes"
 }]
}
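
For reference, the wire format implied by this schema can be sketched in pure
Python. This is only an illustration of Avro's binary encoding rules (zigzag
varints, length-prefixed strings and bytes, block-terminated maps), not the
actual python-avro producer code from the message above:

```python
# Minimal sketch of Avro binary encoding for the Event schema above:
# a record is its fields in order; a map is a block count, key/value
# pairs, then a zero terminator; string/bytes are length-prefixed;
# longs are zigzag-encoded varints.
def zigzag_varint(n: int) -> bytes:
    n = (n << 1) ^ (n >> 63)          # zigzag-encode the signed long
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | 0x80 if n else b)
        if not n:
            return bytes(out)

def encode_event(headers: dict, body: bytes) -> bytes:
    out = bytearray()
    if headers:                        # one map block with all entries
        out += zigzag_varint(len(headers))
        for k, v in headers.items():
            kb, vb = k.encode(), v.encode()
            out += zigzag_varint(len(kb)) + kb
            out += zigzag_varint(len(vb)) + vb
    out += zigzag_varint(0)            # empty block terminates the map
    out += zigzag_varint(len(body)) + body
    return bytes(out)

encode_event({}, b"hi")                # -> b"\x00\x04hi"
```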




Re: Avro source: could not find schema for event

Posted by Justin Ryan <ju...@ziprealty.com>.
Thanks, Hari – I was looking for something like this.

I am still a bit confused, because when I write producer / consumer code in
Python, my consumer reads data out of kafka just fine and has no information
about the schema.

That said, I need to keep it available in HDFS for producers anyway, so this
will certainly do.

Is there any interaction between flume and schema registries?


Re: Avro source: could not find schema for event

Posted by Hari Shreedharan <hs...@cloudera.com>.
You can use a URL (on HDFS/HTTP) that points to the schema:
https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AvroEventSerializer.java#L70

Use that URL to store your schema for the event, so you don't have to add
it to the event itself.

The Avro schema is embedded only in the files, not in the event data, so we
need to make sure we write to the correct file based on each event's own
schema.  avro_event works because we write the events out with a fixed
schema (not the event's own schema).
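
A sketch of what that can look like in an agent configuration. This is hedged:
the agent, source, and sink names are made up for illustration, and the schema
path is a placeholder; the header names come from the serializer source linked
above, which reads a flume.avro.schema.url or flume.avro.schema.literal event
header:

```
# Hedged sketch: a1/r1/k1 and the schema path are illustrative only.
a1.sinks.k1.type = hdfs
a1.sinks.k1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder

# Each event must carry a header naming its schema; one way is a
# static interceptor on the source:
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = flume.avro.schema.url
a1.sources.r1.interceptors.i1.value = hdfs://namenode/schemas/event.avsc
```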


Thanks,
Hari


Re: Avro source: could not find schema for event

Posted by Justin Ryan <ju...@ziprealty.com>.
Hiya folks, still struggling with this; is anyone on the list familiar with
AvroEventSerializer$Builder?

While I have gotten past my outright failure, I've only done so by adopting
a fairly inflexible schema, which seems counter to the goal of using avro.
Particularly frustrating is that flume simply needs to pass the existing
message along, though I understand it likely needs to grok the schema to
separate messages.  I can't even find Kafka consumer code capable of being
schema-aware.
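
On the schema-aware consumer point: the decoding side can also be sketched in
pure Python. This is an illustration only, hard-wired to the fixed Event
schema quoted earlier in the thread (map of string headers plus a bytes body),
not a general Avro reader:

```python
# Minimal sketch: decode one Event (map<string> headers + bytes body)
# from Avro binary. Longs are zigzag varints; map entries arrive in
# blocks terminated by a zero count (a negative count is followed by
# the block's byte size, which we skip).
def decode_event(data: bytes):
    pos = 0

    def read_long() -> int:
        nonlocal pos
        acc, shift = 0, 0
        while True:
            b = data[pos]
            pos += 1
            acc |= (b & 0x7F) << shift
            shift += 7
            if not b & 0x80:
                return (acc >> 1) ^ -(acc & 1)   # undo zigzag

    def read_sized() -> bytes:
        nonlocal pos
        n = read_long()
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    headers = {}
    while True:
        count = read_long()
        if count == 0:
            break
        if count < 0:                  # negative count: byte size follows
            read_long()
            count = -count
        for _ in range(count):
            key = read_sized().decode()
            headers[key] = read_sized().decode()
    body = read_sized()
    return headers, body

decode_event(b"\x02\x02a\x02b\x00\x04hi")   # -> ({"a": "b"}, b"hi")
```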


Re: Avro source: could not find schema for event

Posted by Justin Ryan <ju...@ziprealty.com>.
Update:

So, I changed my serializer from
org.apache.flume.sink.hdfs.AvroEventSerializer$Builder to avro_event, and
this started working.  Well, working-ish: the data is a little funky, but
it's arriving, being delivered to HDFS, and I can pull a file and examine it
manually.

I seem to remember that I had the former based on some things I read about
not having to specify a schema, since the schema is embedded in the avro
data.

So I'm confused: it seems that my previous configuration should have worked
without any special attention to the schema, but I got complaints that the
schema couldn't be found.

If anyone could shed a bit of light here, it would be much appreciated.
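
One quick check on a file pulled down like this: Avro object container files
(which is what lands in HDFS here) start with the magic bytes "Obj" plus 0x01,
and the writer's schema is stored in the file's metadata header, which is why
a reader can open them with no outside schema. A trivial sanity-check sketch:

```python
AVRO_MAGIC = b"Obj\x01"   # per the Avro object container file format

def looks_like_avro_container(first_bytes: bytes) -> bool:
    """True if a file's opening bytes match the Avro container magic."""
    return first_bytes[:4] == AVRO_MAGIC

looks_like_avro_container(b"Obj\x01...")   # -> True
```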
